calibre-devs team mailing list archive
Message #00097
Re: Mobipocket book at #1772
On Tuesday 03 February 2009 18:07:48 Marshall T. Vandegrift wrote:
> Kovid etc:
>
> I just pushed a fairly large refactor of the MobiReader code to
> 'staging' to deal with the book attached to #1772. The ticket itself
> wasn't due to these bugs, but I noticed them, alas... There are three
> issues:
>
> - The <img/> tags already have @src attributes and the <a/> tags
> already have @href attributes. When MobiReader does the
> @recindex->@src and @filepos->@href replace in raw markup, whichever
> attribute is first in the text "wins." To avoid this I moved the
> Mobi->HTML transition into the markup-upshift layer.
>
That should be fine.
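The first point (attributes clashing when the Mobi-to-HTML rewrite happens in raw markup) can be illustrated with a minimal sketch that does the rewrite on a parsed tree, so an existing @src or @href is overwritten rather than duplicated. For self-containment it uses the standard library's `xml.etree.ElementTree` on a well-formed fragment, whereas the message refers to lxml.html. The helper name `upshift_links` and the `images/%05d.jpg` and `#filepos%d` naming schemes are hypothetical and are not MobiReader's actual code.

```python
import xml.etree.ElementTree as ET


def upshift_links(root):
    """Hypothetical helper: turn Mobipocket @recindex/@filepos into @src/@href on a tree."""
    for img in root.iter('img'):
        recindex = img.attrib.pop('recindex', None)
        if recindex is not None:
            # Setting the key replaces any existing @src, so there is no
            # second attribute to "win" as there is with a raw-text replace.
            img.set('src', 'images/%05d.jpg' % int(recindex))  # hypothetical naming
    for a in root.iter('a'):
        filepos = a.attrib.pop('filepos', None)
        if filepos is not None:
            a.set('href', '#filepos%d' % int(filepos))  # hypothetical anchor scheme
    return root


doc = ET.fromstring('<body><img src="old.gif" recindex="00003"/>'
                    '<a href="x.html" filepos="0000123">link</a></body>')
upshift_links(doc)
print(ET.tostring(doc, encoding='unicode'))
```

The design point is that attribute replacement on a tree is a dictionary update, so the outcome no longer depends on which attribute happens to come first in the source text.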
> - The book contains hyperlinks which point into the content of other
> hyperlinking <a/> elements. Because <a/> is not valid within <a/>,
> creating a new <a/> with @name and @id at the referenced point
> causes lxml.html to first close the original <a/> tag. The end
> result is that the hyperlink doesn't contain any text and isn't
> clickable. To avoid this I modified the logic to first try to just
> add an @id to the closest tag, albeit somewhat circuitously to avoid
> the previous issue.
>
Wouldn't it be better to only try to add @id if the parent tag is an <a> tag?
Otherwise for anchors inside large blocks of text the position of the anchor
would become significantly inaccurate.
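The suggestion above could look roughly like the following minimal sketch. An @id is added to an existing element only when the target offset falls inside an open `<a>`, and otherwise an empty anchor is inserted at the exact offset so its position stays accurate. It works on a decoded string with character offsets, which simplifies away the fact that MOBI filepos values are byte offsets. The helper name `place_anchor` is hypothetical, and it assumes the enclosing `<a>` has no @id yet.

```python
import re

# Non-self-closing <a ...> start tags; \b keeps <abbr> and similar from matching.
_A_OPEN = re.compile(r'<a\b[^>]*(?<!/)>', re.IGNORECASE)
_A_CLOSE = re.compile(r'</a\s*>', re.IGNORECASE)


def place_anchor(markup, pos, anchor_id):
    """Hypothetical helper: make character offset `pos` addressable as #anchor_id."""
    last_open = None
    for m in _A_OPEN.finditer(markup, 0, pos):
        last_open = m  # remember the last <a> start tag that ends before pos
    if last_open is not None and _A_CLOSE.search(markup, last_open.end(), pos) is None:
        # pos lies inside an open <a>: a new <a> here would be illegally nested and
        # would make the parser close the link early, so tag the enclosing <a>.
        # Assumes that <a> has no @id yet.
        start, end = last_open.span()
        tagged = markup[start:end - 1] + ' id="%s">' % anchor_id
        return markup[:start] + tagged + markup[end:]
    # Not inside a link: an empty anchor at the exact offset keeps the position precise,
    # instead of moving the target to the start of a possibly large enclosing block.
    return markup[:pos] + '<a id="%s"></a>' % anchor_id + markup[pos:]
```

Restricting the @id fallback to `<a>` parents is what keeps anchors in long paragraphs from drifting to the start of the paragraph.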
> - It appears that if the value of the second "Mobipocket / Creator
> version" field in the MOBI header (the one at offset 0x68) is '1',
> then the <hr/> tag means "page break." I verified this by creating
> a Mobipocket book with `mobigen` and manually setting the value of
> that field. When it is '1' -- and no other value -- Mobipocket
> Reader interprets <hr/> as a page break.
>
Also should be fine.
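The header check described in the third point might be sketched as follows. It assumes that the 0x68 offset is measured from the start of record 0 and that the field is a big-endian 32-bit unsigned integer; both are assumptions, not statements from the message. The helper names and the choice of `<mbp:pagebreak/>` as the replacement marker are likewise hypothetical.

```python
import re
import struct

# Second "Mobipocket / Creator version" field named in the message.
# Assumption: offset from the start of record 0, stored as a big-endian uint32.
CREATOR_VERSION_OFFSET = 0x68

_HR = re.compile(r'<hr\b[^>]*>', re.IGNORECASE)


def hr_is_page_break(record0):
    """True only when the field is exactly 1, matching the observed Reader behaviour."""
    (version,) = struct.unpack_from('>I', record0, CREATOR_VERSION_OFFSET)
    return version == 1


def convert_hr(markup, record0):
    """Hypothetical helper: rewrite <hr/> as an explicit page break when the flag is set."""
    if not hr_is_page_break(record0):
        return markup  # any other value: <hr/> remains a horizontal rule
    return _HR.sub('<mbp:pagebreak/>', markup)
```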
Since we're performing major surgery on mobi.reader anyway, how hard would it
be to get it to split the output into multiple HTML files at page breaks? That
way the calibre ebook-viewer will respect page breaks in MOBI markup, and
conversion of MOBI to EPUB should be significantly sped up.
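One way to picture the proposed split is the minimal sketch below. It assumes page breaks have already been normalized to `<mbp:pagebreak/>` markers. It cuts the body at each marker, writes one file per page, and retargets in-document `href="#id"` links to whichever new file holds that id. Splitting raw text is a simplification: an element that straddles a break ends up unbalanced, so a real implementation would split the parsed tree and close and reopen ancestors. The function name and the `part%04d.html` file names are hypothetical.

```python
import re

_PAGE_BREAK = re.compile(r'<mbp:pagebreak\s*/?>', re.IGNORECASE)
_ID = re.compile(r'\bid="([^"]+)"')
_LOCAL_HREF = re.compile(r'href="#([^"]+)"')


def split_at_page_breaks(body_markup, name_fmt='part%04d.html'):
    """Hypothetical helper: return [(file_name, html), ...], one file per page."""
    chunks = [c for c in _PAGE_BREAK.split(body_markup) if c.strip()]
    names = [name_fmt % i for i in range(len(chunks))]

    # Record which output file now holds each anchor id.
    owner = {}
    for name, chunk in zip(names, chunks):
        for m in _ID.finditer(chunk):
            owner[m.group(1)] = name

    def retarget(m):
        target = m.group(1)
        # Known ids point at their new file; unknown ones keep a bare fragment.
        return 'href="%s#%s"' % (owner.get(target, ''), target)

    return [
        (name, '<html><body>%s</body></html>' % _LOCAL_HREF.sub(retarget, chunk))
        for name, chunk in zip(names, chunks)
    ]
```

Smaller files are what would let the viewer honour the breaks and keep per-file processing during EPUB conversion cheap, which is the benefit suggested above.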
Kovid.
--
_____________________________________
Kovid Goyal MC 452-48
California Institute of Technology
1200 E California Blvd
Pasadena, CA 91125
cell : +01 626 390 8699
office: +01 626 395 6595 (449 Lauritsen)
email : kovid@xxxxxxxxxxxxxxxxxx
web : http://www.kovidgoyal.net
_____________________________________