calibre-devs team mailing list archive
Message #00096
Mobipocket book at #1772
Kovid etc:
I just pushed a fairly large refactor of the MobiReader code to
'staging' to deal with the book attached to #1772. The ticket itself
wasn't due to these bugs, but I noticed them, alas... There are three
issues:
- The <img/> tags already have @src attributes and the <a/> tags
already have @href attributes. When MobiReader does the
@recindex->@src and @filepos->@href replacement in the raw markup, each
tag ends up with two copies of the attribute and whichever one comes
first in the text "wins." To avoid this I moved the Mobi->HTML
transition into the markup-upshift layer.
- The book contains hyperlinks which point into the content of other
hyperlinking <a/> elements. Because <a/> is not valid within <a/>,
creating a new <a/> with @name and @id at the referenced point
causes lxml.html to first close the original <a/> tag. The end
result is that the hyperlink doesn't contain any text and isn't
clickable. To avoid this I modified the logic to first try to just
add an @id to the closest tag, albeit somewhat circuitously to avoid
the previous issue.
- It appears that if the value of the second "Mobipocket / Creator
version" field in the MOBI header (the one at offset 0x68) is '1',
then the <hr/> tag means "page break." I verified this by creating
a Mobipocket book with `mobigen` and manually setting the value of
that field. When it is '1' -- and no other value -- Mobipocket
Reader interprets <hr/> as a page break.
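
The first and third points above can be sketched roughly as follows. This is a minimal illustration, not the code pushed to 'staging': it uses the standard-library ElementTree on a well-formed snippet rather than lxml.html, and the names `upshift`, `hr_is_page_break`, and `image_name`, the `%05d.jpg` image naming, and the `#filepos...` anchor format are all hypothetical. It also assumes that the 0x68 offset is measured from the start of the header bytes passed in and that the field is a big-endian 32-bit integer.

```python
import struct
import xml.etree.ElementTree as ET

# Offset quoted in the message; assumed here to be relative to the
# start of the header bytes handed to hr_is_page_break().
VERSION_OFFSET = 0x68


def hr_is_page_break(header: bytes) -> bool:
    """True when the 32-bit field at 0x68 is exactly 1 (assumed big-endian)."""
    (version,) = struct.unpack_from(">L", header, VERSION_OFFSET)
    return version == 1


def upshift(root, header, image_name=lambda idx: "%05d.jpg" % idx):
    """Convert Mobipocket-specific attributes on an already-parsed tree.

    Working on parsed attributes means an existing @src or @href is
    overwritten, so attribute order in the source text no longer matters.
    """
    page_break = hr_is_page_break(header)
    for el in root.iter():
        if el.tag == "img" and "recindex" in el.attrib:
            # Hypothetical naming scheme for extracted image records.
            el.set("src", image_name(int(el.attrib.pop("recindex"))))
        elif el.tag == "a" and "filepos" in el.attrib:
            # Hypothetical anchor format for file-position targets.
            el.set("href", "#filepos%d" % int(el.attrib.pop("filepos")))
        elif el.tag == "hr" and page_break:
            el.set("style", "page-break-after: always")
    return root
```

Because the replacement happens on the parsed tree, a tag such as `<img src="old.gif" recindex="00003"/>` ends up with a single @src, whichever attribute came first in the raw text.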
I tested the changes with all of the Mobipocket books I have, but most
of them come from Tor, so I'm not sure how representative a sample it
is. You may want to run it over your corpus before merging.
-Marshall