calibre-devs team mailing list archive

Mobipocket book at #1772

 

Kovid etc:

I just pushed a fairly large refactor of the MobiReader code to
'staging' to deal with the book attached to #1772.  The ticket itself
wasn't due to these bugs, but I noticed them, alas...  There are three
issues:

  - The <img/> tags already have @src attributes and the <a/> tags
    already have @href attributes.  When MobiReader does the
    @recindex->@src and @filepos->@href replacement in the raw markup,
    each element ends up with two attributes of the same name, and
    whichever one comes first in the text "wins."  To avoid this I
    moved the Mobi->HTML transition into the markup-upshift layer.

  - The book contains hyperlinks which point into the content of other
    hyperlinking <a/> elements.  Because <a/> is not valid within <a/>,
    creating a new <a/> with @name and @id at the referenced point
    causes lxml.html to first close the original <a/> tag.  The end
    result is that the hyperlink doesn't contain any text and isn't
    clickable.  To avoid this I modified the logic to first try to just
    add an @id to the closest tag, albeit somewhat circuitously to avoid
    the previous issue.

  - It appears that if the value of the second "Mobipocket / Creator
    version" field in the MOBI header (the one at offset 0x68) is '1',
    then the <hr/> tag means "page break."  I verified this by creating
    a Mobipocket book with `mobigen` and manually setting the value of
    that field.  When it is '1' -- and no other value -- Mobipocket
    Reader interprets <hr/> as a page break.
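
A rough sketch of the first and third points, for illustration only and
not the code pushed to 'staging': the Mobipocket attributes are converted
on the parsed tree rather than in the raw text, so a pre-existing @src or
@href is simply overwritten.  It uses the standard library's ElementTree
in place of lxml so that it runs on its own.  The function name
`upshift`, the `images/NNNNN.jpg` naming, the `#fileposN` anchor scheme,
and the CSS used for the page break are all assumptions.

```python
import xml.etree.ElementTree as ET


def upshift(root, hr_is_page_break=False):
    """Convert Mobipocket-only attributes on an already-parsed tree.

    `hr_is_page_break` is assumed to be set by the caller when the
    header field described above has the value 1.
    """
    for elem in root.iter():
        if elem.tag == "img" and "recindex" in elem.attrib:
            # Setting the attribute on the element replaces any stale
            # @src, so attribute order in the raw text no longer matters.
            recindex = int(elem.attrib.pop("recindex"))
            elem.set("src", "images/%05d.jpg" % recindex)  # hypothetical naming
        elif elem.tag == "a" and "filepos" in elem.attrib:
            filepos = int(elem.attrib.pop("filepos"))
            elem.set("href", "#filepos%d" % filepos)  # hypothetical anchor scheme
        elif elem.tag == "hr" and hr_is_page_break:
            # One possible way to express "page break" in the output HTML.
            elem.set("style", "page-break-after: always")
    return root


raw = ('<div><img src="old.png" recindex="00003"/>'
       '<a href="stale" filepos="0000123">link</a><hr/></div>')
root = upshift(ET.fromstring(raw), hr_is_page_break=True)
print(ET.tostring(root, encoding="unicode"))
```

Doing the conversion after parsing trades a fast text substitution for a
tree walk, but each element then carries exactly one @src or @href.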

I tested the changes with all of the Mobipocket books I have, but most
of them come from Tor, so I'm not sure how representative a sample that
is.  You may want to run it over your corpus before merging.

-Marshall


