calibre-devs team mailing list archive

Re: Mobipocket book at #1772

 

On Tuesday 03 February 2009 18:07:48 Marshall T. Vandegrift wrote:
> Kovid etc:
>
> I just pushed a fairly large refactor of the MobiReader code to
> 'staging' to deal with the book attached to #1772.  The ticket itself
> wasn't due to these bugs, but I noticed them, alas...  There are three
> issues:
>
>   - The <img/> tags already have @src attributes and the <a/> tags
>     already have @href attributes.  When MobiReader does the
>     @recindex->@src and @filepos->@href replace in raw markup, whichever
>     attribute is first in the text "wins."  To avoid this I moved the
>     Mobi->HTML transition into the markup-upshift layer.
>

That should be fine. 
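
To illustrate why doing the rewrite on the parsed tree sidesteps the
"first attribute wins" problem, here is a minimal sketch. It uses the standard
library's xml.etree.ElementTree as a stand-in for lxml.html, and the function
name, the image path pattern and the "#filepos..." fragment format are all
hypothetical, not what MobiReader actually emits.

```python
import xml.etree.ElementTree as ET


def upshift_mobi_attrs(root):
    """Replace MOBI-only attributes with their HTML equivalents, in place.

    Working on parsed elements means a pre-existing @src or @href is simply
    overwritten, regardless of attribute order in the raw markup.
    """
    for elem in root.iter():
        if elem.tag == "img" and "recindex" in elem.attrib:
            recindex = int(elem.attrib.pop("recindex"))
            # Hypothetical naming scheme for extracted image records.
            elem.set("src", "images/%05d.jpg" % recindex)
        elif elem.tag == "a" and "filepos" in elem.attrib:
            filepos = int(elem.attrib.pop("filepos"))
            # Hypothetical fragment format for link targets.
            elem.set("href", "#filepos%d" % filepos)
    return root


markup = (
    '<body>'
    '<img src="stale.gif" recindex="00003"/>'
    '<a href="stale.html" filepos="0000001234">link</a>'
    '</body>'
)
root = upshift_mobi_attrs(ET.fromstring(markup))
```

After the call, the stale @src and @href values are gone and the MOBI-specific
attributes no longer appear in the tree.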

>   - The book contains hyperlinks which point into the content of other
>     hyperlinking <a/> elements.  Because <a/> is not valid within <a/>,
>     creating a new <a/> with @name and @id at the referenced point
>     causes lxml.html to first close the original <a/> tag.  The end
>     result is that the hyperlink doesn't contain any text and isn't
>     clickable.  To avoid this I modified the logic to first try to just
>     add an @id to the closest tag, albeit somewhat circuitously to avoid
>     the previous issue.
>

Wouldn't it be better to only try to add @id if the parent tag is an <a> tag? 
Otherwise for anchors inside large blocks of text the position of the anchor 
would become significantly inaccurate. 
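
Roughly what I have in mind, as a sketch only. It uses
xml.etree.ElementTree rather than lxml.html, place_anchor is a made-up name,
and it only handles a target position inside the element's leading text:

```python
import xml.etree.ElementTree as ET


def place_anchor(elem, offset, anchor_id):
    """Mark character `offset` of elem.text and return the id to link to.

    If elem is itself an <a>, reuse it (nested <a> is invalid); otherwise
    insert a new empty <a id=...> at the exact position.
    """
    if elem.tag == "a":
        # Keep an existing id if there is one, so other links stay valid.
        return elem.attrib.setdefault("id", anchor_id)
    text = elem.text or ""
    anchor = ET.Element("a", {"id": anchor_id})
    elem.text = text[:offset]      # text before the target position
    anchor.tail = text[offset:]    # text after it follows the new anchor
    elem.insert(0, anchor)
    return anchor_id


para = ET.fromstring("<p>Some long paragraph text</p>")
para_id = place_anchor(para, 10, "filepos10")

link = ET.fromstring('<a href="#x">link text</a>')
link_id = place_anchor(link, 3, "filepos3")
```

In the paragraph case the anchor lands exactly at the offset; in the link case
only the enclosing <a> gains an @id and its text stays clickable.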

>   - It appears that if the value of the second "Mobipocket / Creator
>     version" field in the MOBI header (the one at offset 0x68) is '1',
>     then the <hr/> tag means "page break."  I verified this by creating
>     a Mobipocket book with `mobigen` and manually setting the value of
>     that field.  When it is '1' -- and no other value -- Mobipocket
>     Reader interprets <hr/> as a page break.
>

Also should be fine.
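
For reference, the check could look something like the sketch below. It
assumes the 0x68 offset is counted from the start of record 0 and that the
field is a 32-bit big-endian unsigned integer; both are assumptions on my
part, and hr_is_page_break is a hypothetical name.

```python
import struct

# Assumption: offset is relative to the start of record 0.
CREATOR_VERSION_OFFSET = 0x68


def hr_is_page_break(record0):
    """Return True if the second creator-version field equals 1."""
    if len(record0) < CREATOR_VERSION_OFFSET + 4:
        # Header too short to contain the field; treat <hr/> normally.
        return False
    # Assumption: 32-bit big-endian unsigned integer.
    (version,) = struct.unpack_from(">L", record0, CREATOR_VERSION_OFFSET)
    return version == 1


# Synthetic header bytes for illustration only, not a real MOBI record.
fake_record0 = bytearray(0x70)
struct.pack_into(">L", fake_record0, CREATOR_VERSION_OFFSET, 1)
is_break = hr_is_page_break(bytes(fake_record0))
```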

Since we're performing major surgery on mobi.reader anyway, how hard would it 
be to get it to split the output into multiple HTML files at page breaks? That 
way the calibre ebook-viewer will respect page breaks in MOBI markup, and 
conversion of MOBI to EPUB should be significantly sped up. 
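
A rough sketch of the splitting, to make the idea concrete. It is written
against xml.etree.ElementTree rather than lxml.html, split_at_page_breaks is a
hypothetical name, and it only splits on top-level <hr/> children of the body.
Nested breaks and rewriting of links that end up crossing files are not
handled.

```python
import copy
import xml.etree.ElementTree as ET


def split_at_page_breaks(body, break_tags=("hr",)):
    """Return a list of new <body> elements, one per page."""
    pages = [ET.Element("body")]
    pages[0].text = body.text
    for child in body:
        if child.tag in break_tags:
            # Start a new page; text after the break opens the new page.
            page = ET.Element("body")
            page.text = child.tail
            pages.append(page)
        else:
            # Copy so the original tree is left untouched.
            pages[-1].append(copy.deepcopy(child))
    # Drop pages with no elements and no non-whitespace text.
    return [p for p in pages if len(p) or (p.text or "").strip()]


body = ET.fromstring(
    "<body><p>One</p><hr/><p>Two</p><p>Three</p><hr/><p>Four</p></body>"
)
pages = split_at_page_breaks(body)
```

Each resulting element could then be serialized to its own HTML file.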

Kovid.

-- 
_____________________________________

Kovid Goyal  MC 452-48
California Institute of Technology
1200 E California Blvd
Pasadena, CA 91125

cell  : +01 626 390 8699
office: +01 626 395 6595 (449 Lauritsen)
email : kovid@xxxxxxxxxxxxxxxxxx
web   : http://www.kovidgoyal.net
_____________________________________



