calibre-devs team mailing list archive

Re: oeb2lit

 

On Thu, Dec 4, 2008 at 7:50 PM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:

> I guess we can scratch font embedding off the list in that case. Just
> check the margins.

And MSReader completely ignores any '@page' or 'body' 'margin' or
'padding' values.  Default page-margins for you, Mr./Ms. calibre-user!

> What about the EPUB conversion bugs you? I'm always happy to get
> feedback.

This is really specific to me, so I'm not sure how helpful it is.  Most
of what I read on my Reader is books converted from LIT files, which
already (for the most part...) contain fairly good markup and metadata.
Further, by the time you'd released calibre's EPUB support, I'd already
written a simple 'oeb2epub' I could extend to do exactly what I needed,
and which definitely decreased the immediate cost for rolling my own
vs. patching calibre.  That said, specific issues I had with any2epub:

  (1) Doesn't split HTML files at page-break points.  I don't think the
      CSS spec says one way or the other, but AdobeDE (and Firefox, for
      that matter, when using the paginated PS/PDF renderer) lets an
      explicit page-break "eat" the top 'margin' that follows it, yet
      does display the file-initial 'margin' when the page-break is
      implicit, caused by the beginning of a markup stream.  MSReader,
      alas, never eats any margins, and most LIT files with a single
      markup stream use 'margin' to specify the initial spacing for
      chapter headings.  I do see that there is code in 'split.py' to
      prefer splitting at page-breaks, but it doesn't seem to work.

  (2) Small intersection of markup pre-processing needs.  Most
      LIT-contained markup needs only a small set of modifications to
      become valid XHTML; alas, html.py does none of them.  Conversely,
      most LIT markup doesn't need most of what calibre does to it,
      which means that calibre only introduces the possibility of
      deformation.

  (3) Simple font-size conversion.  Instead of a simple relative scaling
      factor, I prefer the approach of mapping the "scale" (in the sense
      of "musical scale") of font sizes used in the source to a new
      "scale" in the output.

  (4) Lack of font-embedding.  I don't at all like the default font
      AdobeDE uses, so font-embedding is a "must have" for me.

  (5) Differing typographic aesthetics.  Even though I have no formal
      typographic training, I'm kind of obsessive about what I do know.
      The default CSS produced by calibre.ebooks.html tickles that in a
      few ways:

        (a) The one-point margin between paragraphs breaks line rhythm.

        (b) Specifying all page margins with an '@page' rule causes the
            page-numbers displayed by AdobeDE to appear over the text
            (correctable by specifying the side margins with a 'body'
            tag rule).

      I could override these, but the obsessive part of me says that
      they should be the defaults.  Not very rational, I know.

  (6) Differing code aesthetics.  Another thing I'm rather obsessive
      about -- certain things about code can just irrationally get under
      my skin and make me less inclined to want to implement major
      changes.  For example, >80 character lines.  Whenever I submit a
      patch I usually need to first go back and revert all the lines
      where all I did was re-format them to fit in 80 columns :-).
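
To make the "scale" idea in (3) concrete, here is a minimal Python
sketch.  It is not code from oeb2epub or calibre; the function name
'map_font_scale', its parameters, and the example sizes are all made up
for illustration.  The assumption is that each distinct font size in
the source is one "note" in the source scale, and that the size used
for body text is pinned to a chosen step of the output scale.

```python
def map_font_scale(source_sizes, base_size, target_scale, target_base_index):
    """Map each distinct source font size onto a step of target_scale.

    Hypothetical helper: the distinct source sizes are ranked, the rank
    of base_size is aligned with target_base_index, and every other size
    moves the same number of steps up or down the target scale.  Steps
    that would fall off either end are clamped to the nearest end.
    """
    # Distinct sizes in ascending order; include the base size even if
    # it never appears explicitly in the source.
    steps = sorted(set(source_sizes) | {base_size})
    base_rank = steps.index(base_size)

    mapping = {}
    for rank, size in enumerate(steps):
        idx = target_base_index + (rank - base_rank)
        idx = max(0, min(len(target_scale) - 1, idx))  # clamp to the scale
        mapping[size] = target_scale[idx]
    return mapping


# Sizes (in points) found in a source document, with 12pt body text,
# mapped onto an output scale whose body size is 10pt (index 2).
sizes = map_font_scale(
    source_sizes=[10, 12, 12, 14, 18, 24],
    base_size=12,
    target_scale=[7.5, 9, 10, 12, 15.5, 20, 22],
    target_base_index=2,
)
```

Unlike a single relative scaling factor, this keeps the number of
distinct sizes and their ordering, but lets the output use whatever
intervals suit the target device.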

Actually, in writing that and then reading over it, I've had a bit of
an epiphany.  I think our goals are perhaps not in complete alignment.
Your goal with calibre -- generally speaking -- seems to be to produce
acceptable-quality output from any quality of input.  What I want -- and
what I'm interested in working on -- is a tool for cleanly creating
high-quality, high-fidelity, standards-compliant output from
high-quality input.

Anyway, I'll finish up oeb2lit and get the basics integrated.

-Marshall


