calibre-devs team mailing list archive
Message #00017
Re: oeb2lit
On Thu, Dec 4, 2008 at 7:50 PM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:
> I guess we can scratch font embedding off the list in that case. Just
> check the margins.
And MSReader completely ignores any '@page' or 'body' 'margin' or
'padding' values. Default page-margins for you, Mr./Ms. calibre-user!
> What about the EPUB conversion bugs you? I'm always happy to get
> feedback.
This is really specific to me, so I'm not sure how helpful it is. Most
of what I read on my Reader is books converted from LIT files, which
already (for the most part...) contain fairly good markup and metadata.
Further, by the time you'd released calibre's EPUB support, I'd already
written a simple 'oeb2epub' I could extend to do exactly what I needed,
and which definitely decreased the immediate cost for rolling my own
vs. patching calibre. That said, specific issues I had with any2epub:
(1) Doesn't split HTML files at page-break points. I don't think the
CSS spec says one way or the other, but AdobeDE (and Firefox for
that matter, when using the paginated PS/PDF renderer) render with
an explicit page-break "eating" the following top 'margin', but
when the page-break is implicitly caused by the beginning of a
markup stream do display the file-initial 'margin'. MSReader alas
never eats any margins, and most LIT-files with a single markup
stream use 'margin' to specify the initial spacing for chapter
headings. I do see that there is code in 'split.py' to prefer to
split at page-breaks, but it doesn't seem to work.
(2) Small intersection of markup pre-processing needs. Most
LIT-contained markup needs only a small set of modifications to
become valid XHTML; alas, html.py does none of them. Conversely,
most LIT markup doesn't need most of what calibre does to it,
which means that calibre only introduces the possibility of
deformation.
(3) Simple font-size conversion. Instead of a simple relative scaling
factor, I prefer the approach of mapping the "scale" (in the sense
of "musical scale") of font sizes used in the source to a new
"scale" in the output.
(4) Lack of font-embedding. I don't at all like the default font
AdobeDE uses, so font-embedding is a "must have" for me.
(5) Differing typographic aesthetics. Even though I have no formal
typographic training, I'm kind of obsessive about what I do know.
The default CSS produced by calibre.ebooks.html tickles that in a
few ways:
(a) The one-point margin between paragraphs breaks line rhythm.
(b) Specifying all page margins with an '@page' rule causes the
page-numbers displayed by AdobeDE to appear over the text
(correctable by specifying the side margins with a 'body'
tag rule).
I could override these, but the obsessive part of me says that
they should be the defaults. Not very rational, I know.
(6) Differing code aesthetics. Another thing I'm rather obsessive
about -- certain things about code can just irrationally get under
my skin and make me less inclined to want to implement major changes.
For example, >80 character lines. Whenever I submit a patch I
usually need to first go back and revert all the lines where all I
did was re-format them to fit in 80 columns :-).
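To make point (3) a bit more concrete, here is a rough sketch of the kind of "scale" mapping I mean. It is only an illustration, not code from calibre or from my oeb2epub: the function name, its arguments, and the choice of the most frequent size as the body-text size are all assumptions made up for this example.

```python
from collections import Counter


def build_size_map(source_sizes, target_scale, target_base_index):
    """Map each distinct source font size to a step of a target scale.

    source_sizes: font sizes (e.g. in points), one entry per text run.
    target_scale: ascending list of sizes to use in the output.
    target_base_index: index in target_scale to use for body text.

    Assumption (hypothetical heuristic): the most frequent source
    size is the body-text size.
    """
    counts = Counter(source_sizes)
    base = counts.most_common(1)[0][0]
    steps = sorted(counts)            # the source "scale", ascending
    base_pos = steps.index(base)

    mapping = {}
    for pos, size in enumerate(steps):
        # Keep the same number of steps above or below the body size,
        # clamped to the ends of the target scale.
        idx = target_base_index + (pos - base_pos)
        idx = max(0, min(len(target_scale) - 1, idx))
        mapping[size] = target_scale[idx]
    return mapping


size_map = build_size_map(
    source_sizes=[10, 10, 10, 12, 16, 8],
    target_scale=[7, 9, 11, 13, 17, 21],
    target_base_index=2,
)
```

The point is that a size one step above body text in the source ends up one step above body text in the output, whatever the actual ratio between the two was, which a single relative scaling factor can't guarantee.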
Actually, in writing that then reading over it, I've had a bit of an
epiphany. I think our goals are perhaps not in complete alignment.
Your goal with calibre -- generally speaking -- seems to be to produce
acceptable-quality output from any quality of input. What I want -- and
what I'm interested in working on -- is a tool for cleanly creating
high-quality, high-fidelity, standards-compliant output from
high-quality input.
Anyway, I'll finish up oeb2lit and get the basics integrated.
-Marshall