calibre-devs team mailing list archive

Fwd: Conversion pipeline

 

[Forwarded to the mailing list because I can never remember not to just reply.]


---------- Forwarded message ----------
From: Marshall T. Vandegrift <llasram@xxxxxxxxx>
Date: Thu, Apr 9, 2009 at 10:04 PM
Subject: Re: [Calibre-devs] Conversion pipeline
To: Kovid Goyal <kovid@xxxxxxxxxxxxxx>


On Wed, Apr 8, 2009 at 1:03 PM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:

> Welcome back :) Just so you know I've been getting concerned PMs from
> a couple of Mobilereaders about you.

So s,tonight,tomorrow night, but here I am!  And awww!  Concerned about
me, or about yet-open bugs? ;-)

> Feel free to let me know if you don't feel like undertaking some task
> and I'll try to take over.

Work continues to kick hard, so I may be doing that... :-/

>>> @Marshall: I made a change to OEBBook to have it not choke when
>>> parsing of a few HTML files fails. I'd appreciate it if you could
>>> have a look at my changes, as there may be a better way to do it.

The changes look fine to me.  Some of the link-re-writing stuff I'd
probably have done as Manifest.Item methods instead of module-level
functions, but it's not a big deal to me either way.

I am a bit opposed to the 'package' transform stuff though.  With the
previous EPUB code I could see how forcing that sort of structure
simplified generating filenames and relative references, but OEBBook and
component classes now provide methods for doing all that, and nothing
should / should need to depend on a particular file structure.

> That is a weak spot (an unavoidable one) in OEBBook, so it would be good
> to have it in a separate module that can be tested independently
> of the rest of the system.

I think a per-media-type index of parsers might not be a bad idea.  The
current code will probably hold together for a pass or two though.
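
A rough sketch of what a per-media-type index could look like. All the names here (PARSERS, register_parser, parse_item, and the toy parsers) are hypothetical and not anything currently in OEBBook; the point is only that each parser can be registered and tested on its own:

```python
# Hypothetical sketch: a registry mapping media types to parser callables.
# None of these names come from the actual OEBBook code.
PARSERS = {}


def register_parser(*media_types):
    """Register the decorated function as the parser for each media type."""
    def decorator(func):
        for media_type in media_types:
            PARSERS[media_type] = func
        return func
    return decorator


@register_parser("application/xhtml+xml", "text/html")
def parse_markup(data):
    # Placeholder: a real parser would build an element tree here.
    return ("markup", data.decode("utf-8", "replace"))


@register_parser("text/css")
def parse_css(data):
    # Placeholder: a real parser would build a stylesheet object here.
    return ("css", data.decode("utf-8", "replace"))


def parse_item(media_type, data):
    """Dispatch on media type; unknown types are passed through as raw bytes."""
    parser = PARSERS.get(media_type)
    if parser is None:
        return ("raw", data)
    return parser(data)
```

With something like this, a parse failure stays confined to one registered function, and unknown media types fall through untouched.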

> Yeah I realized that. One (hackish) solution is to simply detect HTML
> tag names in user specified XPath expressions and if they are not
> namespaced, to insert the XHTML namespace by default. This will
> hopefully not mangle the large majority of XPath expressions.

Hmm.  That could work.  OTOH, most of your example expressions use the
regex extensions etc already anyway.  Any idea what sorts of expressions
most users are actually using?
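
For concreteness, here is a minimal sketch of the rewrite being discussed, using only the re module. The function name, the "h" prefix, and the (deliberately short) tag list are my assumptions, not existing calibre code. It skips string literals, attributes, function calls, and names that already carry a prefix:

```python
import re

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumed, deliberately incomplete list of tag names to recognize.
HTML_TAGS = frozenset(
    "html head body div p span a img h1 h2 h3 h4 h5 h6 ul ol li table tr td".split()
)

# Quoted string literals, captured so that re.split() keeps them.
_LITERAL = re.compile(r"""("[^"]*"|'[^']*')""")

# A bare name that is not an attribute (@x), a variable ($x), the local part
# of a prefixed name (p:x), a prefix or axis (x:), or a function call (x(...)).
_NAME = re.compile(
    r"(?<![\w@$.-])(?<![^:]:)"
    r"([A-Za-z_][\w.-]*)"
    r"(?![\w.-]*\s*(?:\(|:))"
)


def namespace_html_tags(expr, prefix="h"):
    """Prefix unqualified HTML tag names in an XPath expression."""
    def fix(match):
        name = match.group(1)
        if name.lower() in HTML_TAGS:
            return "%s:%s" % (prefix, name)
        return name

    parts = _LITERAL.split(expr)
    # Even-indexed parts lie outside string literals; only those are rewritten.
    parts[0::2] = [_NAME.sub(fix, part) for part in parts[0::2]]
    return "".join(parts)
```

The result would then be evaluated with the prefix bound to XHTML_NS (with lxml, by passing namespaces={"h": XHTML_NS} to xpath()). One known way this mangles things: "div" is also an XPath arithmetic operator, so an expression like "count(p) div 2" gets rewritten incorrectly, which is part of why I'd like to know what users actually write.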

> At the moment, I'm just plugging in a set of font size keys and a base
> font size and letting flatcss do its thing. The keys come from the
> input/output profile or the user.

That should do it.  Most of the other options are format-specific hacks
that were easiest to do while already iterating over all the CSS.  With
the CSS becoming a cssutils CSS DOM, manipulation becomes easier and
some/most of those can probably move into separate transforms.

> Do you want to give it the necessary loving or should I look at it?

I'll go ahead and do the cleaning up of the <font/>, @color,
etc. processing.  I'll make sure I cover all the existing HTML checks.

> I haven't. I'm stuck with porting EPUB output at the moment, as large
> parts of that codebase are having to be re-written. The control flow
> in the old code is very different from the new code, so it's a
> non-trivial migration.

Ah...  Oh well.  I was hoping EPUB could just be another Reader/Writer
pair which inherits from OEBReader/OEBWriter and has an EPUBContainer it
reads from / writes to.  What's the extra complication?
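
To make the shape I had in mind concrete, here is a toy sketch. The OEBReader and OEBWriter classes below are simplified stand-ins I made up for illustration (the real ones build and serialize an OEBBook, not a dict of bytes), and the EPUBContainer methods are likewise assumptions:

```python
import io
import zipfile


class OEBReader(object):
    """Stand-in base class: reads every file a container exposes."""
    def read(self, container):
        return {path: container.read(path) for path in container.namelist()}


class OEBWriter(object):
    """Stand-in base class: writes every file into a container."""
    def write(self, files, container):
        for path, data in sorted(files.items()):
            container.write(path, data)


class EPUBContainer(object):
    """Hypothetical ZIP-backed container with an EPUB mimetype entry."""
    MIMETYPE = b"application/epub+zip"

    def __init__(self, stream, mode="r"):
        self.zf = zipfile.ZipFile(stream, mode)
        if mode == "w":
            # EPUB wants 'mimetype' as the first entry, stored uncompressed.
            self.zf.writestr("mimetype", self.MIMETYPE, zipfile.ZIP_STORED)

    def namelist(self):
        return [name for name in self.zf.namelist() if name != "mimetype"]

    def read(self, path):
        return self.zf.read(path)

    def write(self, path, data):
        self.zf.writestr(path, data, zipfile.ZIP_DEFLATED)

    def close(self):
        self.zf.close()


class EPUBReader(OEBReader):
    def read_stream(self, stream):
        container = EPUBContainer(stream)
        try:
            return self.read(container)
        finally:
            container.close()


class EPUBWriter(OEBWriter):
    def write_stream(self, files, stream):
        container = EPUBContainer(stream, "w")
        try:
            self.write(files, container)
        finally:
            container.close()
```

In this picture all the EPUB-specific knowledge lives in the container, and the reader/writer subclasses only decide which container to hand to the inherited logic.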

> It's definitely not too late :)

I'm still getting a handle on how the overall changes are fitting
together...  Where did / will the transform-option-priority-resolution
system you described end up?  Although I hadn't gotten around to saying
anything (useful, I know!), that was one part I'd been a bit skeptical
about due to the complexity of it, at least as I understood it.

-Marshall


