calibre-devs team mailing list archive

Re: Conversion pipeline

To: calibre-devs <calibre-devs@xxxxxxxxxxxxxxxxxxx>
From: "Marshall T. Vandegrift" <llasram@xxxxxxxxx>
Date: Wed, 8 Apr 2009 12:25:02 -0400
In-reply-to: <20090402040735.GU19042@qedette>

On Thu, Apr 2, 2009 at 12:07 AM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:

> An update on the status of the pipeline. It now works for converting
> MOBI -> OEB and I believe John has added support for TXT input/output
> and PDF output as well, though I haven't had the chance to test it
> yet. The code is in lp:~kovid/calibre/pluginize and I've only really
> tested it in Linux so far.

Oooh, TXT and PDF -- nice.

An update on the status of, er, me.  My day job has been pretty busy
lately, and I'd been feeling a bit burned out on calibre development and
e-book-ery in general, which is why I've been AWOL for a while.  I think
I'm ready to start participating again, albeit at a somewhat reduced
level for a bit, as I've got a number of other irons in the fire.

> Once all the porting is done, I will start work on a regression
> testing system.

Cool.  I'm sorry that I wasn't able to contribute that as originally
discussed :-/.  How's it coming along?

> I'm planning on having the test cases (i.e. ebooks in various formats)
> stored as an encrypted binary blob on the calibre web server. So if a
> developer wants to run the tests, he can just ask me for the key to
> decrypt it. The reason for doing that is so that we can have
> commercial books as test cases as well.  I'm not a hundred percent
> certain it's necessary, so your thoughts on this, or any other aspect
> of the test system/conversion pipeline are welcome.

That seems reasonable to me.  Perhaps though it could be divided into 2
parts, one which contains undistributable content and one which is
freely distributable?  That would make it possible for casual
contributors to run at least part of the test suite.
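A minimal sketch of how a test runner could honour such a split. It assumes a hypothetical layout in which the freely distributable books and the decrypted commercial books live in two separate directories; the paths and the functions `collect_books` and `available_corpora` are made up for illustration. Casual contributors get only the free corpus, while anyone who has the key and has decrypted the blob also gets the restricted one.

```python
import os


def collect_books(directory):
    """Return the sorted paths of regular files in directory, or [] if the
    directory does not exist (e.g. the restricted corpus was never decrypted)."""
    if not os.path.isdir(directory):
        return []
    paths = (os.path.join(directory, name) for name in os.listdir(directory))
    return sorted(p for p in paths if os.path.isfile(p))


def available_corpora(free_dir, restricted_dir):
    """Map corpus name -> list of test books.

    The freely distributable corpus is always reported (possibly empty);
    the restricted corpus is only included when its decrypted files exist.
    """
    corpora = {"free": collect_books(free_dir)}
    restricted = collect_books(restricted_dir)
    if restricted:
        corpora["restricted"] = restricted
    return corpora


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the corpora are unpacked.
    for name, books in available_corpora("tests/books/free",
                                         "tests/books/restricted").items():
        print(f"{name}: {len(books)} book(s)")
```

The design choice is that a missing restricted corpus is not an error: the suite simply runs the part it can.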

> @Marshall: I made a change to OEBBook to have it not choke when
> parsing of a few HTML files fails. I'd appreciate it if you could have
> a look at my changes, as there may be a better way to do it.

I'll take a look at it this evening.  I do know that it could use some
changes though.  Really the whole HTML->XHTML conversion should probably
be pulled out into its own pluggable system.

Also, one aspect of this I keep meaning to bring up: all of my code is
based on the premise that it's processing properly-namespaced XML.
Hence a big part of the HTML->XHTML clean-up in OEBBook is shoving all
HTML into the XHTML namespace.  This allows correct treatment of XHTML
vs. SVG, eventually the OPF 'case' stuff, and theoretically stuff like
MathML.  The downside of this approach is that XPath 1.0 has no notion
of a default namespace, and neither does lxml/libxml2.  This means that
any XPath expressions entered by the user need to have all elements
prefixed with one of a provided set of prefixes.
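A minimal illustration of both points, using Python's standard-library ElementTree as a stand-in for lxml (its `findall()` path syntax is only a small subset of XPath). Un-namespaced HTML elements are moved into the XHTML namespace, elements already in another namespace such as SVG are left alone, and queries then have to name elements through an explicitly bound prefix. `namespace_html` and the prefixes `h` and `svg` are illustrative choices; with lxml the analogous call would be `root.xpath('//h:p', namespaces={'h': XHTML_NS})`.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def namespace_html(root):
    """Move every un-namespaced element into the XHTML namespace, leaving
    elements that already carry a namespace (e.g. SVG) untouched."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = "{%s}%s" % (XHTML_NS, elem.tag)
    return root


doc = ET.fromstring(
    "<html><body><p>Text</p>"
    '<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>'
    "</body></html>"
)
namespace_html(doc)

# An unprefixed name only matches un-namespaced elements, so it finds nothing.
unprefixed = doc.findall(".//p")

# Bind prefixes explicitly and use them in every element name.
prefixes = {"h": XHTML_NS, "svg": SVG_NS}
paragraphs = doc.findall(".//h:p", prefixes)
rects = doc.findall(".//svg:rect", prefixes)
```

Because the SVG subtree keeps its own namespace, XHTML-specific processing can match only `h:` elements and never mistake SVG content for HTML.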

> Also, I'm not a hundred percent sure I'm using your CSS Flattening
> code correctly, in particular the algorithm for determining the
> defaults needs a once over.

For determining the default font size?

I know that some of the CSS flattening code (badly) duplicates the
previous/existing CSS normalization code.  It needs some love, and more
of its logic should probably be moved into Stylizer (e.g., processing
relative font/@size attributes).
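For the relative font/@size case, here is a minimal sketch of the resolution step. It follows the HTML Living Standard's rules for parsing a legacy font size: a leading `+` or `-` makes the value relative to 3, the result is clamped to 1..7, and each step maps to a CSS keyword. `parse_font_size`, `font_size_to_css`, and `LEGACY_SIZES` are hypothetical names; this is not how Stylizer currently does it. The `base` parameter is an added assumption, meant to mimic an old-style `<basefont>`.

```python
import re

# CSS keywords for the seven legacy HTML font sizes (3 is the default).
LEGACY_SIZES = {
    1: "x-small", 2: "small", 3: "medium", 4: "large",
    5: "x-large", 6: "xx-large", 7: "xxx-large",
}

_FONT_SIZE_RE = re.compile(r"\s*([+-]?)(\d+)")


def parse_font_size(value, base=3):
    """Resolve a <font size="..."> value to an integer in 1..7.

    '+n' and '-n' are relative to `base`; bare digits are absolute.
    Returns None when the value does not start with an (optionally
    signed) integer, in which case the attribute should be ignored.
    """
    match = _FONT_SIZE_RE.match(value)
    if match is None:
        return None
    sign, digits = match.groups()
    size = int(digits)
    if sign == "+":
        size = base + size
    elif sign == "-":
        size = base - size
    return max(1, min(7, size))


def font_size_to_css(value, base=3):
    """Return a CSS font-size keyword for a <font size> value, or None."""
    size = parse_font_size(value, base)
    return None if size is None else LEGACY_SIZES[size]
```

For example, `font_size_to_css("+1")` gives `"large"`. Turning the attribute into a CSS keyword early would let the flattening code treat it like any other font-size declaration.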

> I'll hold off on implementing MOBI output and LIT input/output in case
> you want to do that.

I would like to, if you haven't done it already.

> And if you have any comments on the way the pipeline is shaping up,
> now's the time.

If it isn't past "the time" yet, I'll also do that this evening. :-)

-Marshall

Follow ups

Re: Conversion pipeline
From: Kovid Goyal, 2009-04-08

References

Conversion pipeline
From: Kovid Goyal, 2009-03-11
Re: Conversion pipeline
From: Kovid Goyal, 2009-04-02