calibre-devs team mailing list archive
Message #00118
Re: Conversion pipeline
On Thu, Apr 2, 2009 at 12:07 AM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:
> An update on the status of the pipeline. It now works for converting
> MOBI -> OEB and I believe John has added support for TXT input/output
> and PDF output as well, though I haven't had a chance to test it
> yet. The code is in lp:~kovid/calibre/pluginize and I've only really
> tested it in Linux so far.
Oooh, TXT and PDF -- nice.
An update on the status of, er, me. My day job has been pretty busy
lately, and I'd been feeling a bit burned out on calibre development and
e-book-ery in general, which is why I've been AWOL for a while. I think
I'm ready to start participating again, albeit at a somewhat reduced
level for a bit, as I've got a number of other irons in the fire.
> Once all the porting is done, I will start work on a regression
> testing system.
Cool. I'm sorry that I wasn't able to contribute that as originally
discussed :-/. How's it coming along?
> I'm planning on having the test cases (i.e. ebooks in various formats)
> stored as an encrypted binary blob on the calibre web server. So if a
> developer wants to run the tests, he can just ask me for the key to
> decrypt it. The reason for doing that is so that we can have
> commercial books as test cases as well. I'm not a hundred percent
> certain it's necessary, so your thoughts on this, or any other aspect
> of the test system/conversion pipeline are welcome.
That seems reasonable to me. Perhaps though it could be divided into 2
parts, one which contains undistributable content and one which is
freely distributable? That would make it possible for casual
contributors to run at least part of the test suite.
> @Marshall: I made a change to OEBBook to have it not choke when
> parsing of a few HTML files fails. I'd appreciate it if you could have
> a look at my changes, as there may be a better way to do it.
I'll take a look at it this evening. I do know that it could use some
changes though. Really the whole HTML->XHTML conversion should probably
be pulled out into its own pluggable system.
Also, one aspect of this I keep meaning to bring up: all of my code is
based on the premise that it's processing properly-namespaced XML.
Hence a big part of the HTML->XHTML clean-up in OEBBook is shoving all
HTML into the XHTML namespace. This allows correct treatment of XHTML
vs. SVG, eventually the OPF 'case' stuff, and theoretically stuff like
MathML. The down-side of this approach is that XPath 1.0 doesn't
support default namespaces and neither does lxml/libxml2, which means
that any XPath expression entered by the user needs every element name
qualified with a prefix from a set of prefixes we provide.
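
To make the prefix issue concrete, here is a minimal sketch. It uses the
standard library's xml.etree.ElementTree as a stand-in for lxml (its
path language is only a subset of XPath 1.0), and the prefixes "h" and
"svg" are arbitrary choices for illustration, not what OEBBook uses.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Prefix map handed to each query; the prefix names are illustrative only.
NSMAP = {"h": XHTML_NS, "svg": SVG_NS}

DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>first</p><p>second</p>'
    '<svg xmlns="http://www.w3.org/2000/svg"><title>drawing</title></svg>'
    '</body></html>'
)

root = ET.fromstring(DOC)

# An unprefixed name only matches elements in no namespace, so this
# finds nothing even though the document clearly contains <p> elements.
unprefixed = root.findall(".//p")

# With a prefix bound to the XHTML namespace, the same query matches.
prefixed = root.findall(".//h:p", NSMAP)

# The SVG <title> stays distinct from anything in the XHTML namespace.
svg_titles = root.findall(".//svg:title", NSMAP)
```

So a user who types ".//p" gets no matches, and has to write ".//h:p"
against whatever prefix set we hand them.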
> Also, I'm not a hundred percent sure I'm using your CSS Flattening
> code correctly, in particular the algorithm for determining the
> defaults needs a once over.
For determining the default font size?
I know that some of the CSS flattening code (badly) duplicates the
previous/existing CSS normalization code. It needs some love and
probably to have more logic moved into Stylizer. (E.g., processing
relative font/@size attributes.)
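
As a rough sketch of what processing a relative font/@size could look
like (this is not the Stylizer code; the function name is hypothetical,
and falling back to the base size on unparsable input is an assumption):
HTML sizes run from 1 to 7, and "+n"/"-n" are taken relative to a base
size of 3.

```python
def resolve_font_size(value, base=3):
    """Resolve an HTML font/@size value to an absolute size in 1..7.

    Hypothetical helper for illustration only.
    """
    value = value.strip()
    if not value:
        return base
    try:
        number = int(value)  # int() accepts a leading "+" or "-"
    except ValueError:
        return base  # assumption: ignore values that do not parse
    if value[0] in "+-":
        # Signed values are offsets from the base size.
        number = base + number
    # Clamp to the legal HTML range.
    return max(1, min(7, number))
```

The resulting 1..7 value would still need mapping onto an actual CSS
font size, which is the part that probably belongs in Stylizer.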
> I'll hold off on implementing MOBI output and LIT input/output in case
> you want to do that.
I would like to, if you haven't done it already?
> And if you have any comments on the way the pipeline is shaping up,
> now's the time.
If it isn't past "the time" yet, I'll also do that this evening. :-)
-Marshall