calibre-devs team mailing list archive

Re: Conversion pipeline

 

On Wed, Apr 08, 2009 at 12:25:02PM -0400, Marshall T. Vandegrift wrote:
> On Thu, Apr 2, 2009 at 12:07 AM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:
> 
> > An update on the status of the pipeline. It now works for converting
> > MOBI -> OEB and I believe John has added support for TXT input/output
> > and PDF output as well, though I haven't had the chance to test it
> > yet. The code is in lp:~kovid/calibre/pluginize and I've only really
> > tested it in Linux so far.
> 
> Oooh, TXT and PDF -- nice.
> 
> An update on the status of, er, me.  My day job has been pretty busy
> lately, and I'd been feeling a bit burned out on calibre development and
> e-book-ery in general, which is why I've been AWOL for a while.  I think
> I'm ready to start participating again, albeit at a somewhat reduced
> level for a bit, as I've got a number of other irons in the fire.
> 

Welcome back :) Just so you know I've been getting concerned PMs from a couple
of Mobilereaders about you. Feel free to let me know if you don't feel like
undertaking some task and I'll try to take over.

> > Once all the porting is done, I will start work on a regression
> > testing system.
> 
> Cool.  I'm sorry that I wasn't able to contribute that as originally
> discussed :-/.  How's it coming along?
> 

I haven't got to it yet, as I've been travelling for the past week, so my
activity level is lower than it should be as well.

> > I'm planning on having the test cases (i.e. ebooks in various formats)
> > stored as an encrypted binary blob on the calibre web server. So if a
> > developer wants to run the tests, he can just ask me for the key to
> > decrypt it. The reason for doing that is so that we can have
> > commercial books as test cases as well.  I'm not a hundred percent
> > certain it's necessary, so your thoughts on this, or any other aspect
> > of the test system/conversion pipeline are welcome.
> 
> That seems reasonable to me.  Perhaps though it could be divided into 2
> parts, one which contains undistributable content and one which is
> freely distributable?  That would make it possible for casual
> contributors to run at least part of the test suite.
> 

Yeah, it will be divided into two sets.

> > @Marshall: I made a change to OEBBook to have it not choke when
> > parsing of a few HTML files fails. I'd appreciate it if you could have
> > a look at my changes, as there may be a better way to do it.
> 
> I'll take a look at it this evening.  I do know that it could use some
> changes though.  Really the whole HTML->XHTML conversion should probably
> be pulled out into its own pluggable system.

That is a weak spot (an unavoidable one) in OEBBook, so it would be good 
to have it in a separate module that can be tested independently 
of the rest of the system.

> Also, one aspect of this I keep meaning to bring up: all of my code is
> based on the premise that it's processing properly-namespaced XML.
> Hence a big part of the HTML->XHTML clean-up in OEBBook is shoving all
> HTML into the XHTML namespace.  This allows correct treatment of XHTML
> vs. SVG, eventually the OPF 'case' stuff, and theoretically stuff like
> MathML.  The down-side of this approach is that XPath 1.0 doesn't
> support default namespaces and neither does lxml/libxml2.  Which means
> that any XPath expressions entered by the user need to have all elements
> prefixed with a provided prefix set.
> 

Yeah, I realized that. One (hackish) solution is to simply detect HTML tag names
in user-specified XPath expressions and, if they are not namespaced, insert the
XHTML namespace prefix by default. This will hopefully not mangle the large
majority of XPath expressions.
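
A rough sketch of that idea, for illustration only and not the code in the
pluginize branch. The names namespace_xpath, HTML_TAGS and the "h" prefix are
made up for this example, and the tag list is deliberately short. It uses the
standard library's ElementTree (a limited XPath subset) to show the effect:

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical, deliberately short list of tag names to recognise.
HTML_TAGS = {"html", "body", "div", "p", "span", "h1", "h2", "h3", "a", "img"}

# A token is either a quoted string literal or a name.
TOKEN = re.compile(r"""'[^']*'|"[^"]*"|[A-Za-z_][\w.-]*""")


def namespace_xpath(expr, prefix="h"):
    """Prefix bare HTML tag names in expr with the given namespace prefix."""
    def fix(match):
        token = match.group(0)
        if token[0] in "'\"" or token not in HTML_TAGS:
            return token  # string literal, or not a known HTML tag
        before = expr[:match.start()].rstrip()
        after = expr[match.end():].lstrip()
        if before.endswith("@") or before.endswith("$"):
            return token  # attribute or variable name
        if before.endswith(":") and not before.endswith("::"):
            return token  # already has a prefix
        if after.startswith(":") or after.startswith("("):
            return token  # a prefix, an axis name or a function call
        return prefix + ":" + token
    return TOKEN.sub(fix, expr)


doc = ET.fromstring(
    '<html xmlns="%s"><body><p>one</p><p>two</p></body></html>' % XHTML_NS)
ns = {"h": XHTML_NS}

plain = ".//p"
fixed = namespace_xpath(plain)          # ".//h:p"
print(len(doc.findall(plain, ns)))      # 0: bare names match no namespace
print(len(doc.findall(fixed, ns)))      # 2
```

With lxml the rewritten expression would be passed to xpath() together with
namespaces={'h': XHTML_NS}. The mangling risk is real: "div" is also an XPath
operator, so an expression like "count(//p) div 2" would be rewritten wrongly
by this sketch.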

> > Also, I'm not a hundred percent sure I'm using your CSS Flattening
> > code correctly, in particular the algorithm for determining the
> > defaults needs a once over.
> 
> For determining the default font size?
> 
> I know that some of the CSS flattening code (badly) duplicates the
> previous/existing CSS normalization code.  It needs some love and
> probably to have more logic moved into Stylizer.  (E.g., processing
> relative font/@size attributes.)
> 

At the moment, I'm just plugging in a set of font size keys and a base font size
and letting flatcss do its thing. The keys come from the input/output profile or
the user. Do you want to give it the necessary loving or should I look at it?
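
To make the inputs concrete, here is a simplified illustration of what a base
font size plus a set of font size keys can be used for. This is an assumption
for discussion, not the algorithm flatcss actually implements; the function
name and the key values are hypothetical:

```python
def snap_font_size(size_pt, source_base_pt, target_base_pt, font_keys_pt):
    """Rescale a size to the target base, then snap to the nearest key."""
    scaled = size_pt * target_base_pt / source_base_pt
    return min(font_keys_pt, key=lambda k: abs(k - scaled))


# Hypothetical font size keys (in points) from an output profile.
keys = [7.5, 9.0, 10.0, 12.0, 15.5, 20.0, 22.0, 24.0]

print(snap_font_size(10.0, 10.0, 12.0, keys))  # 12.0
print(snap_font_size(16.0, 10.0, 12.0, keys))  # 20.0
print(snap_font_size(6.0, 12.0, 12.0, keys))   # 7.5
```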

> > I'll hold off on implementing MOBI output and LIT input/output in case
> > you want to do that.
> 
> I would like to, if you haven't done it already.
> 

I haven't. I'm stuck with porting EPUB output at the moment, as large parts of
that codebase have to be rewritten. The control flow in the old code is
very different from the new code, so it's a non-trivial migration.

> > And if you have any comments on the way the pipeline is shaping up,
> > now's the time.
> 
> If it isn't past "the time" yet, I'll also do that this evening. :-)
> 
> -Marshall
> 

It's definitely not too late :)

Kovid.

-- 
_____________________________________

Kovid Goyal 
http://www.kovidgoyal.net
http://calibre.kovidgoyal.net
_____________________________________
