
calibre-devs team mailing list archive

Re: Conversion pipeline

 

An update on the status of the pipeline. It now works for converting MOBI -> OEB
and I believe John has added support for TXT input/output and PDF output as
well, though I haven't had the chance to test it yet. The code is in 
lp:~kovid/calibre/pluginize and I've only really tested it in Linux so far.

In the next few days I will be porting the splitting code that currently lives
in epub.split and the structure autodetection code to the new framework and then 
adding support for EPUB output. 

Once all the porting is done, I will start work on a regression testing system.
The basic idea, as Marshall and I discussed previously, is that the test system
should just compare output files byte for byte and report any differences from a
previously recorded good output. The developer of a change can then either fix
his code or mark the new output as good. The regression test system will actually
call ebook-convert to do its stuff. Hopefully it will be intelligent enough to
ignore obviously harmless changes like the package id and publish date in EPUB
metadata. I'm planning on having the test cases (i.e. ebooks in various formats)
stored as an encrypted binary blob on the calibre web server. So if a developer
wants to run the tests, he can just ask me for the key to decrypt it. The reason
for doing that is so that we can have commercial books as test cases as well.
I'm not a hundred percent certain it's necessary, so your thoughts on this, or
any other aspect of the test system/conversion pipeline are welcome.
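
To make the comparison step concrete, here is a minimal sketch in Python of the byte-for-byte check described above. It is not code from the pluginize branch: the function names (convert, check_output, mark_as_good), the directory of recorded good outputs, and the choice to record a first run as the baseline are all assumptions made for illustration. It also does not yet ignore harmless differences such as the package id or publish date in EPUB metadata.

```python
import shutil
import subprocess
from pathlib import Path


def convert(src, dest, command="ebook-convert"):
    # Run the converter as an external process: <command> <input> <output>.
    # Assumes ebook-convert is on PATH; not exercised by the checks below.
    subprocess.run([command, str(src), str(dest)], check=True)


def check_output(new_output, good_dir):
    """Compare new_output byte for byte with the recorded good copy.

    Returns "recorded" if no good copy existed yet (assumption: the first
    run becomes the baseline), otherwise "same" or "different".
    """
    new_output = Path(new_output)
    good = Path(good_dir) / new_output.name
    if not good.exists():
        good.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(new_output, good)
        return "recorded"
    # Strict comparison: any differing byte counts as a regression.
    if new_output.read_bytes() == good.read_bytes():
        return "same"
    return "different"


def mark_as_good(new_output, good_dir):
    # The developer accepts the new output as the reference from now on.
    good_dir = Path(good_dir)
    good_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(new_output, good_dir / Path(new_output).name)
```

A developer whose change produces "different" would then either fix the code or call mark_as_good on the new file. Ignoring fields like the package id would need a format-aware comparison in place of the plain byte check.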

@Marshall: I made a change to OEBBook to have it not choke when parsing of a few
HTML files fails. I'd appreciate it if you could have a look at my changes, as
there may be a better way to do it. Also, I'm not a hundred percent sure I'm
using your CSS Flattening code correctly, in particular the algorithm for
determining the defaults needs a once-over. I'll hold off on implementing
MOBI output and LIT input/output in case you want to do that. And if you have
any comments on the way the pipeline is shaping up, now's the time.

Kovid.



On Tue, Mar 10, 2009 at 11:12:06PM -0700, Kovid Goyal wrote:
> Hi all,
> 
> calibre's new conversion pipeline is gradually taking shape in pluginize.
> 
> Here's a brief outline of the code so far:
> 
>             CLI or GUI
>                 |
>                 |
>              Plumber (ebooks.conversion.plumber)
>             /    |      \
>            /     |       \
> Input plugin - OEBBook - Output plugin    
> 
> The conversion pipeline has three stages:
> Input which accepts the input file and returns an OPF file
> 
> OEBBook which does the various format independent transforms
> 
> And Output which does the output format dependent transforms and returns the
> output file.
> 
> The Plumber class is responsible for creating the pipeline and pushing things
> from one stage to the next. See Plumber.run
> 
> @Marshall
> I want oeb.base, oeb.transforms to support parse_cache, which is a dictionary mapping absolute paths
> of XHTML/CSS files to their parsed representations in memory, which are lxml root
> objects/cssutils stylesheets. If an entry is present in parse_cache, oeb.*
> should use it instead of re-parsing and if it parses something it should put it
> into parse_cache. 
> 
> Since the first step is going to be CSS flattening, oeb.flatcss should find all
> css and create a single CSSStylesheet (merging any stylesheets in parse_cache)
> and put it into parse_cache under the key 'css'.
> 
> If you've got the time to do this, go ahead; if not, let me know and I shall start
> hacking on oeb.
> 
> Also what sort of options/ui is needed for your font rescaling code?
> 
> Kovid.
> 
> -- 
> _____________________________________
> 
> Kovid Goyal 
> http://www.kovidgoyal.net
> http://calibre.kovidgoyal.net
> _____________________________________





-- 
_____________________________________

Kovid Goyal 
http://www.kovidgoyal.net
http://calibre.kovidgoyal.net
_____________________________________
