
calibre-devs team mailing list archive

Unified conversion tool

 

Hi Kovid, etc:

I've been playing around with some ideas for creating a fully unified
conversion/generation tool -- something that's flexible enough to do
everything the current suite does, but using a modular infrastructure
allowing re-use (and re-arrangement) of parts in different conversion
pipelines.

For experimenting with CSS flattening, I've already extended my OEBBook
and backing "container" classes to further simplify content
manipulation.  Any content transform then reduces to a callable which
accepts an OEBBook object and transforms it in-place.  The remaining
bits to sort out are the command-line and Python programmatic
interfaces.  Well, and the suite of transforms :-).  But some more solid
ideas for the former gelled in my mind while sleeping.
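
To make the "transform is just a callable" idea concrete, here is a
minimal sketch.  The Book class is only a stand-in for OEBBook (whose
real interface isn't shown here), and the names
strip_trailing_whitespace and run_pipeline are made up for
illustration:

```python
class Book:
    """Hypothetical stand-in for OEBBook: maps hrefs to text content."""

    def __init__(self, items):
        self.items = dict(items)


def strip_trailing_whitespace(book):
    """Example transform: modifies the book in place, returns nothing."""
    for href in list(book.items):
        lines = book.items[href].split("\n")
        book.items[href] = "\n".join(line.rstrip() for line in lines)


def run_pipeline(book, transforms):
    """A pipeline is just an ordered list of transforms applied in turn."""
    for transform in transforms:
        transform(book)
    return book
```

Under these assumptions, re-arranging a pipeline is nothing more than
re-ordering a list of callables.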

For the command-line, I'm thinking something like this:

  oebtool [OPTIONS] [PIPELINE] [TRANSFORMS] INFILE OUTFILE

Where oebtool is a terrible name.  The basic idea is that it converts
INFILE to OUTFILE, either automatically deriving the type of each from
their filenames or with the options -i/--input-format and
-o/--output-format.  A PIPELINE is a pre-canned set of transforms.  I'm
torn on whether or not the PIPELINE should consume any options not
understood by `oebtool' itself, manipulating them and passing them on to
the individual transforms as necessary.  Each command-line TRANSFORM is
specified with a -t/--transform option, and can accept sub-options using
a syntax I'm partially stealing from mplayer:

  oebtool ... -t TRANSFORM[:[OPTION=]VALUE[:[OPTION=]VALUE...]] ...

And the input/output format objects could accept arguments in the same
way -- they're really just a special kind of transform, after all.
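
A rough sketch of how such a spec string might be split up.  The
function name parse_transform_spec is hypothetical, and this simple
version assumes values never contain a literal ':' (no escaping rule
has been decided on):

```python
def parse_transform_spec(spec):
    """Split 'NAME[:[OPTION=]VALUE...]' into (name, positional, keyword).

    Values without an 'OPTION=' prefix are collected as positional
    arguments; the rest go into a dict keyed by option name.
    """
    name, *parts = spec.split(":")
    positional = []
    keyword = {}
    for part in parts:
        option, sep, value = part.partition("=")
        if sep:
            keyword[option] = value
        else:
            positional.append(part)
    return name, positional, keyword
```

The shell strips the quotes before the tool sees them, so a value with
spaces in it arrives as a single string and needs no special handling
here.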

So a complete command-line could look something like:

  oebtool clean \
    -t fonts:serif="Adobe Caslon Pro" \
    -t margins:left=10pt:right=10pt:top=12pt:bottom=12pt \
    book.lit -o oeb:version=1.2 book/

The programmatic interface could be pretty simple.  Each transform and
pipeline could exist as a Python module exposing a particular
interface, which probably need consist only of a callable, an option
parser, and a docstring.  The option parsers can hopefully be derived
from optparse.OptionParser -- that would certainly simplify things quite
a bit.
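
As a sketch only, a transform module along those lines might look
something like the following.  The names option_parser and transform,
the --left/--right options, and the book.margins attribute are all
assumptions made up for illustration, not a settled interface:

```python
"""Set uniform page margins (hypothetical example transform module)."""
from optparse import OptionParser


def option_parser():
    """Return the parser describing this transform's sub-options."""
    parser = OptionParser(usage="margins[:left=LEN][:right=LEN]")
    parser.add_option("--left", default="0pt", help="left margin")
    parser.add_option("--right", default="0pt", help="right margin")
    return parser


def transform(book, opts):
    """Apply the margins to the book in place.

    `book` stands in for an OEBBook; `margins` is an assumed attribute.
    """
    book.margins = {"left": opts.left, "right": opts.right}
```

The driver could then translate "margins:left=10pt" into
["--left", "10pt"], hand that to option_parser().parse_args(), and pass
the resulting options object to transform().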

When I next find a bit of time, I'll probably push a branch up to
Launchpad to play around with all this.

Comments, suggestions, etc?

-Marshall


