calibre-devs team mailing list archive

Modularization: The Empire Strikes Back!

To: calibre-devs <calibre-devs@xxxxxxxxxxxxxxxxxxx>
From: "Marshall T. Vandegrift" <llasram@xxxxxxxxx>
Date: Sat, 7 Feb 2009 11:17:50 -0500
Hi Kovid etc.,

I've pushed a branch up to lp:~llasram/calibre/pluginize to test out
some ideas.  I pulled from lp:~kovid/calibre/pluginize, then layered on
the following changes:

  - Refactored the OPF-parsing code out of OEBBook and into a separate
    OEBReader.

  - Made OEBReader subclassable using a subclass-redefinable internal
    Container class.

  - Merged an existing LitReader refactoring I had lying around and
    tweaked it to make it actually a LitContainer.  Added a LitReader
    which just inherits from OEBReader.
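A minimal Python sketch of the "subclass-redefinable Container class"
idea, assuming an OEBBook can be stood in for by a plain dict of
href -> bytes.  DirContainer and the stub LitContainer are hypothetical
names, not the code on the branch; the point is only that LitReader
differs from OEBReader in its Container attribute and nothing else.

```python
import os
import xml.etree.ElementTree as ET


class DirContainer:
    """Hypothetical Container over an unpacked OEB directory."""

    def __init__(self, path):
        # `path` is the OPF file; manifest hrefs resolve relative to it.
        self.opfpath = path
        self.rootdir = os.path.dirname(path)

    def read(self, path):
        # None means "return the OPF metadata for this container".
        full = self.opfpath if path is None else os.path.join(self.rootdir, path)
        with open(full, "rb") as f:
            return f.read()

    def exists(self, path):
        return os.path.isfile(os.path.join(self.rootdir, path))


class OEBReader:
    # Subclasses change only how files are fetched, by redefining this.
    Container = DirContainer

    def __call__(self, oeb, path):
        container = self.Container(path)
        opf = ET.fromstring(container.read(None))
        # Format-independent part: walk the manifest <item> elements,
        # with or without the OPF 2.0 namespace.
        for el in opf.iter():
            if el.tag.rsplit("}", 1)[-1] == "item":
                href = el.get("href")
                if href and container.exists(href):
                    oeb[href] = container.read(href)
        return oeb


class LitContainer:
    """Stub: a real one would decode the LIT file's internal directory."""

    def __init__(self, path):
        raise NotImplementedError("LIT decoding is out of scope here")


class LitReader(OEBReader):
    Container = LitContainer  # the only difference from OEBReader
```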

Very experimental, and definitely breaks other parts of the code.  But
the end result is that running calibre.ebooks.oeb.writer.main() gives you a
limited 'any2oeb' which can pull from either LIT files or OEB
hierarchies and produce an OEB hierarchy in either OPF 1.2 or 2.0, using
exactly the same code for everything except the container access. W00t!

The Container duck-interface can be used to write an EPUBReader just as
easily -- ZipFile + some encryption.xml parsing, yah?  The interface is
drop-dead simple:

  - read(path) -- returns the str() data for the file at `path`.  If
        `path` is None then return the OPF metadata for the Container.

  - exists(path) -- returns True if the path is present, False if not.

  - namelist() -- Optional.  Return a list of paths in the Container.

  - write(path, data) -- Optional.  Write `data` to `path`.
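A sketch of the EPUB container suggested above, using Python's zipfile
plus the standard OCF lookup of META-INF/container.xml to locate the
OPF.  The class name is hypothetical, encryption.xml handling is left
out entirely, paths are assumed to be relative to the OPF's directory,
and read() returns bytes (the Python 3 equivalent of Python 2's str).

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Namespace of META-INF/container.xml in OCF (EPUB) archives.
OCF_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"


class EPUBContainer:
    """Hypothetical Container over an EPUB zip; no encryption support."""

    def __init__(self, path):
        # `path` may be a filename or a binary file-like object.
        self.zf = zipfile.ZipFile(path)
        root = ET.fromstring(self.zf.read("META-INF/container.xml"))
        rootfile = root.find(OCF_NS + "rootfiles/" + OCF_NS + "rootfile")
        self.opfpath = rootfile.get("full-path")
        # Container paths are taken relative to the OPF's directory.
        self.opfdir = posixpath.dirname(self.opfpath)

    def read(self, path):
        if path is None:
            return self.zf.read(self.opfpath)
        return self.zf.read(posixpath.join(self.opfdir, path))

    def exists(self, path):
        try:
            self.zf.getinfo(posixpath.join(self.opfdir, path))
        except KeyError:
            return False
        return True

    def namelist(self):
        # Optional method: archive members under the OPF directory,
        # expressed relative to it so they round-trip through read().
        prefix = self.opfdir + "/" if self.opfdir else ""
        return [n[len(prefix):] for n in self.zf.namelist()
                if n.startswith(prefix)]
```

The optional write() is omitted here, which the duck interface allows.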

The interface for Readers is even simpler.  As I'm sure you've had in
mind the whole time (though I only just realized it), the way conversion
chains need to work is that the user selects the output format while
calibre auto-detects the input format.  If calibre is detecting the
input format per conversion, then it can't present any options which
vary with input format.  Which means that Readers can't accept any
options themselves, or present a default transform chain, or any of that
voodoo.  Which leaves us with a Reader interface of:

  - DEFAULT_PROFILE -- Default renderer profile or string name thereof
        for content read from this type of e-book source.

  - __call__(oeb, path) -- Populate the OEBBook object `oeb` from the
        e-book source file at the path (or stream?) in `path`.
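To make the "Readers take no options" consequence concrete, here is a
sketch of a dispatcher that picks a Reader from the detected input
format and calls it with nothing but the book and the path.  TXTReader,
read_book(), the "default" profile name, and detection by file
extension are all assumptions for illustration; a plain dict stands in
for OEBBook.

```python
import html
import os


class TXTReader:
    """Hypothetical Reader: wraps a plain-text file as one OEB document."""

    DEFAULT_PROFILE = "default"  # placeholder profile name

    def __call__(self, oeb, path):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        oeb["text.html"] = "<html><body><pre>%s</pre></body></html>" % html.escape(text)


# Input format is detected, not chosen, so there is nothing to configure.
READERS = {".txt": TXTReader}


def read_book(path):
    ext = os.path.splitext(path)[1].lower()
    reader_cls = READERS.get(ext)
    if reader_cls is None:
        raise ValueError("no Reader for %r" % ext)
    oeb = {}  # stand-in for an OEBBook
    reader_cls()(oeb, path)  # no options passed to the Reader
    return oeb, reader_cls.DEFAULT_PROFILE
```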

The Writer interface is a little more complex...

  - DEFAULT_PROFILE -- Default renderer profile or string name thereof
        for content written to this type of output e-book.

  - config(cfg) -- Class method.  Add configuration options for this
        output type to the `cfg` Config object.

  - generate(opts) -- Class method.  Generate a Writer instance from the
        set of command-line options on the OptParse object `opts`.

  - __call__(oeb, path) -- Write the OEBBook in `oeb` out to `path`.
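A sketch of a Writer following that interface.  argparse stands in for
both calibre's Config object and the parsed OptParse options, a dict of
href -> bytes stands in for OEBBook, and the emitted content.opf is a
bare placeholder rather than a real package document.  OEBWriter and
--opf-version are hypothetical names.

```python
import argparse
import os


class OEBWriter:
    """Hypothetical Writer: dumps a dict-based book to a directory."""

    DEFAULT_PROFILE = "default"  # placeholder profile name

    def __init__(self, version="2.0"):
        self.version = version

    @classmethod
    def config(cls, cfg):
        # Register this output type's options on the shared parser.
        cfg.add_argument("--opf-version", choices=["1.2", "2.0"],
                         default="2.0", help="OPF version to emit")

    @classmethod
    def generate(cls, opts):
        # Translate parsed options into plain constructor arguments.
        return cls(version=opts.opf_version)

    def __call__(self, oeb, path):
        os.makedirs(path, exist_ok=True)
        for name, data in oeb.items():
            with open(os.path.join(path, name), "wb") as f:
                f.write(data)
        # Placeholder OPF that only records the chosen version.
        with open(os.path.join(path, "content.opf"), "w", encoding="utf-8") as f:
            f.write('<package version="%s"/>\n' % self.version)
```

Usage: build a parser, call OEBWriter.config(parser), parse the command
line, then writer = OEBWriter.generate(opts) and writer(oeb, outdir).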

I didn't mess around with Transforms here yet, but I'm thinking they
could look very similar to the Writer interface:

  - config(cfg) -- Class method.  Add configuration options for this
        Transform to the `cfg` Config object.

  - generate(opts) -- Class method.  Generate a Transform instance from
        the set of command-line options on the OptParse object `opts`.

  - __call__(oeb, context) -- Transform the OEBBook `oeb` in-place,
        possibly using information from the source/dest renderer profile
        pair in `context`.
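A sketch of a Transform in that shape, plus a trivial chain runner.  It
assumes a renderer profile carries a base font size and that `context`
is a (source, dest) profile pair; Profile, FontRescaler, run_chain() and
the --no-font-rescale flag are all hypothetical.  The book is again a
dict, here of href -> CSS text, and is modified in place.

```python
import argparse
import re
from collections import namedtuple

# Hypothetical renderer profile: a name and a base font size in points.
Profile = namedtuple("Profile", "name fbase")

FONT_SIZE = re.compile(r"font-size:\s*(\d+(?:\.\d+)?)pt")


class FontRescaler:
    """Hypothetical Transform: rescales pt font sizes between profiles."""

    def __init__(self, enabled=True):
        self.enabled = enabled

    @classmethod
    def config(cls, cfg):
        # argparse stands in for calibre's Config object.
        cfg.add_argument("--no-font-rescale", action="store_true")

    @classmethod
    def generate(cls, opts):
        return cls(enabled=not opts.no_font_rescale)

    def __call__(self, oeb, context):
        source, dest = context
        if not self.enabled or source.fbase == dest.fbase:
            return
        ratio = dest.fbase / source.fbase

        def rescale(m):
            return "font-size: %gpt" % (float(m.group(1)) * ratio)

        # In-place: rewrite each item, return nothing.
        for name, text in list(oeb.items()):
            oeb[name] = FONT_SIZE.sub(rescale, text)


def run_chain(transforms, oeb, context):
    # Every Transform sees the same book and the same profile pair.
    for t in transforms:
        t(oeb, context)
```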

The reason for the generate() class methods, instead of just having the
__init__() methods accept an `opts`, is that there's already at least
one case where we need to create a Writer programmatically (for cover
extraction, unless you want to write an OEBBook QAbstractFileEngine).  I
can see other cases where it might also be necessary for transforms, so
I think it makes sense to separate the cases out.  I'm proposing
__call__() because each instance exposes only one function, so I
figured why not keep it simple instead of having Readers expose read(),
Writers write(), etc.
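A sketch of why the split helps: __init__() takes ordinary keyword
arguments, so cover-extraction code can build a Writer directly, while
generate() is only the adapter used by the option-driven pipeline.
CoverWriter, extract_cover(), --cover-item and the "cover.jpg" default
are hypothetical; argparse stands in for Config/OptParse and a dict for
OEBBook.

```python
import argparse


class CoverWriter:
    """Hypothetical Writer that writes just a book's cover image."""

    def __init__(self, item="cover.jpg"):
        # Plain keyword arguments, so callers need no options object.
        self.item = item

    @classmethod
    def config(cls, cfg):
        cfg.add_argument("--cover-item", default="cover.jpg",
                         help="manifest item holding the cover")

    @classmethod
    def generate(cls, opts):
        # Option-driven construction for the conversion pipeline.
        return cls(item=opts.cover_item)

    def __call__(self, oeb, path):
        with open(path, "wb") as f:
            f.write(oeb[self.item])


def extract_cover(oeb, path):
    # Programmatic construction: no command line is involved here.
    CoverWriter()(oeb, path)
```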

So anyway, check out the code and let me know what you think.

-Marshall
Follow ups

Re: Modularization: The Empire Strikes Back!
From: Kovid Goyal, 2009-02-07