calibre-devs team mailing list archive

Modularization

 

Hi Kovid etc.,

I'm pretty excited to start refactoring calibre to do conversions in a
more modular fashion. How's that thesis coming? :-) I'd like to start
discussing it though.

My basic idea is that every conversion passes from the input format, to an
internal representation of an OEB book, then to the output
format. Obviously I think the OEBBook class would be a good candidate
for the intermediate representation. It needs a bit more work to nicely
support completely programmatic generation (vs. OPF de-serialization), as
well as documentation and test cases, but I'm really happy with how it's
turned out so far in terms of capturing almost everything expressible in
OPF, serializing back out to fully spec-compliant OPF, and presenting a
Pythonic interface to the represented information.

For the flow of content in and out of the OEB representation I see four
basic duck-types:

  - Readers. These accept a pathname and/or stream and return an OEB. I
    think they should also provide a default source renderer profile and
    initial "cleanup" transformation chain, which probably make the most
    sense as properties of a Reader itself vs. an individual Reader
    instance.

  - Containers. Provide filesystem-like access to formats which support
    such access. This isn't a core abstraction, but simplifies Readers
    which can use them.

  - Transforms. Accept an OEB and a conversion context (source and
    destination renderer profiles) and modify the OEB in place.

  - Writers. Accept an OEB, a conversion context, and an output
    path/stream. Write the output format to the output stream.
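
To make the duck-types above concrete, here is a rough Python sketch of
how three of them might look. Everything in it is an assumption made for
illustration: the OEB class is a toy stand-in (not the real OEBBook), and
the names Profile, Context, read, transform, write, TextReader,
StripBlankLines and TextWriter are hypothetical. Containers are left out
since they aren't a core abstraction.

```python
import io
from dataclasses import dataclass, field
from typing import BinaryIO, Protocol


@dataclass
class Profile:
    """Hypothetical renderer profile; a real one would carry device details."""
    name: str


@dataclass
class Context:
    """Conversion context: source and destination renderer profiles."""
    source: Profile
    dest: Profile


@dataclass
class OEB:
    """Toy stand-in for the intermediate book, NOT calibre's OEBBook."""
    title: str = ""
    spine: list = field(default_factory=list)  # one string per content line


# The duck-types, written down as Protocols purely for documentation.
class Reader(Protocol):
    default_profile: Profile  # property of the Reader itself, not an instance

    def read(self, stream: BinaryIO) -> OEB: ...


class Transform(Protocol):
    def transform(self, oeb: OEB, context: Context) -> None: ...


class Writer(Protocol):
    def write(self, oeb: OEB, context: Context, stream: BinaryIO) -> None: ...


class TextReader:
    """Hypothetical Reader: first line is the title, the rest is content."""
    default_profile = Profile("plain-text")

    def read(self, stream):
        lines = stream.read().decode("utf-8").splitlines()
        return OEB(title=lines[0] if lines else "", spine=lines[1:])


class StripBlankLines:
    """Hypothetical Transform: modifies the OEB in place."""

    def transform(self, oeb, context):
        oeb.spine[:] = [line for line in oeb.spine if line.strip()]


class TextWriter:
    """Hypothetical Writer: serializes the OEB to the output stream."""

    def write(self, oeb, context, stream):
        stream.write("\n".join([oeb.title] + oeb.spine).encode("utf-8"))
```

Used together, a conversion would read from a stream with TextReader,
pass the resulting OEB through StripBlankLines, and hand it to TextWriter
along with a Context built from the Reader's default profile and the
destination profile.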

The Readers, Transforms, and Writers should all expose any options they
accept in a stackable, user-exposable fashion. Then each current and
future any2* becomes a list of Transforms and a Writer. Win!
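
One possible shape for the stackable options and the "list of Transforms
plus a Writer" idea, again as a sketch only. The Stage base class, the
class-level options dict, stacked_options, convert, and the any2line
name are all invented for this example, and the OEB is reduced to a
plain dict to keep it short.

```python
import io


class Stage:
    """Hypothetical base: each stage declares its options with defaults."""
    options = {}

    def __init__(self, **overrides):
        # Per-instance values start from the class-level defaults.
        self.opts = {**self.options, **overrides}


class LineReader(Stage):
    options = {"encoding": "utf-8"}

    def read(self, stream):
        text = stream.read().decode(self.opts["encoding"])
        return {"title": text.strip()}  # toy OEB, just a dict


class UpperTitle(Stage):
    options = {"enabled": True}

    def transform(self, oeb, context):
        if self.opts["enabled"]:
            oeb["title"] = oeb["title"].upper()


class LineWriter(Stage):
    options = {"encoding": "utf-8"}

    def write(self, oeb, context, stream):
        stream.write(oeb["title"].encode(self.opts["encoding"]))


def stacked_options(*stages):
    """Merge every stage's options into one flat, prefixed mapping
    that a command line or GUI could present to the user."""
    return {
        f"{type(stage).__name__}.{name}": value
        for stage in stages
        for name, value in stage.opts.items()
    }


def convert(reader, transforms, writer, source, dest, context=None):
    """Reader -> Transforms (in order) -> Writer."""
    oeb = reader.read(source)
    for step in transforms:
        step.transform(oeb, context)
    writer.write(oeb, context, dest)


# A hypothetical "any2line" is then nothing more than this pair.
any2line = ([UpperTitle()], LineWriter())
```

The Reader is picked from the input format, so the same any2line pair
works unchanged for every input that has a Reader.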

Thoughts?

-Marshall


