calibre-devs team mailing list archive

Fwd: Unified conversion tool

 

Ditto.

---------- Forwarded message ----------
From: Marshall T. Vandegrift <llasram@xxxxxxxxx>
Date: Tue, Dec 23, 2008 at 6:23 PM
Subject: Re: [Calibre-devs] Unified conversion tool
To: Kovid Goyal <kovid@xxxxxxxxxxxxxx>


On Tue, Dec 23, 2008 at 2:44 PM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:

> Now that calibre is going to support output to other HTML based
> formats, I agree that in principle the HTML processing code would
> benefit from greater modularization.

Woohoo!

> Let's put aside the question of interfaces and talk about code
> design. Here is how I envision it working:

Ok, it looks like we have a fairly similar concept of how this could
work.  I'd just modify it as follows:

 - Layer 1: Consolidate existing metadata and structure from the source
   content, producing at least an internally consistent OEB structure.

 - Layer 2: Perform transformations.  I think the distinction between
   format-neutral and format-specific transformations is artificial and
   limits flexibility.  It puts up barriers to sharing highly similar
   "format-specific" transforms, and prevents a pipeline from ordering
   any format-neutral transforms after format-specific ones.  For
   example, "ugly-printing" HTML to remove extraneous whitespace will
   be necessary for both LIT and Mobipocket, and is best done after the
   markup structure is fully processed into the form in which it will
   be serialized.  Adding user-specified/-directed metadata can also be
   done as a transform.

 - Layer 3: Write to the output file, serializing the OEB content and
   encapsulating the fully-transformed OEB structure in the
   format-specific container.
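
As a rough sketch of the three layers above: every name here (Book,
read_source, flatten_divs, ugly_print, write_output, convert) is
hypothetical and is not existing calibre code.  It only illustrates a
single ordered transform list in which a shared whitespace transform
runs after a format-specific one.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Book:
    """Hypothetical stand-in for an internally consistent OEB structure."""
    metadata: Dict[str, str] = field(default_factory=dict)
    documents: Dict[str, str] = field(default_factory=dict)  # href -> markup


Transform = Callable[[Book], Book]


def read_source(title: str, documents: Dict[str, str]) -> Book:
    """Layer 1: consolidate metadata and content into one structure."""
    return Book(metadata={"title": title}, documents=dict(documents))


def set_metadata(**overrides: str) -> Transform:
    """User-directed metadata expressed as an ordinary Layer 2 transform."""
    def apply(book: Book) -> Book:
        book.metadata.update(overrides)
        return book
    return apply


def flatten_divs(book: Book) -> Book:
    """Made-up 'format-specific' transform: rewrite <div> tags as <p>."""
    for href, markup in list(book.documents.items()):
        book.documents[href] = re.sub(r"(</?)div\b", r"\1p", markup)
    return book


def ugly_print(book: Book) -> Book:
    """Shared transform: drop whitespace between adjacent tags."""
    for href, markup in list(book.documents.items()):
        book.documents[href] = re.sub(r">\s+<", "><", markup.strip())
    return book


def write_output(book: Book) -> str:
    """Layer 3: serialize.  A real writer would build a format container."""
    body = "".join(book.documents[href] for href in sorted(book.documents))
    return f"<!-- {book.metadata['title']} -->{body}"


def convert(book: Book, transforms: List[Transform]) -> str:
    """Run the Layer 2 transforms in the given order, then Layer 3."""
    for transform in transforms:
        book = transform(book)
    return write_output(book)


book = read_source("Untitled", {"ch1.html": "<div>\n  <span>Hi</span>\n</div>\n"})
result = convert(book, [set_metadata(title="Demo"), flatten_divs, ugly_print])
```

Because the order is just the order of the list, nothing forces
"generic" transforms to run before "specific" ones.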

> So Layer 1 is common to all tools. Layers 2 and 3 will have well
> defined interfaces behind which there will be format specific plugins.

With my proposal, format-specific conversion tools would specify the set
of transformations users can sensibly apply.  Even the most "generic"
transforms don't make sense for every format; e.g., margins for LIT.  So
the tools would share all of Layer 1 and some of Layer 2, with a
well-defined interface between adjacent layers.
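
One way to picture each tool declaring its own set of transforms is a
simple table keyed by output format.  This is a sketch under
assumptions: the table, the transform names and select_transforms are
all invented for illustration, and the only facts taken from the
discussion are that ugly-printing suits both LIT and Mobipocket and
that margins do not make sense for LIT.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical declaration of which transforms each output tool exposes.
SUPPORTED_TRANSFORMS: Dict[str, FrozenSet[str]] = {
    "lit": frozenset({"set_metadata", "ugly_print"}),
    # Allowing margins for mobi is an assumption made for the example.
    "mobi": frozenset({"set_metadata", "ugly_print", "margins"}),
}


def select_transforms(fmt: str, requested: Iterable[str]) -> List[str]:
    """Return the requested transform names, rejecting unsupported ones."""
    names = list(requested)
    supported = SUPPORTED_TRANSFORMS[fmt]
    rejected = [name for name in names if name not in supported]
    if rejected:
        raise ValueError(f"{fmt} does not support: {', '.join(rejected)}")
    return names
```

With this table, asking the LIT tool for "margins" fails early instead
of silently producing output the format cannot honor.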

I also think it's important to carry a kind of "conversion context"
through the transforms at Layer 2, providing them with device profiles
for the source content and destination format.  This will allow them to
correctly interpret profile-determined measurements, and to produce
results in line with output-profile conventions.
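
A minimal sketch of such a conversion context, assuming a device
profile carries little more than a name and a resolution.
DeviceProfile, ConversionContext, rescale_px and the DPI figures are
all hypothetical; the point is only that a transform can read both
profiles to reinterpret a pixel measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical profile: only the fields this example needs."""
    name: str
    dpi: float


@dataclass(frozen=True)
class ConversionContext:
    """Carried through every Layer 2 transform."""
    source: DeviceProfile
    dest: DeviceProfile


def rescale_px(px: float, ctx: ConversionContext) -> float:
    """Keep the same physical size when moving between resolutions."""
    return px * ctx.dest.dpi / ctx.source.dpi


# Made-up resolutions, not taken from any real device.
ctx = ConversionContext(
    source=DeviceProfile("source-demo", dpi=100.0),
    dest=DeviceProfile("dest-demo", dpi=150.0),
)
scaled = rescale_px(40.0, ctx)
```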

-Marshall


