calibre-devs team mailing list archive

Re: Modularization

 

I'm out w/ my girlfriend tonight sans full computer, so only a G1-pecked
brief response for now. I just hope this doesn't go out as an HTML e-mail...

I think we're on almost exactly the same wavelength on almost everything.
Timeline looks sane to me -- only thing I'd change is dropping LRF entirely
(I kid... kind of).

For the internal representation, I actually think your arguments against
using OEB are arguments for OEB. :-)   The OEB specs match what e-book
vendors have compromised on as the closest common match of capabilities and
metadata, which is what we'd be trying to do with a common model anyway. (Which
is what I'm proposing -- use the OEB/OPS data model; not keeping around some
etree OPF to stick data in.) Moreover, of our planned output formats EPUB
and LIT *are* OEB for all intents and purposes, and MOBI uses some OEB
metadata directly and the current code is written in terms of OEB. It seems
like a pretty clear win to me.

For OEBBook in particular, it already handles most of the specific coding
issues you mention, and more. Markup is parsed once then lives as lxml.etree
XHTML through the rest of the conversion. Parsed CSS is cached by Stylizer
(in theory so it can cache the results of its own processing of the parsed
CSS, although it doesn't do that yet), which should handle all the CSSing we
need. The OEBBook maintains the book manifest, not only mapping paths to
data but also tracking MIME type, spine position, linearity, etc, and
providing methods for removing, adding, and reordering items.  It needs some
documentation love, but I'll try to make that a priority during my calibre
time this weekend.
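
A minimal sketch of the kind of manifest bookkeeping described above, for
illustration only. The names Item, Manifest, add, remove, move and
spine_position are hypothetical and are not OEBBook's actual API; the sketch
only shows one way to track MIME type, spine position and linearity while
supporting adding, removing and reordering items.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    """One manifest entry (hypothetical; not calibre's actual class)."""
    href: str
    media_type: str
    data: object = None      # e.g. a parsed tree or raw bytes
    linear: bool = True      # whether the item is part of the linear flow


class Manifest:
    """Maps hrefs to items and keeps an ordered spine alongside."""

    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}
        self.spine: List[str] = []   # hrefs in reading order

    def add(self, href: str, media_type: str, data: object = None,
            spine_position: Optional[int] = None, linear: bool = True) -> Item:
        if href in self.items:
            raise ValueError("duplicate href: %s" % href)
        item = Item(href, media_type, data, linear)
        self.items[href] = item
        if spine_position is not None:
            self.spine.insert(spine_position, href)
        return item

    def remove(self, href: str) -> None:
        # Drop the item from both the manifest and the spine.
        del self.items[href]
        if href in self.spine:
            self.spine.remove(href)

    def move(self, href: str, new_position: int) -> None:
        # Reorder an item that is already in the spine.
        self.spine.remove(href)
        self.spine.insert(new_position, href)

    def spine_position(self, href: str) -> Optional[int]:
        return self.spine.index(href) if href in self.spine else None
```

Keeping the spine as a separate ordered list of hrefs means non-spine items
(stylesheets, images) live in the same manifest without needing a position.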

-Marshall

On Feb 4, 2009 6:00 PM, "Kovid Goyal" <kovid@xxxxxxxxxxxxxx> wrote:

Hmm well my thesis has about half a chapter left (the introduction), so
hopefully it should be submitted sometime next week.

In the meantime some random thoughts:

0) The basic structure of the conversion chain is fine, though the devil is
in the details

0.5) I want to keep the Readers/Writers as simple as possible to make it
easy for third parties to add plugins for new formats in the future.

1) HTML and CSS should only be parsed once (for speed)

1.1) This will necessitate keeping parsed representations of the entire book
in memory for the complete conversion chain. For example, since the MOBI
reader creates a parsed representation of the HTML, the rest of the
conversion pipeline should be able to use that without having to re-parse.
So the template of the reader should be modified as follows:
 - Accept stream or pathname
 - Output path to opf file
 - Also optionally output a dictionary that maps absolute path names to
   parsed representations of the contents of the files (i.e. lxml root
   objects and cssutils parsed stylesheets)
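
The reader template listed above could be sketched roughly as follows. This
is only an illustration under stated assumptions: the function name
read_book, the output file names, and the toy "format" (a single XHTML
document) are all hypothetical, and xml.etree.ElementTree stands in for
lxml so the sketch has no third-party dependencies.

```python
import os
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch
from typing import Dict, Tuple


def read_book(source, output_dir: str) -> Tuple[str, Dict[str, object]]:
    """Hypothetical reader: accepts a pathname or a binary stream.

    Returns the path to an OPF file plus a dict mapping absolute path
    names to the already-parsed contents of those files.
    """
    if isinstance(source, (str, os.PathLike)):
        with open(source, "rb") as f:
            raw = f.read()
    else:
        raw = source.read()

    # A real reader would decode its format here; this toy treats the
    # input as one well-formed XHTML file and writes it out unchanged.
    html_path = os.path.abspath(os.path.join(output_dir, "index.html"))
    with open(html_path, "wb") as f:
        f.write(raw)

    opf_path = os.path.abspath(os.path.join(output_dir, "metadata.opf"))
    with open(opf_path, "w", encoding="utf-8") as f:
        f.write('<package><manifest>'
                '<item href="index.html"/>'
                '</manifest></package>')

    # Parse once and hand the tree downstream so that later stages of the
    # pipeline do not need to re-parse the file from disk.
    parsed = {html_path: ET.fromstring(raw)}
    return opf_path, parsed
```

Later stages would look a file up in the returned dictionary first and fall
back to parsing from disk only when the reader did not supply a tree.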

2) Containers: The zipfile modules (both the one in calibre and the builtin
one in python) are rather buggy when it comes to replacing files in zip
archives (that is why there is a safe_replace method in the calibre zipfile
module). So I'm not sure how practical this is going to be, though if we can
make it work, it will be cool.
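
One common way to sidestep in-place replacement with the builtin zipfile
module is to copy the archive to a temporary file, substituting the one
member, and then swap the files. The sketch below shows that approach; the
helper name replace_member is hypothetical, and this is not a description
of how calibre's safe_replace works.

```python
import os
import tempfile
import zipfile


def replace_member(zip_path: str, name: str, new_data) -> None:
    """Replace one archive member by rewriting the whole archive.

    Hypothetical helper for illustration. Raises KeyError if `name`
    is not present in the archive.
    """
    directory = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path, "r") as src, \
                zipfile.ZipFile(tmp_path, "w") as dst:
            replaced = False
            for info in src.infolist():
                if info.filename == name:
                    # Reuse the member's metadata but write the new bytes.
                    dst.writestr(info, new_data)
                    replaced = True
                else:
                    dst.writestr(info, src.read(info.filename))
            if not replaced:
                raise KeyError(name)
        # Swap only after the new archive has been fully written.
        os.replace(tmp_path, zip_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Rewriting costs a full copy of the archive per replacement, but it preserves
member order (so an EPUB's mimetype entry stays first) and never leaves a
half-updated file behind.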

3) OEBBook: My vote is for having an abstract representation of the book,
not one that is so closely tied to OEB. Tying ourselves to an internal
representation that is based on a specification we have no influence over is
not a good idea. Another reason for this is that the transformation of book
-> abstract layer -> book increases robustness (at some cost in fidelity).
Given the general philosophy of calibre, which is to accept arbitrarily bad
input and do the best job possible, this is a desirable trait. Also the
abstraction is going to be used by ebook-viewer as well which means it will
need support for things like bookmarks, annotations, history etc. That said,
I haven't really looked at OEBBook in detail, so this is not set in stone.

4) Covers:
We need a sensible way to handle covers. Covers can be of two types: rendered
and reflowable. Ideally the Readers should output a cover in one or both of
these types. In particular an EPUB reader should output both and remove the
cover page from the spine.

5) Command line Interface
ebook-meta inputfile [options]
should both read and write metadata, using the metadata plugin system

ebook-convert inputfile outputfile.ext [options]

The available options will change based on the type of inputfile and the
type of output file. Exactly how this is going to work for both the CLI and
the GUI is one of those devilish details
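
One possible shape for format-dependent options, sketched with argparse: look
at the extensions of the two positional arguments first, then build the
parser from per-format option tables. Everything here is an assumption for
illustration. The option names (--html-depth, --epub-flatten,
--mobi-compress) are invented, and the sketch assumes the two file names
come before any options, as in the usage line above.

```python
import argparse
import os

# Hypothetical per-format option tables; the option names are invented.
INPUT_OPTIONS = {
    "html": [("--html-depth", dict(type=int, default=5))],
}
OUTPUT_OPTIONS = {
    "epub": [("--epub-flatten", dict(action="store_true"))],
    "mobi": [("--mobi-compress", dict(action="store_true"))],
}


def extension(path):
    """Return the lowercased file extension without the leading dot."""
    return os.path.splitext(path)[1].lstrip(".").lower()


def build_parser(inputfile, outputfile):
    """Build a parser whose options depend on the two file types."""
    parser = argparse.ArgumentParser(prog="ebook-convert")
    parser.add_argument("inputfile")
    parser.add_argument("outputfile")
    for flag, kwargs in INPUT_OPTIONS.get(extension(inputfile), []):
        parser.add_argument(flag, **kwargs)
    for flag, kwargs in OUTPUT_OPTIONS.get(extension(outputfile), []):
        parser.add_argument(flag, **kwargs)
    return parser


def parse(argv):
    """Peek at the first two arguments, then parse with a tailored parser."""
    inputfile, outputfile = (list(argv[:2]) + ["", ""])[:2]
    return build_parser(inputfile, outputfile).parse_args(argv)
```

For example, parse(["book.html", "book.epub", "--epub-flatten"]) would accept
the EPUB flag, while the same flag with a .mobi output would be rejected as
unknown. A GUI could read the same tables to decide which widgets to show.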

6) Administrative things

I'm going to be developing this in lp:~kovid/calibre/pluginize
so anyone who wants to participate should pull from that branch.


7) Timeline

a) As a first step I will create ebook-meta. Hopefully it should be done by
the middle of next week.

b) Start creating the Readers. Hopefully we can arrive at a consensus on the
design of the Readers by next week.

c) Once the readers are created we can start work on the container + ebook
abstraction.

c.5) Migrate ebook-viewer to use the new ebook abstraction

d) Transforms

e) Output format: EPUB, LIT, MOBI and OEB

f) Command line interface

g) Output format: LRF

h) GUI

i) Test suite (ideally this should be developed in parallel with the rest)


Kovid.

On Wednesday 04 February 2009 13:58:20 Marshall T. Vandegrift wrote:
> Hi Kovid etc.,
>
> I'm prett...
> _______________________________________________
> Mailing list: https://launchpad.net/~calibre-devs
> Post to     : calibre-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~calibre-devs
> More help   : https://help.launchpad.net/ListHelp

--
_____________________________________

Kovid Goyal  MC 452-48
California Institute of Technology
1200 E California Blvd
Pasadena, CA 91125

cell  : +01 626 390 8699
office: +01 626 395 6595 (449 Lauritsen)
email : kovid@xxxxxxxxxxxxxxxxxx
web   : http://www.kovidgoyal.net
_____________________________________


