calibre-devs team mailing list archive

Re: Modularization

 

Ah well, if OEBBook is an abstract representation, then I'm fine with using it. 
I had a quick look over the code in oeb.base, fixed a few bugs and made a few 
cleanups. Changes in lp:calibre, have a look to ensure I haven't killed off all 
the penguins in Antarctica.

Kovid.

On Wednesday 04 February 2009 16:58:57 Marshall T. Vandegrift wrote:
> I'm out w/ my girlfriend tonight sans full computer, so only a G1-pecked
> brief response for now. I just hope this doesn't go out as an HTML
> e-mail...
>
> I think we're on almost exactly the same wavelength on almost everything.
> Timeline looks sane to me -- only thing I'd change is dropping LRF entirely
> (I kid... kind of).
>
> For the internal representation, I actually think your arguments against
> using OEB are arguments for OEB. :-)   The OEB specs match what e-book
> vendors have compromised on as the closest common match of capabilities and
> metadata, which is what we'd be trying to do with a common model anyway.
> (Which is what I'm proposing -- use the OEB/OPS data model; not keeping
> around some etree OPF to stick data in.) Moreover, of our planned output
> formats EPUB and LIT *are* OEB for all intents and purposes, and MOBI uses
> some OEB metadata directly and the current code is written in terms of OEB.
> It seems like a pretty clear win to me.
>
> For OEBBook in particular, it already handles most of the specific coding
> issues you mention, and more. Markup is parsed once then lives as
> lxml.etree XHTML through the rest of the conversion. Parsed CSS is cached
> by Stylizer (in theory so it can cache the results of its own processing of
> the parsed CSS, although it doesn't do that yet), which should handle all
> the CSSing we need. The OEBBook maintains the book manifest, not only
> mapping paths to data but also tracking MIME type, spine position,
> linearity, etc, and providing methods for removing, adding, and reordering
> items.  It needs some documentation love, but I'll try to make that a
> priority during my calibre time this weekend.
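
To make the manifest description above concrete, here is a minimal sketch of that kind of structure: a map from paths to data that also tracks MIME type, spine position and linearity, with methods to add, remove and reorder items. The names (Item, Manifest, add, remove, move, spine_position) are made up for illustration; this is not the actual OEBBook API in oeb.base.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    """One manifest entry (hypothetical, not calibre's class)."""
    href: str
    media_type: str
    data: object = None   # e.g. a parsed tree or raw bytes
    linear: bool = True   # only meaningful for items in the spine


class Manifest:
    """Maps paths to items and keeps the reading order (spine)."""

    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}
        self.spine: List[str] = []  # hrefs in reading order

    def add(self, href: str, media_type: str, data: object = None,
            spine_position: Optional[int] = None,
            linear: bool = True) -> Item:
        if href in self.items:
            raise ValueError("duplicate manifest entry: %r" % href)
        item = Item(href, media_type, data, linear)
        self.items[href] = item
        if spine_position is not None:
            self.spine.insert(spine_position, href)
        return item

    def remove(self, href: str) -> None:
        # Drop the item from the manifest and, if present, from the spine.
        del self.items[href]
        if href in self.spine:
            self.spine.remove(href)

    def move(self, href: str, new_position: int) -> None:
        # Reorder an item that is already in the spine.
        self.spine.remove(href)
        self.spine.insert(new_position, href)

    def spine_position(self, href: str) -> Optional[int]:
        return self.spine.index(href) if href in self.spine else None
```

Keeping the spine as a separate ordered list means non-spine items such as stylesheets and images stay in the manifest without needing a position.
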
>
> -Marshall
>
> On Feb 4, 2009 6:00 PM, "Kovid Goyal" <kovid@xxxxxxxxxxxxxx> wrote:
>
> Hmm well my thesis has about half a chapter left (the introduction), so
> hopefully it should be submitted sometime next week.
>
> In the meantime some random thoughts:
>
> 0) The basic structure of the conversion chain is fine, though the devil is
> in the details
>
> 0.5) I want to keep the Readers/Writers as simple as possible to make it
> easy for third parties to add plugins for new formats in the future.
>
> 1) HTML and CSS should only be parsed once (for speed)
>
> 1.1) This will necessitate keeping parsed representations of the entire
> book in memory for the complete conversion chain. For example, since the
> MOBI reader creates a parsed representation of the HTML, the rest of the
> conversion pipeline should be able to use that without having to re-parse.
> So the template of the reader should be modified as follows:
>  - Accept stream or pathname
>  - Output path to opf file
>  - Also optionally output a dictionary that maps absolute path names to
>    parsed representations of the contents of the files (i.e. lxml root
>    objects and cssutils parsed stylesheets)
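
The reader template above could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name read_single_xhtml, the toy single-file "format", and the placeholder OPF are all invented here, and the standard library's xml.etree.ElementTree stands in for lxml so the example has no dependencies.

```python
import os
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch

# Placeholder package file; it omits the metadata a real OPF requires.
OPF_PLACEHOLDER = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">\n'
    '  <manifest>\n'
    '    <item id="index" href="index.xhtml"'
    ' media-type="application/xhtml+xml"/>\n'
    '  </manifest>\n'
    '  <spine><itemref idref="index"/></spine>\n'
    '</package>\n'
)


def read_single_xhtml(source, output_dir):
    """Hypothetical reader for a toy format: one XHTML document.

    Accepts a pathname or a binary stream, writes the book into
    output_dir, and returns (opf_path, parsed), where parsed maps
    absolute path names to already-parsed trees so that later stages
    do not have to re-parse.
    """
    if hasattr(source, "read"):
        raw = source.read()          # binary stream
    else:
        with open(source, "rb") as f:
            raw = f.read()           # pathname

    root = ET.fromstring(raw)        # the one and only parse

    html_path = os.path.abspath(os.path.join(output_dir, "index.xhtml"))
    with open(html_path, "wb") as f:
        f.write(raw)

    opf_path = os.path.abspath(os.path.join(output_dir, "content.opf"))
    with open(opf_path, "w", encoding="utf-8") as f:
        f.write(OPF_PLACEHOLDER)

    return opf_path, {html_path: root}
```

A later stage would look up a file's absolute path in the returned dictionary first and only fall back to parsing from disk when the reader did not supply a tree.
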
>
> 2) Containers: The zipfile modules (both the one in calibre and the builtin
> one in python) are rather buggy when it comes to replacing files in zip
> archives (that is why there is a safe_replace method in the calibre zipfile
> module). So I'm not sure how practical this is going to be, though if we
> can make it work, it will be cool.
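
One common way to sidestep in-place replacement is to copy the archive to a new file, swapping the one member on the way, and then move the new file over the old one. The sketch below shows that approach with the standard library only. The function name replace_in_zip is made up here, and this is not the implementation of calibre's safe_replace.

```python
import os
import tempfile
import zipfile


def replace_in_zip(zip_path, member_name, new_data):
    """Replace one member by rewriting the whole archive (a sketch)."""
    target_dir = os.path.dirname(os.path.abspath(zip_path))
    # Create the temporary archive next to the original so that the
    # final os.replace stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=target_dir)
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path, "r") as src, \
                zipfile.ZipFile(tmp_path, "w") as dst:
            replaced = False
            for info in src.infolist():
                if info.filename == member_name:
                    data = new_data
                    replaced = True
                else:
                    data = src.read(info.filename)
                # Passing the ZipInfo keeps member order, name and
                # compression type; size and CRC are recomputed.
                dst.writestr(info, data)
            if not replaced:
                raise KeyError(member_name)
        os.replace(tmp_path, zip_path)
    finally:
        # Only still present if something failed before the replace.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

The cost is a full copy of the archive for every replacement, which may or may not be acceptable for large books.
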
>
> 3) OEBBook: My vote is for having an abstract representation of the book,
> not one that is so closely tied to OEB. Tying ourselves to an internal
> representation that is based on a specification we have no influence over
> is not a good idea. Another reason for this is that the transformation of
> book -> abstract layer -> book increases robustness (at some cost in
> fidelity). Given the general philosophy of calibre, which is to accept
> arbitrarily bad input and do the best job possible, this is a desirable
> trait. Also the abstraction is going to be used by ebook-viewer as well,
> which means it will need support for things like bookmarks, annotations,
> history etc. That said, I haven't really looked at OEBBook in detail, so
> this is not set in stone.
>
> 4) Covers:
> We need a sensible way to handle covers. Covers can be of two types:
> rendered and reflowable. Ideally the Readers should output covers in one
> or both of these types. In particular, an EPUB reader should output both
> and remove the cover page from the spine.
>
> 5) Command line Interface
> ebook-meta inputfile [options]
> should both read and write metadata, using the metadata plugin system
>
> ebook-convert inputfile outputfile.ext [options]
>
> The available options will change based on the type of inputfile and the
> type of output file. Exactly how this is going to work for both the CLI
> and the GUI is one of those devilish details
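
For the CLI side, one possible approach is to look at the two file extensions first and only then build the option parser. The sketch below does that with argparse. The option names (--html-encoding, --epub-flat-toc-depth) and the lookup tables are invented for illustration and are not real ebook-convert options.

```python
import argparse
import os

# Hypothetical per-format options, keyed by file extension.
INPUT_OPTIONS = {
    ".html": [("--html-encoding", "encoding of the input HTML")],
}
OUTPUT_OPTIONS = {
    ".epub": [("--epub-flat-toc-depth", "example EPUB-only setting")],
}


def build_parser(inputfile, outputfile):
    """Build a parser whose options depend on the two file types."""
    parser = argparse.ArgumentParser(prog="ebook-convert")
    parser.add_argument("inputfile")
    parser.add_argument("outputfile")
    in_ext = os.path.splitext(inputfile)[1].lower()
    out_ext = os.path.splitext(outputfile)[1].lower()
    for title, table, ext in (
        ("input options", INPUT_OPTIONS, in_ext),
        ("output options", OUTPUT_OPTIONS, out_ext),
    ):
        group = parser.add_argument_group(title)
        for flag, help_text in table.get(ext, []):
            group.add_argument(flag, help=help_text)
    return parser


def parse_args(argv):
    """argv is the argument list without the program name."""
    if len(argv) < 2:
        raise SystemExit(
            "usage: ebook-convert inputfile outputfile.ext [options]")
    # The two file names come first, so they can be inspected before
    # the full parser exists.
    return build_parser(argv[0], argv[1]).parse_args(argv)
```

With this arrangement an option that does not apply to the given formats is rejected as unknown, and --help lists only the options relevant to that particular conversion. A GUI could read the same tables to decide which widgets to show.
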
>
> 6) Administrative things
>
> I'm going to be developing this in lp:~kovid/calibre/pluginize
> so anyone that wants to participate should pull from that branch.
>
>
> 7) Timeline
>
> a) As a first step I will create ebook-meta. Hopefully should be done by
> the middle of next week.
>
> b) Start creating the Readers. Hopefully we can arrive at a consensus on
> the design of the Readers by next week.
>
> c) Once the readers are created we can start work on the container + ebook
> abstraction.
>
> c.5) Migrate ebook-viewer to use the new ebook abstraction
>
> d) Transforms
>
> e) Output format: EPUB, LIT, MOBI and OEB
>
> f) Command line interface
>
> g) Output format: LRF
>
> h) GUI
>
> i) Test suite (ideally this should be developed in parallel with the rest)
>
>
> Kovid.
>
> On Wednesday 04 February 2009 13:58:20 Marshall T. Vandegrift wrote:
> > Hi Kovid etc.,
> >
> > I'm prett...
>
> > _______________________________________________
> > Mailing list: https://launchpad.net/~calibre-devs
> > Post to     : calibre-devs@xxxxxxxxxxxxxxxxxxx
> > Unsubscribe : https://launchpad.net/~calibre-devs
> > More help   : https://help.launchpad.net/ListHelp
>
> --
> _____________________________________
>
> Kovid Goyal  MC 452-48
> California Institute of Technology
> 1200 E California Blvd
> Pasadena, CA 91125
>
> cell  : +01 626 390 8699
> office: +01 626 395 6595 (449 Lauritsen)
> email : kovid@xxxxxxxxxxxxxxxxxx
> web   : http://www.kovidgoyal.net
> _____________________________________
