
calibre-devs team mailing list archive

Changes to OEBBook

 

Hi all,

In the course of migrating the HTML/EPUB input and EPUB output code to the new
conversion pipeline, I've come across a couple of limitations in OEBBook:

1) OEBBook assumes that the OPF is at the top level of the directory structure.
This means in particular that if you write out an OEBBook using, say, OEBWriter
and the manifest has hrefs of the form ../../a/b/some_file, some_file could end up
being written outside the target directory. This assumption was fine when the
input to OEBBook had been previously normalized by the any2* layer. But since
that's no longer the case, we need to fix this. One possible fix is to add a method to
OEBReader that will "normalize" all hrefs in the manifest to ensure they have a
common base directory that is at least "./". However, I don't think this is easy to
do since, in principle, files pointed to in an OPF could be anywhere, including on
separate drives on Windows.
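
To make the failure mode concrete, here is a minimal sketch (not calibre code) of a
check for hrefs that would land outside the target directory. The function name
escapes_base is hypothetical, and the sketch assumes hrefs are already unquoted and
use forward slashes; it deliberately does not handle Windows drive letters, which is
exactly the hard case mentioned above.

```python
import posixpath


def escapes_base(href):
    """Return True if href, resolved relative to the OPF's directory,
    would point outside that directory.

    Hypothetical helper for illustration only. Assumes a plain,
    forward-slash relative path; Windows drive letters are not detected.
    """
    norm = posixpath.normpath(href)
    # After normalization, any remaining leading ".." means the path
    # climbs above the base directory.
    return (
        norm == ".."
        or norm.startswith("../")
        or posixpath.isabs(norm)
    )


if __name__ == "__main__":
    for href in ("../../a/b/some_file", "OEBPS/../content.html", "text/ch1.html"):
        print(href, escapes_base(href))
```

A writer could refuse (or remap) any manifest item for which such a check returns
True, but remapping consistently across all internal links is the difficult part.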

The other possible solution is to require InputPlugins to ensure that their OPF
files are at the top level. For most input plugins this is (I believe) automatically the
case; however, it will require extra work for the HTML and EPUB input plugins.
I think this is the best solution.

2) OEBBook actually writes out *everything* if the input OPF does not specify a
cover (in order to render the "first page" as a cover). This is really
inefficient. Instead, to handle covers, I propose the following scheme:

If the input file has a raster cover, it is specified by the input plugin via
the type="cover" entry in the OPF that the input plugin creates. 

If the input file has an HTML cover (only EPUB files so far), the input plugin
removes it from the spine, and sets an attribute html_cover_id to the id of the
HTML file (in the manifest). Note that when I say "has an HTML cover" I mean
that it indicates via the guide that the first file in the spine is only a
cover. calibre-generated EPUB files and Feedbooks EPUB files, for example, do
this. The input plugin should also render the HTML and set the type="cover"
entry in the guide to the rendered file.

If the user specifies a cover, the raster cover in the OEBBook is set and the
HTML cover, if any, is silently discarded.

If the input file has no cover, and the user does not specify a cover,
then the conversion pipeline will generate a default cover as is currently done
for downloaded news. 

Since the input to the output plugin will always have a raster cover, the output
plugin is free to decide whether to use the HTML cover (if present) instead of
the raster cover (at the moment only the EPUB output plugin would do this).
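
The precedence rules above can be sketched as follows. This is only an illustration
of the proposed scheme, not existing calibre code: CoverChoice, resolve_cover,
cover_for_output and the default_cover placeholder path are all hypothetical names.
Here guide_cover stands for the file named by the type="cover" guide entry (for an
HTML cover, the rendered image).

```python
from typing import NamedTuple, Optional


class CoverChoice(NamedTuple):
    raster: str                   # always set, per the proposal
    html_cover_id: Optional[str]  # manifest id of the HTML cover, if kept


def resolve_cover(guide_cover, html_cover_id, user_cover,
                  default_cover="default_cover.jpg"):
    """Apply the proposed precedence (hypothetical helper).

    default_cover is a placeholder for the cover the pipeline would generate.
    """
    if user_cover is not None:
        # A user-specified cover wins; any HTML cover is silently discarded.
        return CoverChoice(user_cover, None)
    if guide_cover is not None:
        # Cover supplied by the input plugin, possibly with an HTML original.
        return CoverChoice(guide_cover, html_cover_id)
    # No cover anywhere: fall back to a generated default.
    return CoverChoice(default_cover, None)


def cover_for_output(choice, prefers_html):
    """Output plugins may prefer the HTML cover when one is present."""
    if prefers_html and choice.html_cover_id is not None:
        return choice.html_cover_id
    return choice.raster
```

With this shape, an EPUB output plugin would call cover_for_output(choice, True),
while every other output plugin would just read choice.raster.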

Note to self: Move the call to trimmanifest into the OEB and EPUB output
plugins, since otherwise it would trim the HTML cover if present.

If any of you have comments/alternate solutions, let me know. I'm going to start
working on these two solutions, so if you want things to be different, be prompt in
replying.

Kovid.
-- 
_____________________________________

Kovid Goyal 
http://www.kovidgoyal.net
http://calibre.kovidgoyal.net
_____________________________________
