calibre-devs team mailing list archive
Message #00000
Re: Stylizer
The basic idea is applicable, but it would need to be adapted to work within
the overall parsing framework. The way I envision this framework at the moment
is:
1) User gives us the name of an HTML file.
2) We recursively find all HTML files it links to and build a list of them in
either depth-first or breadth-first order. If there is a pre-existing OPF file
with a <spine>, use that instead.
3) We run an HTMLParser based on BeautifulSoup over every element in every HTML
file.
4) For each file, strip all style information (this will include <font>,
<center>, <b>, <i>, <tt> etc. tags) and store it in a per-file CSS sheet. For
inline style attributes we will have to add unique IDs to the elements and add
id-based rules to the CSS dump. For style tags that occur inside <body> there
will be a little breakage, but I can live with that.
Also break up each file into chapters and store a map of every id and <a name>
element (so links can later be redirected to the correct chapter file).
5) Once we have the stylesheets for every HTML file we rebase font sizes in
every stylesheet and apply any other overriding CSS. Then add the modified
stylesheets back into the HTML files.
6) Redirect all links to be consistent with the split into chapters.
7) Create OPF and epub bundle.
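A minimal sketch of step 2 above, assuming Python 3 and only the standard
library (html.parser rather than BeautifulSoup). The names LinkCollector,
find_linked_files and the read_file callable are hypothetical, invented for
this illustration. It covers only breadth-first discovery and does not handle
the pre-existing OPF <spine> case.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlparse
import posixpath

HTML_EXTENSIONS = (".html", ".htm", ".xhtml")


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags in one document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_linked_files(root, read_file):
    """Return root plus every local HTML file reachable from it, breadth first.

    read_file is a hypothetical callable mapping a path to its HTML text and
    raising KeyError or OSError when the file cannot be read.
    """
    order, seen, queue = [], {root}, deque([root])
    while queue:
        path = queue.popleft()
        try:
            text = read_file(path)
        except (KeyError, OSError):
            continue  # broken link: skip it
        order.append(path)
        collector = LinkCollector()
        collector.feed(text)
        for href in collector.hrefs:
            target, _fragment = urldefrag(href)
            parts = urlparse(target)
            if not target or parts.scheme or parts.netloc:
                continue  # same-file anchor or external URL
            if not target.lower().endswith(HTML_EXTENSIONS):
                continue
            # Resolve the link relative to the directory of the linking file.
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), target))
            if resolved not in seen:
                seen.add(resolved)
                queue.append(resolved)
    return order
```

Swapping the deque for a plain list used as a stack would give depth-first
order instead. For files on disk, read_file could be something like
lambda p: open(p, encoding="utf-8").read().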
Now the important thing is that while parsing each HTML file we will have to
keep track of the current font size to be able to calculate the old base size.
I'm not sure if we need to use cssutils for that or if CSS parsing code from
html2lrf is sufficient.
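One way the font-size tracking could look, as a rough sketch only. It assumes
Python 3 and the standard library, reads nothing but inline style attributes
with pt, em or % sizes, and assumes reasonably well-nested markup. Neither
cssutils nor the html2lrf code is used here. FontSizeTracker and
estimate_base_size are hypothetical names, and picking the size that carries
the most text as the "old base size" is an assumption of this sketch.

```python
import re
from collections import Counter
from html.parser import HTMLParser

FONT_SIZE_RE = re.compile(r"font-size\s*:\s*([\d.]+)(pt|em|%)", re.I)
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class FontSizeTracker(HTMLParser):
    """Keep a stack of computed font sizes (in pt) while walking the tags."""

    def __init__(self, default_pt=12.0):
        super().__init__()
        self.stack = [default_pt]
        self.chars_at_size = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags never get an end tag, so do not push
        size = self.stack[-1]  # inherit from the parent element
        match = FONT_SIZE_RE.search(dict(attrs).get("style") or "")
        if match:
            value, unit = float(match.group(1)), match.group(2).lower()
            if unit == "pt":
                size = value
            elif unit == "em":
                size *= value
            else:  # percentage of the inherited size
                size *= value / 100.0
        self.stack.append(size)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Weight each size by the amount of non-blank text rendered at it.
        self.chars_at_size[round(self.stack[-1], 2)] += len(data.strip())


def estimate_base_size(html, default_pt=12.0):
    """Return the font size (pt) that carries the most text in the document."""
    tracker = FontSizeTracker(default_pt)
    tracker.feed(html)
    tracker.close()
    if not tracker.chars_at_size:
        return default_pt
    return tracker.chars_at_size.most_common(1)[0][0]
```

A real implementation would also need to resolve rules from stylesheets and
<font size> attributes, which is where a proper CSS parser would come in.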
Given a base font size, a stylesheet (either a string or a CSSStyleSheet
object, depending on whether we use the html2lrf code or not) your code will
be responsible for returning the modified stylesheet. It will be called in step
5 above.
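To make that interface concrete, here is a minimal sketch of the string-based
variant, assuming Python 3. The function name rebase_font_sizes and the choice
to pass both the old and the new base size are assumptions of this sketch. It
only rescales absolute pt sizes with a regular expression and leaves relative
units (em, %) untouched. It is not the Stylizer or fontnorm.py code.

```python
import re

# Matches declarations such as "font-size: 10pt" or "font-size:10.5pt".
PT_SIZE_RE = re.compile(r"(font-size\s*:\s*)(\d+(?:\.\d+)?)pt", re.I)


def rebase_font_sizes(css, old_base_pt, new_base_pt):
    """Return css with every absolute pt font size scaled to the new base."""
    ratio = new_base_pt / old_base_pt

    def scale(match):
        scaled = round(float(match.group(2)) * ratio, 2)
        return "%s%gpt" % (match.group(1), scaled)

    return PT_SIZE_RE.sub(scale, css)


css = "body { font-size: 10pt } h1 { font-size: 20pt } em { font-size: 1.2em }"
rebased = rebase_font_sizes(css, old_base_pt=10, new_base_pt=12)
```

With a CSSStyleSheet object the same scaling would be applied per rule instead
of by pattern matching on the raw text.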
I've probably forgotten a bunch of things the parser will have to do, so treat
this as a very rough outline.
Another question: Do you plan to re-write OPFReader using lxml?
Incidentally, I'm using this mail to test the mailing list.
Kovid.
On Tuesday 05 August 2008 19:39:33 you wrote:
> Kovid:
>
> I wanted to toss you my CSS style-applying code. Looking at the HTML
> traversal code, I'm not sure it's exactly what you have in mind, so
> I'm just attaching it to this e-mail rather than committing it for
> now. The 'stylizer.py' module contains the Stylizer and support
> classes. The 'fontnorm.py' script uses Stylizer to determine the
> current base font-size and emit an equivalent stylesheet with a
> user-selected fixed base font-size.
>
> HTH,
>
> -Marshall
>
>
--
_____________________________________
Kovid Goyal MC 452-48
California Institute of Technology
1200 E California Blvd
Pasadena, CA 91125
cell : +01 626 390 8699
office: +01 626 395 6595 (449 Lauritsen)
email : kovid@xxxxxxxxxxxxxxxxxx
web : http://www.kovidgoyal.net
_____________________________________