
calibre-devs team mailing list archive

Re: Stylizer

 

The basic idea is applicable, but it would need to be adapted to work within 
the overall parsing framework. The way I envision this framework at the moment 
is:

1) The user gives us the name of an HTML file.

2) We recursively find all HTML files it links to and build a list of them in 
either depth-first or breadth-first order. If there is a pre-existing OPF file 
with a <spine>, use that instead.

3) We run a HTMLParser based on BeautifulSoup over every element in every HTML 
file. 

4) For each file, strip all style information (this will include <font>, 
<center>, <b>, <i>, <tt>, etc. tags) and store it in a per-file CSS 
stylesheet. For inline style attributes we will have to add unique IDs to the 
elements and add ID-based rules to the CSS dump. For <style> tags that occur 
inside <body> there will be a little breakage, but I can live with that.
Also break up each file into chapters and store a map of every id and <a name> 
element (so links can later be redirected to the correct chapter file).

5) Once we have the stylesheets for every HTML file we rebase font sizes in 
every stylesheet and apply any other overriding CSS. Then add the modified 
stylesheets back into the HTML files.

6) Redirect all links to be consistent with the split into chapters.

7) Create OPF and epub bundle.
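
As a rough illustration of step 2 only, here is a minimal sketch of the 
breadth-first walk over linked HTML files. It uses the standard library's 
html.parser rather than BeautifulSoup, purely to stay self-contained. The 
names LinkCollector, linked_html_files and read_file are hypothetical, and 
read_file is an assumed callable that returns a file's text (or None if the 
file does not exist). It ignores the OPF <spine> case.

```python
import posixpath
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlparse

HTML_SUFFIXES = (".html", ".htm", ".xhtml")


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def linked_html_files(root, read_file):
    """Return root plus every local HTML file reachable from it, breadth first.

    read_file is a hypothetical callable: path -> HTML text, or None if the
    file is missing. Paths are treated as '/'-separated relative paths.
    """
    order, seen, queue = [], {root}, deque([root])
    while queue:
        current = queue.popleft()
        text = read_file(current)
        if text is None:
            continue  # broken link: nothing to parse
        order.append(current)
        collector = LinkCollector()
        collector.feed(text)
        for href in collector.hrefs:
            target, _fragment = urldefrag(href)
            parsed = urlparse(target)
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue  # external link or pure #fragment link
            if not parsed.path.lower().endswith(HTML_SUFFIXES):
                continue
            # Resolve the link relative to the file that contains it.
            path = posixpath.normpath(
                posixpath.join(posixpath.dirname(current), parsed.path))
            if path not in seen:
                seen.add(path)
                queue.append(path)
    return order
```

With an in-memory dict standing in for the filesystem, it could be called as 
linked_html_files("index.html", files.get). Swapping the deque's popleft() 
for pop() would visit files in a depth-first-like order instead.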

Now the important thing is that while parsing each HTML file we will have to 
keep track of the current font size to be able to calculate the old base size. 
I'm not sure if we need to use cssutils for that or if the CSS parsing code 
from html2lrf is sufficient. 
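
To make "keeping track of the current font size" concrete, here is a minimal 
sketch using only the standard library. It is not the html2lrf or Stylizer 
approach: it looks only at inline style attributes with pt, em or % sizes, 
assumes well-nested markup, and guesses the base size as the size that 
covers the most text. The names FontSizeTracker and guess_base_font_size, 
and the 12pt default, are assumptions made for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

FONT_SIZE_RE = re.compile(r"font-size\s*:\s*(\d*\.?\d+)(pt|em|%)", re.I)
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class FontSizeTracker(HTMLParser):
    """Maintain a stack of effective font sizes (in pt) while parsing."""

    def __init__(self, default_pt=12.0):
        super().__init__()
        self.stack = [default_pt]       # top of stack = current font size
        self.chars_at_size = Counter()  # font size -> number of text chars

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no end tag, so never push for them
        size = self.stack[-1]  # inherit from the parent by default
        match = FONT_SIZE_RE.search(dict(attrs).get("style") or "")
        if match:
            value, unit = float(match.group(1)), match.group(2).lower()
            if unit == "pt":
                size = value
            elif unit == "em":
                size = size * value
            else:  # percentage of the inherited size
                size = size * value / 100.0
        self.stack.append(size)

    def handle_endtag(self, tag):
        # Assumes well-nested markup; unclosed tags would skew the stack.
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.chars_at_size[self.stack[-1]] += len(data.strip())


def guess_base_font_size(html, default_pt=12.0):
    """Return the font size (pt) that covers the most text characters."""
    tracker = FontSizeTracker(default_pt)
    tracker.feed(html)
    tracker.close()
    if not tracker.chars_at_size:
        return default_pt
    return tracker.chars_at_size.most_common(1)[0][0]
```

A real implementation would also have to resolve rules from stylesheets and 
legacy tags such as <font size>, which is where a proper CSS parser comes in.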

Given a base font size and a stylesheet (either a string or a CSSStyleSheet 
object, depending on whether we use the html2lrf code or not), your code will 
be responsible for returning the modified stylesheet. It will be called in step 
5 above.
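
For the string case, the interface described here might look roughly like 
the sketch below. It is only an illustration: it rescales pt-valued 
font-size declarations with a regular expression and leaves relative units 
(em, %) alone, whereas real code would go through a CSS parser. The function 
name rebase_font_sizes and its parameters are hypothetical.

```python
import re

PT_SIZE_RE = re.compile(r"(font-size\s*:\s*)(\d*\.?\d+)pt", re.I)


def rebase_font_sizes(css_text, old_base_pt, new_base_pt):
    """Scale every pt font-size so old_base_pt maps onto new_base_pt.

    Relative sizes (em, %) are left untouched because they already scale
    with whatever they inherit from.
    """
    scale = new_base_pt / old_base_pt

    def replace(match):
        scaled = round(float(match.group(2)) * scale, 2)
        # %g drops a trailing ".0" so 12.0 is written as "12".
        return "%s%gpt" % (match.group(1), scaled)

    return PT_SIZE_RE.sub(replace, css_text)
```

For example, rebasing from 10pt to 12pt turns "font-size: 20pt" into 
"font-size: 24pt" while "font-size: 0.8em" is returned unchanged.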

I've probably forgotten a bunch of things the parser will have to do, so treat 
this as a very rough outline.

Another question: Do you plan to re-write OPFReader using lxml? 

Incidentally, I'm using this mail to test the mailing list.

Kovid.

On Tuesday 05 August 2008 19:39:33 you wrote:
> Kovid:
>
> I wanted to toss you my CSS style-applying code.  Looking at the HTML
> traversal code, I'm not sure it's exactly what you have in mind, so
> I'm just attaching it to this e-mail rather than committing it for
> now.  The 'stylizer.py' module contains the Stylizer and support
> classes.  The 'fontnorm.py' script uses Stylizer to determine the
> current base font-size and emit an equivalent stylesheet with a
> user-selected fixed base font-size.
>
> HTH,
>
> -Marshall

-- 
_____________________________________

Kovid Goyal  MC 452-48
California Institute of Technology
1200 E California Blvd
Pasadena, CA 91125

cell  : +01 626 390 8699
office: +01 626 395 6595 (449 Lauritsen)
email : kovid@xxxxxxxxxxxxxxxxxx
web   : http://www.kovidgoyal.net
_____________________________________
