calibre-devs team mailing list archive

Re: Branch lp:~llasram/calibre/oeb2lit

 

You can probably get it to upload to Launchpad with a
bzr push

Does MSReader respect the CSS whitespace option?  One option would just be to 
set it to "preserve" for all extracted tags and not do any pretty printing. Or, 
when creating the LIT file, to explicitly set it to normal (if supported by 
MSReader).

Kovid.

Remember to preserve whitespace for pre tags as well. 

On Tuesday 09 December 2008 11:34:08 Marshall T. Vandegrift wrote:
> On Tue, Dec 9, 2008 at 1:56 PM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:
> > I get a "not a branch" error when trying to check out the code. And on
> > launchpad it says "This branch has not been pushed to yet"
>
> Well, that's a bug in launchpad and/or bazaar.  I first tried to branch
> from lp:calibre to lp:~llasram/calibre/oeb2lit and it ground for a bit
> before erroring out.  Then I just branched from my local oeb2lit branch
> to lp:~llasram/calibre/oeb2lit and it reported success.  I suppose I
> should have been suspicious when I tried to merge->push the most recent
> trunk changes and it said everything was already up-to-date.
>
> > I have no problems with exposing function pointers to ctypes in
> > principle, but will that technique be portable across compilers?
>
> I don't see why it shouldn't be.  I'm giving ctypes exactly what it
> would get if it found the address via a library symbol lookup.  Even if
> this were on an architecture like Alpha or IA-64 with crazy large
> address space trampolines it should still work just fine.
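
To illustrate the point about addresses, here is a minimal sketch of handing ctypes a bare function address instead of letting it do the symbol lookup itself. `Py_GetVersion` from `ctypes.pythonapi` is only a stand-in chosen because every CPython build has it; it is an assumption for the example and not something taken from the oeb2lit branch.

```python
import ctypes

# Stand-in for a C function whose address is exported some other way
# (assumption: Py_GetVersion is used only because it is always available).
looked_up = ctypes.pythonapi.Py_GetVersion

# Reduce it to a plain integer address, as if an extension module had
# handed the pointer over rather than ctypes finding it by name.
address = ctypes.cast(looked_up, ctypes.c_void_p).value

# Rebuild a callable from the bare address: const char *f(void).
prototype = ctypes.CFUNCTYPE(ctypes.c_char_p)
from_address = prototype(address)

version = from_address().decode("utf-8")
print(version)
```

Either way ctypes ends up holding the same address, which is why the compiler should not matter as long as the calling convention matches the prototype.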
>
> That said, I'm less pleased with it than I was on the bus this morning,
> so I'll probably wrap the functions with C/Python bindings after all.
>
> > Why are you using strip_space? To prettify the HTML?
>
> I meant to look up the actual option before I sent the e-mail, the
> actual option being `remove_blank_text'.  The issue is that MSReader
> treats all whitespace in the markup stream as relevant.  So markup which
> is pretty-printed to be like this:
>
>   <div>
>     <span>Here is one span</span>
>     <span>followed by another</span>
>   </div>
>
> comes out rendered like:
>
>     Here is one span
>     followed by another.
>
> I'm currently using `remove_blank_text' plus collapsing sequences of
> whitespace to *un*pretty-print, but I'm going to try instead removing
> whitespace-only `elem.text's, whitespace-only `elem.tail's of last
> children, and whitespace-only `elem.tail's between `display: block'
> elements.
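
The three rules above could be sketched roughly as follows. This uses the standard library's ElementTree rather than lxml so it runs anywhere, and it assumes a fixed set of tag names (`BLOCK_TAGS`, a hypothetical name) in place of a real `display: block` check; `pre` is skipped so its whitespace survives. It is a sketch of the idea, not the code in the branch.

```python
import xml.etree.ElementTree as ET

# Assumption: tag names stand in for computed `display: block`.
BLOCK_TAGS = {"div", "p", "ul", "ol", "li", "blockquote", "h1", "h2", "h3"}
PRESERVE_TAGS = {"pre"}


def is_blank(text):
    """True for text that exists but contains only whitespace."""
    return bool(text) and text.strip() == ""


def strip_layout_whitespace(elem):
    """Drop whitespace that only exists because of pretty-printing."""
    if elem.tag in PRESERVE_TAGS:
        return
    # Rule 1: whitespace-only elem.text.
    if is_blank(elem.text):
        elem.text = None
    children = list(elem)
    for i, child in enumerate(children):
        strip_layout_whitespace(child)
        if not is_blank(child.tail):
            continue
        is_last = i == len(children) - 1
        between_blocks = (
            not is_last
            and child.tag in BLOCK_TAGS
            and children[i + 1].tag in BLOCK_TAGS
        )
        # Rule 2: tail of the last child. Rule 3: tail between two blocks.
        if is_last or between_blocks:
            child.tail = None


doc = ET.fromstring(
    "<div>\n  <span>Here is one span</span>\n"
    "  <span>followed by another</span>\n</div>"
)
strip_layout_whitespace(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

The whitespace between the two inline spans is deliberately left alone, since removing it would join the two runs of text.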
>
> Learning this also impacts how we do LIT-extraction.  Right now
> pretty-printing LIT markup uses `remove_blank_text' to make the markup
> pretty-printable, which has the aforementioned property of deforming it
> in some cases.  I think the easiest, most general solution is to
> "protect" any whitespace-only text with a <span/> tag.  The only
> downsides are that it makes the extraction somewhat unfaithful to the
> source content, and can result in spurious extra <span/>s in books which
> e.g., have a trailing space at the end of every paragraph.
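
One possible reading of the "protect" idea, sketched with the standard library's ElementTree: wrap each whitespace-only text node that sits next to a child element in its own `<span>`, so that it becomes the sole content of an element before any pretty-printing step runs. The function name and the decision to leave leaf elements untouched are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def is_blank(text):
    """True for text that exists but contains only whitespace."""
    return bool(text) and text.strip() == ""


def protect_blank_text(elem):
    """Wrap whitespace-only text and tails in <span> elements."""
    children = list(elem)
    for child in children:
        protect_blank_text(child)
    # Whitespace-only text before the first child. Leaf elements are left
    # alone (assumption: their whitespace is already an element's sole content).
    if children and is_blank(elem.text):
        span = ET.Element("span")
        span.text = elem.text
        elem.text = None
        elem.insert(0, span)
    # Whitespace-only tails become a span placed right after the child.
    for child in children:
        if is_blank(child.tail):
            span = ET.Element("span")
            span.text = child.tail
            child.tail = None
            elem.insert(list(elem).index(child) + 1, span)


doc = ET.fromstring("<p><b>one</b> <i>two</i></p>")
protect_blank_text(doc)
protected = ET.tostring(doc, encoding="unicode")
print(protected)
```

This also shows the downside mentioned above: every protected run of whitespace adds an element that was not in the source.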
>
> -Marshall

-- 
_____________________________________

Kovid Goyal  MC 452-48
California Institute of Technology
1200 E California Blvd
Pasadena, CA 91125

cell  : +01 626 390 8699
office: +01 626 395 6595 (449 Lauritsen)
email : kovid@xxxxxxxxxxxxxxxxxx
web   : http://www.kovidgoyal.net
_____________________________________



