calibre-devs team mailing list archive
Message #00023
Re: Branch lp:~llasram/calibre/oeb2lit
On Tue, Dec 9, 2008 at 1:56 PM, Kovid Goyal <kovid@xxxxxxxxxxxxxx> wrote:
> I get a "not a branch" error when trying to check out the code. And on
> launchpad it says "This branch has not been pushed to yet"
Well, that's a bug in launchpad and/or bazaar. I first tried to branch
from lp:calibre to lp:~llasram/calibre/oeb2lit and it ground for a bit
before erroring out. Then I just branched from my local oeb2lit branch
to lp:~llasram/calibre/oeb2lit and it reported success. I suppose I
should have been suspicious when I tried to merge->push the most recent
trunk changes and it said everything was already up-to-date.
> I have no problems with exposing function pointers to ctypes in
> principle, but will that technique be portable across compilers?
I don't see why it shouldn't be. I'm giving ctypes exactly what it
would get if it found the address via a library symbol lookup. Even if
this were on an architecture like Alpha or IA-64 with crazy large
address space trampolines it should still work just fine.
That said, I'm less pleased with it than I was on the bus this morning,
so I'll probably wrap the functions with C/Python bindings after all.
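
For reference, a minimal sketch of the technique under discussion: handing ctypes a raw function address instead of letting it look up a library symbol. The names here (`PROTO`, `_add`, `add_from_address`) are made up for illustration, and a ctypes callback stands in for the real C function, so this is not the code in the branch.

```python
import ctypes

# Prototype for: int f(int, int). The signature is an assumption for this sketch.
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)


def _add(a, b):
    return a + b


# Stand-in for a compiled C function. Keep a reference to it so the
# underlying native thunk is not garbage collected while in use.
_native = PROTO(_add)

# The raw address as a plain integer, which is what an extension
# module could expose to Python.
address = ctypes.cast(_native, ctypes.c_void_p).value

# Rebuild a callable from the integer address alone. ctypes treats it
# the same way as an address obtained from a symbol lookup.
add_from_address = PROTO(address)
result = add_from_address(2, 3)
```

The caller has to supply the correct prototype, since nothing at the address describes the argument or return types.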
> Why are you using strip_space? To prettify the HTML?
I meant to look up the actual option name before I sent the e-mail; it
is `remove_blank_text'. The issue is that MSReader
treats all whitespace in the markup stream as relevant. So markup which
is pretty-printed to be like this:
<div>
<span>Here is one span</span>
<span>followed by another</span>
</div>
Comes out rendered like:
Here is one span
followed by another.
I'm currently using `remove_blank_text' plus collapsing sequences of
whitespace to *un*pretty-print, but I'm going to try instead removing
whitespace-only `elem.text's, whitespace-only `elem.tail's of last
children, and whitespace-only `elem.tail's between `display: block'
elements.
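
A rough sketch of those three rules, written against the standard library's ElementTree (which has the same `text'/`tail' model as lxml) so it runs without extra dependencies. The `BLOCK_TAGS' set is a hypothetical stand-in for "elements with `display: block'"; a real implementation would presumably consult the stylesheet instead.

```python
import xml.etree.ElementTree as ET

# Assumption: a fixed tag list approximates "display: block" elements.
BLOCK_TAGS = {"body", "div", "p", "h1", "h2", "h3", "ul", "ol", "li", "blockquote"}


def _blank(s):
    """True for a non-empty string consisting only of whitespace."""
    return bool(s) and s.strip() == ""


def strip_insignificant_whitespace(elem):
    # Rule 1: drop whitespace-only elem.text.
    if _blank(elem.text):
        elem.text = None
    children = list(elem)
    for i, child in enumerate(children):
        strip_insignificant_whitespace(child)
        if not _blank(child.tail):
            continue
        is_last = i == len(children) - 1
        # Rule 2: drop the whitespace-only tail of a last child.
        # Rule 3: drop whitespace-only tails between two block elements.
        if is_last or (child.tag in BLOCK_TAGS and children[i + 1].tag in BLOCK_TAGS):
            child.tail = None
    return elem


src = "<div>\n<span>Here is one span</span>\n<span>followed by another</span>\n</div>"
out = ET.tostring(strip_insignificant_whitespace(ET.fromstring(src)), encoding="unicode")
```

Note that the newline between the two inline <span>s is kept, since it separates the words when rendered; only the whitespace around them inside the <div> is dropped.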
Learning this also impacts how we do LIT-extraction. Right now
pretty-printing LIT markup uses `remove_blank_text' to make the markup
pretty-printable, which has the aforementioned property of deforming it
in some cases. I think the easiest, most general solution is to
"protect" any whitespace-only text with a <span/> tag. The only
downsides are that it makes the extraction somewhat unfaithful to the
source content, and can result in spurious extra <span/>s in books which,
e.g., have a trailing space at the end of every paragraph.
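
One possible reading of the "protect" idea, sketched with the standard library's ElementTree: wrap each whitespace-only text node that sits beside other elements in its own <span>, so the whitespace becomes the sole content of an element rather than inter-element filler. The function name `protect_whitespace' is hypothetical, and whether a given parser's blank-text removal leaves such spans alone is an assumption to verify.

```python
import xml.etree.ElementTree as ET


def _blank(s):
    """True for a non-empty string consisting only of whitespace."""
    return bool(s) and s.strip() == ""


def protect_whitespace(elem):
    # Handle the original children first, before any spans are added.
    children = list(elem)
    for child in children:
        protect_whitespace(child)
    # Walk backwards so insertions do not shift the indices still to visit.
    for i in range(len(children) - 1, -1, -1):
        child = children[i]
        if _blank(child.tail):
            span = ET.Element("span")
            span.text = child.tail
            child.tail = None
            elem.insert(i + 1, span)
    # Whitespace-only text before the first child gets the same treatment.
    if children and _blank(elem.text):
        span = ET.Element("span")
        span.text = elem.text
        elem.text = None
        elem.insert(0, span)
    return elem


src = "<p><b>one</b> <i>two</i></p>"
out = ET.tostring(protect_whitespace(ET.fromstring(src)), encoding="unicode")
```

The extra <span> around the single space in this example is exactly the kind of unfaithfulness to the source mentioned above.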
-Marshall