cuneiform team mailing list archive
Message #00036
Re: PDF output
On Tue, Sep 2, 2008 at 2:22 PM, René Rebe <rene.rebe@xxxxxxxxx> wrote:
> I plan to add PDF writing to create searchable PDFs from cuneiform on Linux.
>
> So far I "only" did some light scrolling and grep'ing back and forth over
> the code, and as I have not yet fully memorized its structure I wanted to
> ask the ones already more familiar with the code before I start about the
> best place to add such code.
>
> So far I identified: cuneiform_src/Kern/rout/src/
See function PUMA_Save in puma.cpp. There you have a switch/case
structure for every output format. This is where you would add a
branch for your own format.
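As a rough illustration of that kind of dispatch, here is a minimal sketch. It is not the actual code from puma.cpp: the enum, its constants, and the writer functions are all hypothetical stand-ins, and the real PUMA_Save signature and format identifiers may look quite different.

```cpp
#include <string>

// Hypothetical format identifiers; the real constants used by PUMA_Save may differ.
enum OutputFormat { FORMAT_TEXT, FORMAT_HTML, FORMAT_RTF, FORMAT_PDF };

// Hypothetical per-format writers. Each returns a tag so the dispatch is observable.
std::string write_text() { return "text"; }
std::string write_html() { return "html"; }
std::string write_rtf()  { return "rtf"; }
std::string write_pdf()  { return "pdf"; }

// One case per output format; a new format gets its own branch.
std::string save_document(OutputFormat format) {
    switch (format) {
    case FORMAT_TEXT: return write_text();
    case FORMAT_HTML: return write_html();
    case FORMAT_RTF:  return write_rtf();
    case FORMAT_PDF:  return write_pdf();  // the newly added branch
    }
    return "unsupported";
}
```

With this layout, adding a format touches only the enum, one writer function, and one case label.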
> In which I would start by making a copy of html.cpp to add the corresponding
> PDF tag writeouts, using ExactImage
> (http://www.exactcode.de/site/open_source/exactimage/)
> for the actual PDF structure generation. (ExactImage SVN:HEAD only includes
> very static pure image writing, but I already rewrote that part and have any
> vector, font, image and multi-page writing in my local working copy, already).
>
> Any hints welcome,
ExactImage is GPL code. Linking to it is legal but would contaminate
Cuneiform (which is BSD). For this reason I can't accept it into
trunk.
I recommend that you look into Cairo
(http://annarchy.cairographics.org/), which is LGPL.
I also have a strict policy about external libraries. I want to keep
the code compilable, working and fully functional on vanilla Win32
systems using both MinGW and MSVC (even though that is not currently
the case). So if you want to add PDF export, you need to #ifdef it
cleanly.
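One way to read "#ifdef it cleanly" is sketched below. The macro name CF_ENABLE_PDF and both functions are assumptions made up for this example, not existing Cuneiform symbols. The idea is that callers always see the same two functions, and a build without the external library still compiles and simply reports that PDF export is unavailable.

```cpp
#include <string>

// CF_ENABLE_PDF is a hypothetical build flag, set only when the optional
// PDF library is present on the build machine.
#ifdef CF_ENABLE_PDF

// The PDF library's headers would be included here, inside the guard.
bool pdf_export_available() { return true; }

// Placeholder body: a real implementation would call into the PDF library.
bool export_pdf(const std::string& path) { return !path.empty(); }

#else

// Fallback for builds without the library, e.g. a vanilla Win32 toolchain.
bool pdf_export_available() { return false; }

// Always fails, so callers can report "PDF support not compiled in".
bool export_pdf(const std::string&) { return false; }

#endif
```

Keeping the guard in one place means the rest of the code never needs its own #ifdef; it can check pdf_export_available() at run time.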
The easiest way to get PDF output is to convert the RTF output to PDF.
I would imagine that there are already programs that do this. I'm also
looking into adding the layout information to the HTML exporter using
the hOCR format. Having an hOCR -> PDF converter would probably be
beneficial outside Cuneiform as well.
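To give an idea of what such a converter has to do with the layout information: hOCR stores each element's bounding box in its title attribute as "bbox x0 y0 x1 y1", in pixels with the origin at the top left, while PDF measures in points (1/72 inch) from the bottom left. The sketch below covers only that coordinate step. The struct and function names are hypothetical, and the page height and DPI are assumed to be known from the scanned image.

```cpp
#include <sstream>
#include <string>

// Hypothetical types for illustration only.
struct PixelBox { int x0, y0, x1, y1; };           // hOCR: top-left origin, pixels
struct PdfRect  { double x, y, width, height; };   // PDF: bottom-left origin, points

// Extract "bbox x0 y0 x1 y1" from an hOCR title attribute value.
// Returns false if the property is missing or malformed.
bool parse_bbox(const std::string& title, PixelBox& out) {
    const std::string key = "bbox ";
    const std::string::size_type pos = title.find(key);
    if (pos == std::string::npos) return false;
    std::istringstream in(title.substr(pos + key.size()));
    return static_cast<bool>(in >> out.x0 >> out.y0 >> out.x1 >> out.y1);
}

// Scale pixels to points and flip the vertical axis.
PdfRect to_pdf_rect(const PixelBox& b, int page_height_px, double dpi) {
    const double scale = 72.0 / dpi;
    PdfRect r;
    r.x = b.x0 * scale;
    r.y = (page_height_px - b.y1) * scale;  // bottom edge of the box in PDF space
    r.width = (b.x1 - b.x0) * scale;
    r.height = (b.y1 - b.y0) * scale;
    return r;
}
```

A real converter would additionally have to place invisible text over the page image at these rectangles, which is left out here.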