
cuneiform team mailing list archive

[Bug 623438] Re: Font size not correct in merged sandvich PDF

 

Same problem here on Ubuntu 10.04 with cuneiform 1.0.0 and hocr2pdf
0.7.4. I compared the information in the hOCR file with the position of
the text in the PDF, and whatever hocr2pdf does, the text in the PDF
doesn't match the bounding boxes defined in the .hocr file. So I think
this is a problem in hocr2pdf. I'm not sure whether it is related to how
the hOCR output of cuneiform is formatted, as I have read that there are
many ways to attach the bounding boxes to the text (using separate tags,
using attributes on the tag that directly encloses the text, ...). It
would be nice to know whether hocr2pdf can currently handle hOCR output
from other OCR engines, and if so, how their hOCR files differ from the
cuneiform output.
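
For anyone who wants to repeat the comparison, here is a rough sketch of
how the boxes could be pulled out of a .hocr file. It assumes the common
hOCR convention of a "bbox x0 y0 x1 y1" entry in the title attribute;
the sample markup, the helper names (extract_bboxes, px_to_pt) and the
300 dpi scan resolution are made up for illustration and are not taken
from actual cuneiform output or from hocr2pdf.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property stored in a title attribute.
# It deliberately does not match other properties such as "x_bboxes".
BBOX_RE = re.compile(r"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class BBoxCollector(HTMLParser):
    """Collect (class, bbox) pairs from every element whose title holds a bbox."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        match = BBOX_RE.search(attr.get("title") or "")
        if match:
            box = tuple(int(v) for v in match.groups())
            self.boxes.append((attr.get("class") or "", box))


def extract_bboxes(hocr_text):
    """Return a list of (class name, (x0, y0, x1, y1)) in document order."""
    parser = BBoxCollector()
    parser.feed(hocr_text)
    return parser.boxes


def px_to_pt(pixels, dpi):
    """Convert a pixel length to PDF points (1 pt = 1/72 inch)."""
    return pixels * 72.0 / dpi


# Hypothetical sample markup, NOT real cuneiform output.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 2480 3508'>
  <span class='ocr_line' title='bbox 300 400 1800 450'>Hello world</span>
</div>
"""

if __name__ == "__main__":
    dpi = 300  # assumed scan resolution
    for cls, (x0, y0, x1, y1) in extract_bboxes(SAMPLE):
        print(cls, (x0, y0, x1, y1), "height in pt:", px_to_pt(y1 - y0, dpi))
```

The box height converted to points gives a rough upper bound for the
font size one would expect in the PDF text layer, so a large mismatch
there would point at the conversion step rather than at the OCR.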

-- 
Font size not correct in merged sandvich PDF
https://bugs.launchpad.net/bugs/623438
You received this bug notification because you are a member of Cuneiform
Linux, which is the registrant for Cuneiform for Linux.

Status in Linux port of Cuneiform: New

Bug description:
After processing with Cuneiform for Linux 1.0.0 and the hOCR to PDF converter, version 0.7.4 (which should be the most current version), I get a sandwich PDF that looks fine until I select text.

See the sample 5AADFEE1-0000.* files in the attachment and the result.pdf.
The effect is shown in screen087.png

For another file (Test10pages.pdf) the effect is even worse: I basically cannot select text to copy any more, because I can only guess where to move the mouse.

It looks as if the font size in the HTML is somehow not correct. I am not an expert, but this link might help you:
http://www.emdpi.com/fontsize.html




