
cuneiform team mailing list archive

hOCR (was: [Bug 623438] Re: Font size not correct in merged sandvich PDF)


On Fri, 10 Sep 2010  Martin Wildam <623438@xxxxxxxxxxxxxxxxxx> wrote:

> @Igor: I searched for quite a while. I don't remember ocrad explicitly now,
> but I am quite sure I came across it. I also found in other places (blog
> posts) that cuneiform seems to be the only one producing hOCR output.

This was never true.

For the current status, see e.g.

    http://groups.google.com/group/hocr

Regards

JSB

-- 
dr hab. Janusz S. Bien, prof. UW -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics)
jsbien@xxxxxxxxx, jsbien@xxxxxxxxxxxx, http://fleksem.klf.uw.edu.pl/~jsbien/

-- 
Font size not correct in merged sandvich PDF
https://bugs.launchpad.net/bugs/623438
You received this bug notification because you are a member of Cuneiform
Linux, which is the registrant for Cuneiform for Linux.

Status in Linux port of Cuneiform: Invalid
Status in “exactimage” package in Ubuntu: New

Bug description:
After processing with Cuneiform for Linux 1.0.0 and the hOCR-to-PDF converter version 0.7.4 (which should be the most current version), I get a sandwich PDF that looks fine until I select text.

See the attached sample files 5AADFEE1-0000.* and the resulting result.pdf.
The effect is shown in screen087.png.

For another file (Test10pages.pdf) the effect is even worse: I basically cannot select any text to copy, because I can only guess where to move the mouse.

It looks like the font size in the HTML is somehow incorrect. I am not an expert, but this link might help you:
http://www.emdpi.com/fontsize.html