cuneiform team mailing list archive
  
    Message #00635
  
[Bug 623438] Re: Font size not correct in merged sandvich PDF
  
I looked at the HTML, and indeed there is no font height or size
information in it. So I assume that either the coordinates of the boxes
are simply inaccurate, or hocr2pdf is doing something wrong when merging
the HTML with the image.
When I select text in the resulting PDF, the selection box looks a little too small (a piece is missing at the top), but for Test10pages.pdf the effect is far more extreme. See here: http://www.youtube.com/watch?v=0d8_T-vV_Ak
In that case the selection actually lands on the line above the one I want to select (that "für" is recognized as "ii" is a different story).
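To check the first guess (inaccurate box coordinates), one could pull the bounding boxes out of the hOCR file and see what text height they imply. Below is a minimal sketch, assuming the boxes are stored in the usual hOCR way, as `title="bbox x0 y0 x1 y1"` in pixels, and assuming a 300 dpi scan (the real resolution of the sample files is not stated here). The function names are made up for illustration, and this is not meant to describe what hocr2pdf does internally.

```python
import re

# hOCR keeps box coordinates in the title attribute, e.g. title="bbox 100 200 900 250".
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
TITLE_RE = re.compile(r"""title=(["'])(.*?)\1""")


def parse_bbox(title):
    """Return (x0, y0, x1, y1) in pixels from a title attribute, or None."""
    match = BBOX_RE.search(title)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def estimated_font_size_pt(bbox, dpi=300):
    """Box height converted from pixels to points (1 pt = 1/72 inch).

    The dpi default is an assumption; use the resolution of the scan.
    """
    _x0, y0, _x1, y1 = bbox
    return (y1 - y0) * 72.0 / dpi


def box_heights_pt(hocr_html, dpi=300):
    """List the implied text height in points for every bbox in the document."""
    heights = []
    for _quote, title in TITLE_RE.findall(hocr_html):
        bbox = parse_bbox(title)
        if bbox is not None:
            heights.append(estimated_font_size_pt(bbox, dpi))
    return heights
```

If the heights this reports for ordinary body text are clearly smaller than the text visible in the scan, the boxes themselves are suspect; if they look plausible, the merge step is the more likely culprit.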
-- 
Font size not correct in merged sandvich PDF
https://bugs.launchpad.net/bugs/623438
You received this bug notification because you are a member of Cuneiform
Linux, which is the registrant for Cuneiform for Linux.
Status in Linux port of Cuneiform: New
Bug description:
After processing with Cuneiform for Linux 1.0.0 and the hOCR to PDF converter, version 0.7.4 (which should be the most current version), I get a sandwich PDF that looks fine until I select text.
See the sample 5AADFEE1-0000.* files in the attachment and the result.pdf.
The effect is shown in screen087.png
For another file (Test10pages.pdf) the effect is even worse: I basically cannot select text to copy any more, because I can only guess where to move the mouse.
It looks like the font size in the HTML is somehow not correct. I am not an expert, but this link might help you:
http://www.emdpi.com/fontsize.html