cuneiform team mailing list archive
Message #00673
[Bug 623438] Re: Font size not correct in merged sandvich PDF
I am not aware of any open source OCR software that does multi-column
document recognition. It is more of a segmentation task than a
recognition task, so it should rather be implemented in a front end
such as OCRopus. If you have a linear text flow, a sufficiently smart
screen reader can read sandwich PDFs reasonably well.
Apart from the already mentioned FineReader, the old Cuneiform freeware
for Windows seems to be able to handle multi-column layouts.
--
Font size not correct in merged sandvich PDF
https://bugs.launchpad.net/bugs/623438
You received this bug notification because you are a member of Cuneiform
Linux, which is the registrant for Cuneiform for Linux.
Status in Linux port of Cuneiform: Invalid
Status in “exactimage” package in Ubuntu: New
Bug description:
After processing with Cuneiform for Linux 1.0.0 and the hOCR to PDF converter, version 0.7.4 (which should be the most current version), I get a sandwich PDF that looks fine until I select text.
See the sample 5AADFEE1-0000.* files in the attachment and the result.pdf.
The effect is shown in screen087.png.
For another file (Test10pages.pdf) the effect is even worse: I basically cannot select any text to copy, because I can only guess where to move the mouse.
It looks as if the font size in the HTML is somehow not correct. I am not an expert, but this link might help you:
http://www.emdpi.com/fontsize.html
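
As a rough illustration of how such a mismatch could arise (an assumption, not a confirmed diagnosis of this bug): hOCR stores bounding boxes in pixels, whereas PDF font sizes are given in points (1/72 inch), so a converter needs the scan resolution to translate one into the other. A minimal Python sketch, in which the helper name and the sample values are hypothetical:

```python
import re

POINTS_PER_INCH = 72  # PDF font sizes are expressed in points (1/72 inch)


def bbox_height_to_points(title: str, dpi: float) -> float:
    """Convert the pixel height of an hOCR 'bbox x0 y0 x1 y1' entry to points.

    `title` is the value of an hOCR element's title attribute and `dpi` is
    the assumed scan resolution; both are illustrative inputs.
    """
    match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
    if match is None:
        raise ValueError(f"no bbox found in title: {title!r}")
    x0, y0, x1, y1 = map(int, match.groups())
    return (y1 - y0) * POINTS_PER_INCH / dpi


# A line 50 pixels tall, scanned at 300 dpi, is 12 points high.
print(bbox_height_to_points("bbox 100 200 900 250", dpi=300))
# The same box read as if it were scanned at 72 dpi comes out at 50 points.
print(bbox_height_to_points("bbox 100 200 900 250", dpi=72))
```

If the resolution assumed during conversion differs from the real scan resolution, the invisible text layer would end up larger or smaller than the printed text, which could make selection behave oddly in the way described.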