cuneiform team mailing list archive
Message #00476
[Bug 502483] [NEW] hOCR output missing image name
Public bug reported:
In hOCR files, the ocr_page class tags have incorrect source image
information, like below:
<div class='ocr_page' id='page_1' title='image "none.txt"; bbox 0 0 2816 2112'>
Instead of "none.txt", it should report the original source file (as a
URL but it can be relative, so a bare filename should suffice, afaik).
Even though hocr2pdf from exactimage doesn't use this info, other tools
rely on it.
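
For anyone who needs a workaround until this is fixed, here is a rough sketch of how the title attribute could be read and patched after the fact. It assumes the title has the shape shown above (properties separated by ";", image name in double quotes). The function names parse_title and set_page_image, and the file name "scan.png", are made up for illustration and are not part of cuneiform or any hOCR tool.

```python
import re


def parse_title(title):
    """Split an hOCR title attribute into a {property: value} dict.

    Simplification: assumes no ';' occurs inside a quoted file name.
    """
    props = {}
    for part in title.split(";"):
        part = part.strip()
        if not part:
            continue
        # The first word is the property name, the rest is its value.
        name, _, value = part.partition(" ")
        props[name] = value.strip()
    return props


def set_page_image(title, image_name):
    """Return the title with the quoted image name replaced.

    A function is used as the replacement so that backslashes in
    image_name are not interpreted as regex escapes.
    """
    return re.sub(
        r'image\s+"[^"]*"',
        lambda match: 'image "%s"' % image_name,
        title,
        count=1,
    )


if __name__ == "__main__":
    # Title value copied from the ocr_page tag in this report.
    title = 'image "none.txt"; bbox 0 0 2816 2112'
    print(parse_title(title))
    # "scan.png" is a hypothetical source file name.
    print(set_page_image(title, "scan.png"))
```

This only rewrites the title string; locating the ocr_page element in the hOCR file and writing the result back is left to whatever HTML parser you already use.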
** Affects: cuneiform-linux
Importance: Undecided
Status: New
--
hOCR output missing image name
https://bugs.launchpad.net/bugs/502483
You received this bug notification because you are a member of Cuneiform
Linux, which is the registrant for Cuneiform for Linux.
Status in Linux port of Cuneiform: New