cuneiform team mailing list archive

[Bug 502483] Re: hOCR output missing image name

To: cuneiform@xxxxxxxxxxxxxxxxxxx
From: Polevoy Dmitry <openocr.polevoy@xxxxxxxxx>
Date: Sun, 03 Jan 2010 09:43:04 -0000
Reply-to: Bug 502483 <502483@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Look at \cuneiform-linux\cuneiform_src\cli\cuneiform-cli.cpp and try using
an appropriate value for the second parameter of the PUMA_XOpen function call.
Currently it uses the dummy value "none.txt" (PUMA_XOpen(dib, "none.txt")).
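
A minimal sketch of what such a change might look like, not a tested patch. The helper image_name_for_hocr and the variable name infilename are hypothetical, and the sketch assumes the second parameter of PUMA_XOpen accepts a C string, as the existing call with "none.txt" suggests. It reduces the input path to a bare file name, which should be enough for a relative reference in the hOCR title.

```cpp
#include <string>

// Hypothetical helper (not part of Cuneiform): strip any directory
// components so the hOCR title carries a bare, relative file name.
// Both '/' and '\\' are treated as path separators.
std::string image_name_for_hocr(const std::string& input_path) {
    const std::string::size_type slash = input_path.find_last_of("/\\");
    if (slash == std::string::npos) {
        return input_path;  // already a bare file name
    }
    return input_path.substr(slash + 1);
}

// In cuneiform-cli.cpp the existing call
//     PUMA_XOpen(dib, "none.txt");
// could then become something like
//     PUMA_XOpen(dib, image_name_for_hocr(infilename).c_str());
// where `infilename` is an assumed name for whatever variable holds
// the input image path given on the command line.
```

Passing the full path instead would also work if absolute references are acceptable; the bare name is only one option.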

-- 
hOCR output missing image name
https://bugs.launchpad.net/bugs/502483
You received this bug notification because you are a member of Cuneiform
Linux, which is the registrant for Cuneiform for Linux.

Status in Linux port of Cuneiform: New

Bug description:
In hOCR files, the ocr_page class tags have incorrect source image information, like below:

<div class='ocr_page' id='page_1' title='image "none.txt"; bbox 0 0 2816 2112'>

Instead of "none.txt", it should report the original source file (as a URL but it can be relative, so a bare filename should suffice, afaik). Even though hocr2pdf from exactimage doesn't use this info, other tools rely on it.

References

[Bug 502483] [NEW] hOCR output missing image name
From: MM, 2010-01-03