cuneiform team mailing list archive
Message #00278
Re: Patch to extend hOCR output
On Sun, Feb 22, 2009 at 5:01 PM, Dmitry Polevoy
<openocr.polevoy@xxxxxxxxx> wrote:
> The initial version of hOcr output was created by Rene Rebe (look at history
> of \cuneiform-linux\cuneiform_src\Kern\rout\src\html.cpp) and I am not a
> specialist with html encoding format.
I added the UTF-8 encoding part. The output is always UTF-8 because
that is the recommended encoding for HTML, and Unicode covers all the
letters, so there is no need to support legacy character sets. I guess
we could change the HTML writer function so that it no longer accepts
output charset information at all. Currently the only caller is the
Cuneiform command line binary, which always passes UTF-8 as the output
format.
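
A minimal sketch of what a writer without a charset parameter could
look like. The function names and the exact header markup below are
assumptions made for illustration; they are not the actual code in
html.cpp.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical name; not the actual function in html.cpp.
// There is no charset argument: the output is always declared as UTF-8,
// so the caller is expected to pass text that is already UTF-8 encoded.
void write_html_document(std::ostream& out, const std::string& utf8_body) {
    assert(out.good());  // the caller must hand over a usable stream
    out << "<html>\n<head>\n"
        << "<meta http-equiv=\"Content-Type\" "
           "content=\"text/html; charset=utf-8\">\n"
        << "</head>\n<body>\n"
        << utf8_body << "\n"
        << "</body>\n</html>\n";
}

// Hypothetical convenience wrapper that returns the document as a string.
std::string html_document_to_string(const std::string& utf8_body) {
    std::ostringstream out;
    write_html_document(out, utf8_body);
    return out.str();
}
```

With a signature like this, a caller cannot ask for a legacy charset
by mistake, and the declared encoding always matches what is written.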