cuneiform team mailing list archive

Re: Patch to extend hOCR output

To: cuneiform@xxxxxxxxxxxxxxxxxxx
From: Jussi Pakkanen <jpakkane@xxxxxxxxx>
Date: Fri, 20 Mar 2009 13:28:54 +0200
In-reply-to: <1402a13f0902220701o5ddfd47o45367e42853f7480@mail.gmail.com>

On Sun, Feb 22, 2009 at 5:01 PM, Dmitry Polevoy
<openocr.polevoy@xxxxxxxxx> wrote:

> The initial version of the hOCR output was created by Rene Rebe (see the
> history of \cuneiform-linux\cuneiform_src\Kern\rout\src\html.cpp), and I am
> not a specialist in HTML encoding formats.

I added the UTF-8 output. It always writes UTF-8 because that is the
recommended encoding for HTML and Unicode covers every character we
need, so there is no reason to support legacy character sets. We
could change the HTML writer function so that it no longer accepts
an output charset argument at all. Currently its only caller is the
Cuneiform command-line binary, which always passes UTF-8 as the
output format.
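
To illustrate the idea of a writer that cannot be given a charset, here
is a minimal sketch. The function name write_html_header and the exact
markup are assumptions made up for this example; they are not taken
from the real code in html.cpp.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch, not the actual Cuneiform function: there is no
// charset parameter, so a caller cannot request anything but UTF-8.
void write_html_header(std::ostream& out) {
    out << "<html>\n"
        << "<head>\n"
        // The encoding is hard-coded instead of being passed in.
        << "<meta http-equiv=\"Content-Type\" "
        << "content=\"text/html; charset=utf-8\">\n"
        << "</head>\n";
}

// Convenience wrapper that returns the header as a string.
std::string html_header_string() {
    std::ostringstream buffer;
    write_html_header(buffer);
    return buffer.str();
}
```

With a signature like this, the command line binary would simply call
the writer without saying anything about the output encoding.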
