cuneiform team mailing list archive

Re: Hocr output status and identified improvements.

To: julien <julien@xxxxxxxxxxxxxxxxxxx>
From: Jussi Pakkanen <jpakkane@xxxxxxxxx>
Date: Mon, 5 Oct 2009 10:30:18 +0300
Cc: "cuneiform@xxxxxxxxxxxxxxxxxxx" <cuneiform@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <F45E9C86B0E2A24AAA58A669F1672F2014CF0340@BL2PRD0102MB003.prod.exchangelabs.com>

On Fri, Oct 2, 2009 at 11:23 PM, julien <julien@xxxxxxxxxxxxxxxxxxx> wrote:

> New rev ready to be pulled from. I have tested the hocr output and it works fine.
> Now ocr_line follows the standard according to the hocr ref from 2007 mentioned earlier.
> (E.g. the char bboxes are in ocr_cinfo, and the text line is in pure text as text content for the ocr_line tag).

Looks nice, thanks. However, your editor seems to have mangled the
Russian comments somehow. I get tons of lines like these:

-    // ?????????? ??? ????????
+    // ���������� ��� ��������

In case it does not get through properly: the first line has question
marks, while the second has Unicode replacement characters (U+FFFD).
Could you look into fixing this?
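
For illustration, here is a minimal Python sketch of how both kinds of
damage can arise. It assumes the comments are stored in a legacy
single-byte Cyrillic encoding such as CP1251, which this message does not
confirm, and the sample comment text is made up, since the original
wording cannot be recovered from the diff.

```python
# Hypothetical sample comment; the real text is not recoverable from the diff.
original = "// Пример комментария"

# Assumption: the source file stores its comments as CP1251 bytes.
legacy_bytes = original.encode("cp1251")

# Damage 1: forcing the text into ASCII turns every Cyrillic letter into "?".
as_question_marks = original.encode("ascii", errors="replace").decode("ascii")

# Damage 2: reading the CP1251 bytes as UTF-8 turns every invalid byte
# into the Unicode replacement character U+FFFD.
as_replacement_chars = legacy_bytes.decode("utf-8", errors="replace")

# Decoding with the codec the bytes were written in preserves the text.
round_trip = legacy_bytes.decode("cp1251")

print(as_question_marks)
print(as_replacement_chars)
print(round_trip == original)
```

If this is what happened, an editor that opens and saves the file with
the encoding it was originally written in should leave the comment lines
untouched in the diff.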

References

Hocr output status and identified improvements.
From: julien, 2009-10-01
Re: Hocr output status and identified improvements.
From: Jussi Pakkanen, 2009-10-01
Re: Hocr output status and identified improvements.
From: julien, 2009-10-01
Re: Hocr output status and identified improvements.
From: julien, 2009-10-01
Re: Hocr output status and identified improvements.
From: julien, 2009-10-02