
cuneiform team mailing list archive

Re: Hocr output status and identified improvements.

 

On Fri, Oct 2, 2009 at 11:23 PM, julien <julien@xxxxxxxxxxxxxxxxxxx> wrote:

> A new revision is ready to be pulled. I have tested the hOCR output and it works fine.
> ocr_line now follows the standard described in the 2007 hOCR reference mentioned earlier
> (e.g. the character bboxes are in ocr_cinfo, and the line's text is the plain text content of the ocr_line tag).

Looks nice, thanks. However, your editor seems to have mangled the
Russian comments somehow. I get tons of diff lines like these:

-    // ?????????? ??? ????????
+    // ���������� ��� ��������

In case the lines above do not come through properly: the first one
contains question marks, while the second one contains Unicode
replacement characters (U+FFFD). Could you look into fixing this?
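
For anyone wondering how two such variants can arise, here is a minimal
Python sketch of one plausible mechanism. It is a guess, not a diagnosis
of what actually happened in this branch: the sample comment text and
the CP1251 code page are assumptions made purely for illustration.

```python
# Hypothetical sample comment; the real comments from the source tree
# are not reproduced in this thread.
original = "Комментарий для проверки"

# Variant 1: encoding into a charset that has no Cyrillic letters
# substitutes "?" for every character it cannot represent.
as_question_marks = original.encode("ascii", errors="replace").decode("ascii")

# Variant 2: bytes written in a legacy Cyrillic code page (CP1251 is
# assumed here) and then read back as UTF-8 are invalid sequences, so
# the decoder substitutes U+FFFD for them.
legacy_bytes = original.encode("cp1251")
as_replacement_chars = legacy_bytes.decode("utf-8", errors="replace")

print(as_question_marks)
print(as_replacement_chars)
```

In both cases the original letters are lost, so the comments would have
to be restored from the earlier revision and saved again in their
original encoding.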


