cuneiform team mailing list archive
Message #00376
Re: Hocr output status and identified improvements.
On Fri, Oct 2, 2009 at 11:23 PM, julien <julien@xxxxxxxxxxxxxxxxxxx> wrote:
> New rev ready to be pulled from. I have tested the hocr output and it works fine.
> Now ocr_line follows the standard according to the hocr ref from 2007 mentioned earlier.
> (E.g. the char bboxes are in ocr_cinfo, and the text line is in pure text as text content for the ocr_line tag).
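A minimal Python sketch of the line layout described in the quoted text: the recognized line is the plain text content of the ocr_line element, and the per-character boxes sit in an ocr_cinfo child. The function name hocr_line, the use of span elements, and the exact "bbox" / "x_bboxes" title strings are assumptions for illustration. They are not necessarily what Cuneiform actually emits.

```python
import xml.etree.ElementTree as ET


def hocr_line(text, line_bbox, char_bboxes):
    """Hypothetical helper: build one ocr_line span as an XML string.

    The line text is stored as plain text content of the ocr_line element;
    the character boxes go into an ocr_cinfo child as a flat x_bboxes list
    (four numbers per character: x0 y0 x1 y1). Layout is an assumption.
    """
    if len(text) != len(char_bboxes):
        raise ValueError("need exactly one bounding box per character")

    line = ET.Element("span", {
        "class": "ocr_line",
        "title": "bbox " + " ".join(str(c) for c in line_bbox),
    })
    # The recognized text line, as plain text inside ocr_line.
    line.text = text

    # Flatten the per-character boxes into one x_bboxes property.
    coords = " ".join(" ".join(str(c) for c in box) for box in char_bboxes)
    ET.SubElement(line, "span", {
        "class": "ocr_cinfo",
        "title": "x_bboxes " + coords,
    })
    return ET.tostring(line, encoding="unicode")
```

For example, hocr_line("Hi", (10, 20, 40, 40), [(10, 20, 24, 40), (26, 20, 40, 40)]) gives an ocr_line whose text content is "Hi", with both character boxes flattened into a single x_bboxes title on the ocr_cinfo child.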
Looks nice, thanks. However, your editor seems to have mangled the
Russian comments somehow. I get tons of lines like these:
- // ?????????? ??? ????????
+ // ���������� ��� ��������
In case it does not get through properly, the first one has question
marks while the second one has Unicode replacement characters
(U+FFFD). Could you look into fixing this?
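The two symptoms can be reproduced with a small Python sketch. It assumes, without confirmation from this thread, that the comments were stored in Windows-1251 (cp1251). The sample comment and the helper has_replacement_chars are hypothetical. Decoding cp1251 bytes as UTF-8 with replacement turns each Cyrillic letter into U+FFFD, and once that result is saved the original text is gone from the file. Decoding with the right code page round-trips cleanly.

```python
# Hypothetical sample comment; the real comments in the tree are not shown here.
original = "// пример"

# Assumption: the source files were stored in the legacy Windows-1251 code page.
raw = original.encode("cp1251")

# A lossy conversion to ASCII maps every Cyrillic letter to '?'.
# (The thread does not say where the question marks came from.)
as_question_marks = original.encode("ascii", errors="replace").decode("ascii")

# Reading the cp1251 bytes as UTF-8 with replacement maps every byte that is
# not valid UTF-8 to U+FFFD; saving this string discards the original bytes.
as_replacement_chars = raw.decode("utf-8", errors="replace")


def has_replacement_chars(data: bytes) -> bool:
    """Return True if UTF-8 data contains U+FFFD (encoded as EF BF BD)."""
    return b"\xef\xbf\xbd" in data


# Decoding with the correct code page recovers the text intact.
recovered = raw.decode("cp1251")

print(as_question_marks)  # // ??????
```

Scanning the changed files for the EF BF BD byte sequence is one quick way to find the lines whose comments were lost and need to be taken from the previous revision.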