
cuneiform team mailing list archive

hOCR and HTML - new version

 

The hOCR/HTML output now passes W3C validation.
Some tags were not nested correctly. Firefox didn't care, but I discovered it when I wrote a parser.
The parser is in Python and basically extracts text lines, characters, strings, their bounding boxes and the typography.
If it is of interest, I could put it in the branch (under a BSD license). If so, any suggestions for where in the src tree?
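
To give an idea of what such a parser involves, here is a minimal sketch and not the actual parser. It assumes each text line is a span with class "ocr_line" whose title attribute carries "bbox x0 y0 x1 y1", as in the hOCR convention; the exact markup cuneiform emits may differ. The names HocrLineParser and parse_hocr_lines are made up for this illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: the bounding box is stored in the title attribute as
# "bbox x0 y0 x1 y1", following the hOCR convention.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrLineParser(HTMLParser):
    """Collect (bbox, text) for every span whose class contains ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished lines as (bbox, text) tuples
        self._depth = 0    # span nesting depth inside the current line, 0 = outside
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # A nested span inside a line, e.g. per-character information.
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocr_line" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            self._bbox = tuple(int(g) for g in match.groups()) if match else None
            self._text = []
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.lines.append((self._bbox, "".join(self._text).strip()))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


def parse_hocr_lines(html):
    """Return a list of (bbox, text) tuples, one per ocr_line span."""
    parser = HocrLineParser()
    parser.feed(html)
    parser.close()
    return parser.lines
```

This relies on the spans being correctly nested, which is why the nesting errors only showed up once a parser was involved.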

Regarding the Russian comments that came out wrong:

I have fixed the comments. It was my first time doing this, so it would be good if someone could quickly skim them and check that they look alright.
I used:  iconv -f cp1251 -t utf8
on the original file, then copied in all the comments, and then reversed the conversion: iconv -f utf8 -t cp1251
so the file should now be in cp1251.
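
For anyone who wants to check the round trip without iconv, the same two conversions can be sketched in Python. This is only an illustration of the steps above, not what was actually run; the function names are made up, and it assumes the input bytes are valid in the source encoding (decoding raises an error otherwise).

```python
def cp1251_to_utf8(data: bytes) -> bytes:
    """Rough equivalent of: iconv -f cp1251 -t utf8"""
    return data.decode("cp1251").encode("utf-8")


def utf8_to_cp1251(data: bytes) -> bytes:
    """Rough equivalent of: iconv -f utf8 -t cp1251"""
    return data.decode("utf-8").encode("cp1251")


def round_trips(data: bytes) -> bool:
    """True if converting cp1251 -> UTF-8 -> cp1251 gives back the same bytes."""
    return utf8_to_cp1251(cp1251_to_utf8(data)) == data
```

If round_trips returns True for the bytes of the original file, the conversion itself did not lose anything, and any remaining problems would come from the copied comments.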

Regards
Julien


