cuneiform team mailing list archive
Message #00382
hOCR and HTML output: new version
The hOCR/HTML output now passes W3C validation.
Some tags had not been nested correctly. Firefox didn't care, but I discovered the problem when I wrote a parser.
The parser is written in Python and extracts the text lines, characters, strings, their bounding boxes (bboxes) and the typography.
If it is of interest, I could put it in the branch (under a BSD license). If so, any suggestions for where it should go in the source tree?
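The parser itself is not part of this message. To give a rough idea of the approach, here is a minimal sketch using only the Python standard library; it is not the parser described above. It assumes the hOCR conventions of marking text lines with class="ocr_line" and storing coordinates as "bbox x0 y0 x1 y1" in the title attribute, and it extracts only lines and their bboxes (no characters, strings or typography). The names HocrLineParser and parse_hocr_lines are hypothetical.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. title="bbox 10 20 200 40".
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
# Void elements never get an end tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HocrLineParser(HTMLParser):
    """Hypothetical helper: collect text and bbox of every 'ocr_line' element."""

    def __init__(self):
        super().__init__()
        self.lines = []   # [text, bbox] pairs; bbox is (x0, y0, x1, y1) or None
        self._stack = []  # per open tag: index into self.lines if it opened a line, else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        index = None
        if "ocr_line" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            bbox = tuple(int(v) for v in match.groups()) if match else None
            self.lines.append(["", bbox])
            index = len(self.lines) - 1
        self._stack.append(index)

    def handle_endtag(self, tag):
        # Assumes correctly nested markup: each end tag closes the last open tag.
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open ocr_line, if any.
        for index in reversed(self._stack):
            if index is not None:
                self.lines[index][0] += data
                return


def parse_hocr_lines(markup):
    """Return [(text, bbox), ...] with whitespace inside each line collapsed."""
    parser = HocrLineParser()
    parser.feed(markup)
    parser.close()
    return [(" ".join(text.split()), bbox) for text, bbox in parser.lines]


sample = (
    '<div class="ocr_page" title="bbox 0 0 300 100">'
    '<span class="ocr_line" title="bbox 10 20 200 40">Hello <b>world</b></span><br/>'
    '<span class="ocr_line" title="bbox 10 50 180 70">Second line</span>'
    '</div>'
)
print(parse_hocr_lines(sample))
# [('Hello world', (10, 20, 200, 40)), ('Second line', (10, 50, 180, 70))]
```

Because handle_endtag simply pops the last open tag, a sketch like this only works reliably on correctly nested markup, which is exactly why mis-nested tags show up as soon as one writes a parser, even when a browser renders the page fine.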
Regarding the Russian comments that came out garbled:
I have fixed the comments. This is the first time I have done this, so it would probably be good if someone could quickly skim them and check that they look alright.
I ran
iconv -f cp1251 -t utf8
on the original file, copied in all the comments, and then converted it back with
iconv -f utf8 -t cp1251
so the file should now be in cp1251 again.
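The same round trip can be sketched in Python, as a minimal example assuming the whole source file fits in memory; the function name convert_via_utf8 and the sample comment are hypothetical. A useful property is that encoding back to cp1251 fails loudly if a pasted comment contains a character cp1251 cannot represent, instead of silently writing a broken file.

```python
def convert_via_utf8(raw: bytes, edit) -> bytes:
    """Hypothetical helper: decode cp1251 bytes, edit the text, re-encode as cp1251.

    Mirrors `iconv -f cp1251 -t utf8` followed by `iconv -f utf8 -t cp1251`.
    Raises UnicodeEncodeError if the edited text contains characters that
    cp1251 cannot represent.
    """
    text = raw.decode("cp1251")   # cp1251 bytes -> Unicode str
    text = edit(text)             # e.g. paste in the corrected comments
    return text.encode("cp1251")  # back to cp1251 bytes


original = "// Старый комментарий\nint x;\n".encode("cp1251")
fixed = convert_via_utf8(original, lambda t: t.replace("Старый", "Новый"))
print(fixed.decode("cp1251"))
```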
Regards
Julien