cuneiform team mailing list archive
Message #00577
[Bug 585418] [NEW] can produce hOCR with illegal UTF-8 sequences
Public bug reported:
Cuneiform can produce hOCR that contains illegal UTF-8 sequences:
$ cuneiform -l ruseng -f hocr -o test.html test.png
Cuneiform for Linux 0.9.0
$ grep -i utf-8 test.html
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
$ iconv -f UTF-8 -t UTF-8 < test.html > /dev/null
iconv: illegal input sequence at position 401
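
To see which bytes are at fault, not just that the file is invalid, a small helper along these lines may be useful. This is only a sketch and is not part of Cuneiform or of the original report: the function name first_invalid_utf8 is made up for illustration, and it relies on Python's strict UTF-8 decoder rather than iconv, so the reported offset (a 0-based byte offset) may not match iconv's "position" exactly.

```python
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (byte offset, offending bytes) for the first invalid
    UTF-8 sequence in data, or None if data decodes cleanly.

    Hypothetical helper for illustration; uses Python's strict decoder.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the decoder rejected.
        return err.start, data[err.start:err.end]
    return None
```

For the file from this report, one would call first_invalid_utf8(open("test.html", "rb").read()) and inspect the returned offset and bytes.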
** Affects: cuneiform-linux
Importance: Undecided
Status: New
--
can produce hOCR with illegal UTF-8 sequences
https://bugs.launchpad.net/bugs/585418
You received this bug notification because you are a member of Cuneiform
Linux, which is the registrant for Cuneiform for Linux.
Status in Linux port of Cuneiform: New