sikuli-driver team mailing list archive
Message #02101
[Bug 710586] Re: X 1.0rc2: Region.text() -- known problems and needed improvements
** Description changed:
******* this report is a summary of known problems
The text recognition feature (OCR, Region.text()), together with the
ability to find text in an image, is still experimental and under
development.
These are the currently reported bugs:
+ bug 735434: Text extraction from Images fails in some cases on colored backgrounds
bug 695616: Inconsistency in text recognition and matching, especially with integers-as-text!
bug 695650: find(text).text() does not return same text
bug 701005: text() always returns text with trailing x'200A20'
bug 701012: text() does not return all intervening blanks, add's others
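Until bugs 701005 and 701012 are fixed, a script can at least tidy up the whitespace in whatever Region.text() returns. The following is a minimal sketch, not part of Sikuli: the helper name normalize_ocr_text is hypothetical, and it assumes that x'200A20' denotes trailing whitespace characters (for example space, line feed, space). It cannot restore blanks that the OCR dropped; it only collapses the whitespace that is present.

```python
def normalize_ocr_text(raw):
    """Collapse whitespace runs and drop leading/trailing whitespace.

    Hypothetical helper, not a Sikuli API. 'raw' is assumed to be the
    string returned by Region.text().
    """
    # split() with no arguments splits on any run of whitespace and
    # discards empty pieces at both ends, so trailing whitespace vanishes.
    return " ".join(raw.split())


# Example with a hand-written string standing in for OCR output.
print(normalize_ocr_text("Total:   42 \n "))  # prints "Total: 42"
```

Comparing normalized strings instead of raw ones may also make checks against expected text less brittle, although it will not help when the characters themselves are misrecognized.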
Other observed oddities:
-- text in languages other than English is often recognized poorly
-- very small and very large fonts may not work
-- multiline text causes problems
-- graphics and symbols before, within, or after the text are interpreted as text
Tip when using Region.text():
Currently you get the best results when the region covers exactly one line of English text and contains nothing else (no graphics or symbols). If you can influence it, make the text as large as possible.
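The one-line recommendation can also be enforced in a script, so that a region that accidentally covers several lines fails loudly instead of returning unreliable text. This is a sketch under assumptions: read_single_line is a hypothetical helper, and FakeRegion is a stand-in used only so the example runs without Sikuli. The only Sikuli call assumed is Region.text().

```python
def read_single_line(region):
    """Return the text of 'region' if it holds exactly one non-empty line.

    'region' is any object with a text() method, such as a Sikuli Region.
    Raises ValueError when zero or several lines are recognized.
    """
    raw = region.text()
    # Normalize whitespace per line and ignore lines that are blank.
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    lines = [line for line in lines if line]
    if len(lines) != 1:
        raise ValueError("expected exactly one line of text, got %d" % len(lines))
    return lines[0]


class FakeRegion(object):
    """Hypothetical stand-in for a Sikuli Region; returns canned text."""

    def __init__(self, canned):
        self._canned = canned

    def text(self):
        return self._canned


print(read_single_line(FakeRegion("OK \n ")))  # prints "OK"
```

In a real script the FakeRegion would be replaced by a Region sized tightly around the single line of interest.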
-- additional information:
Internally, the Tesseract OCR engine (http://code.google.com/p/tesseract-ocr/) is used,
so its restrictions apply (e.g. minimum font size, ...).
More information can be found on the Tesseract wiki.
--
You received this bug notification because you are a member of Sikuli
Drivers, which is subscribed to Sikuli.
https://bugs.launchpad.net/bugs/710586
Title:
X 1.0rc2: Region.text() -- known problems and needed improvements
Status in Sikuli:
In Progress