sikuli-driver team mailing list archive
Message #05857
[Bug 710586] Re: X 1.0rc3: Region.text() -- known problems and needed improvements
***** from a post on the mailing list sikuli-dev by macs
Has the latest Sikuli been migrated to tesseract3? I see a branch named
tesseract3 on GitHub, and I see many OCR issues being discussed
on Launchpad.
As I understand it, OCR results can be improved by pre-processing the image:
1. Convert the image to grayscale.
2. Improve contrast or apply edge-detection filters.
3. Invert the colors (negative).
4. Reduce the color depth.
5. Apply image-smoothing filters.
Not every filter is applicable to every type of image. A user might
want to tune a filter, or combine several filters, to achieve better
results. Can we give this option to the user?
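The filter steps listed above can be sketched as a small, user-selectable pipeline. This is only an illustration in plain Python, operating on nested lists of pixels; every function name here is hypothetical and none of it is Sikuli or tesseract code. Edge detection is left out to keep the sketch short.

```python
# An image is a list of rows; RGB pixels are (r, g, b) tuples and
# grayscale pixels are ints in 0..255. All names below are hypothetical.

def to_gray(rgb_image):
    """Convert RGB pixels to 8-bit luminance (ITU-R BT.601 weights)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]

def stretch_contrast(gray):
    """Rescale linearly so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]

def invert(gray):
    """Produce the negative of a grayscale image."""
    return [[255 - p for p in row] for row in gray]

def reduce_depth(gray, levels=2):
    """Quantize to `levels` (>= 2) evenly spaced gray values; 2 gives black/white."""
    return [[min(p * levels // 256, levels - 1) * 255 // (levels - 1) for p in row]
            for row in gray]

def smooth(gray):
    """3x3 box blur; border pixels average only the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [gray[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

def preprocess(rgb_image, filters):
    """Convert to grayscale, then apply the caller's filters in order."""
    image = to_gray(rgb_image)
    for apply_filter in filters:
        image = apply_filter(image)
    return image

# Light text on a dark background: stretch, invert, then binarize.
sample = [[(200, 200, 200), (40, 40, 40)],
          [(40, 40, 40), (200, 200, 200)]]
cleaned = preprocess(sample, [stretch_contrast, invert, reduce_depth])
```

Passing the filters as a list is what would let a user pick the combination that suits a given kind of image, which is the option asked for above.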
I was not sure whether any pre-processing was done in the RC2 release.
I tried to modify the function "doFind(PSC ptn)" in region.java to
convert the image to grayscale before OCR processing, but I could not see
any improvement in OCR. I did not try further because my Eclipse
environment is not set up completely. Does Sikuli do any pre-processing
of the image before calling the OCR?
It would be nice to have the following OCR support in Sikuli:
1. An option for the user to select the language (already requested).
2. Tesseract supports training and the creation of box files. We should have an option to select user-trained files.
3. There are many commercial OCR tools that have higher accuracy and better support for other languages. If the Sikuli OCR design can be modular (as defined in the blueprint), the user should be able to use another OCR engine.
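To make the three requests above concrete, here is a rough sketch of what a pluggable OCR layer could look like. It is purely hypothetical: the class names, the registry, and the `language` and `trained_data` parameters are invented for illustration and do not correspond to any existing Sikuli or tesseract API.

```python
class OcrBackend(object):
    """Hypothetical plug-in interface; not an existing Sikuli API."""

    name = "abstract"

    def recognize(self, image, language="eng", trained_data=None):
        """Return the text found in `image` as a string."""
        raise NotImplementedError


class StubBackend(OcrBackend):
    """Stand-in engine for illustration: it only reports its options."""

    name = "stub"

    def recognize(self, image, language="eng", trained_data=None):
        return "stub:%s:%s" % (language, trained_data)


_backends = {}  # maps engine name -> backend instance


def register_backend(backend):
    """Make a backend selectable by its name."""
    _backends[backend.name] = backend


def recognize(image, engine, **options):
    """Dispatch to the chosen engine, forwarding language/training options."""
    if engine not in _backends:
        raise KeyError("no OCR backend registered as %r" % engine)
    return _backends[engine].recognize(image, **options)


register_backend(StubBackend())
result = recognize(None, "stub", language="deu", trained_data="my.traineddata")
```

A real backend would wrap tesseract or a commercial engine behind the same `recognize` method, so the language and the user-trained file become per-call options instead of fixed settings.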
Other observations on the current OCR:
1. The OCR can recognize the text, but the click fails.
If a screen has the text "Search" and I try click("Search"), the click returns failure. But when I get the text on the screen using the text() API and print it, it prints all the strings, including the string "Search".
I think we need some improvement in how the string is searched for in the text returned by OCR.
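One possible direction for the search improvement mentioned above is tolerant matching against the OCR output, so that a small recognition error does not make the lookup fail. The sketch below uses Python's standard difflib module; the function name and the 0.8 threshold are assumptions chosen for illustration, and it covers only the string comparison, not locating the word on the screen.

```python
import difflib


def find_word(ocr_text, target, threshold=0.8):
    """Return (word, ratio) for the closest word in ocr_text, or None.

    A word qualifies when its case-insensitive similarity to `target`
    is at least `threshold` (1.0 means identical).
    """
    best = None
    for word in ocr_text.split():
        matcher = difflib.SequenceMatcher(None, word.lower(), target.lower())
        ratio = matcher.ratio()
        if ratio >= threshold and (best is None or ratio > best[1]):
            best = (word, ratio)
    return best


exact = find_word("File Edit Search Help", "Search")
noisy = find_word("File Edit Searcn Help", "Search")  # one misread letter
absent = find_word("File Edit Help", "Search")
```

Here `exact` and `noisy` both resolve to the menu entry, while `absent` is None because no word comes close to the target.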
--
You received this bug notification because you are a member of Sikuli
Drivers, which is subscribed to Sikuli.
https://bugs.launchpad.net/bugs/710586
Title:
X 1.0rc3: Region.text() -- known problems and needed improvements
Status in Sikuli:
In Progress
Bug description:
******* this report is a summary of known problems and feature
requests
The text recognition feature (OCR - Region.text()), together with the
ability to find text in an image, is still experimental and under
development.
These are the currently reported bugs:
bug 777660: text recognition errors with some fonts
bug 783082: [request] want font parameters for text recognition
bug 735434: Text extraction from Images fails in some cases on colored backgrounds
bug 695616: Inconsistency in text recognition and matching, especially with integers-as-text!
bug 695650: find(text).text() does not return same text
bug 701005: text() always returns text with trailing x'200A20'
bug 701012: text() does not return all intervening blanks, add's others
bug 795391: [request] OCR/tesseract: allow new training sets for other languages and more tesseract features
Other oddities experienced
-- there are problems with text that is not in English
-- very small and very large fonts may not work
-- multiline text causes problems
-- intervening/preceding/trailing graphics and symbols are interpreted as text
Tip when using Region.text():
Currently you get the best results when the region contains only one line of text and nothing but text (no graphics/symbols) in English. If you can influence it, make the text as large as possible.
-- additional information:
Internally, the tesseract OCR engine (http://code.google.com/p/tesseract-ocr/) is used,
so its restrictions apply (e.g. minimum font size, ...).
More information can be found on its wiki.
To manage notifications about this bug go to:
https://bugs.launchpad.net/sikuli/+bug/710586/+subscriptions