
sikuli-driver team mailing list archive

[Bug 710586] Re: X 1.0rc3: Region.text() -- known problems and needed improvements

 

***** from a post on the mailing list sikuli-dev by macs

Has the latest Sikuli been migrated to tesseract3? I see a branch named
tesseract3 on GitHub, and I see many OCR issues being discussed on
Launchpad.

As I understand it, OCR results can be improved by pre-processing the image:
1. Convert the image to grayscale.
2. Improve contrast or apply edge detection filters.
3. Invert the colors (negative).
4. Reduce the color depth.
5. Apply image smoothing filters.

Not all filters are applicable to every type of image. A user might
want to tune a filter, or combine several filters, to achieve better
results. Can we give this option to the user?
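
The filters listed above, and the idea of letting the user choose a combination of them, can be sketched roughly as follows. This is not Sikuli code: every function name here is hypothetical, and the image is modelled as plain nested lists (rows of (r, g, b) tuples, or rows of 0..255 gray values) so that the sketch runs without any imaging library. Edge detection is left out for brevity.

```python
# Illustrative sketch only; none of these names exist in Sikuli.
# Color image: list of rows of (r, g, b) tuples.
# Gray image:  list of rows of ints in the range 0..255.

def to_gray(rgb_image):
    """Convert RGB pixels to luminance using integer BT.601 weights."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb_image]

def invert(gray):
    """Produce the negative of a gray image."""
    return [[255 - p for p in row] for row in gray]

def stretch_contrast(gray):
    """Rescale linearly so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:                      # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]

def reduce_depth(gray, threshold=128):
    """Reduce the image to two levels (black and white)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]

def smooth(gray):
    """3x3 box blur; border pixels average over the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out

def apply_filters(image, filters):
    """Run the user-selected filters in order; each takes and returns an image."""
    for f in filters:
        image = f(image)
    return image
```

A user-facing option could then be as simple as passing a list such as [to_gray, invert, reduce_depth] for light text on a dark background, and a different list for other screens. Which combination actually helps tesseract depends on the image and would have to be tried out.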

I was not sure whether any pre-processing was done in the RC2 release.
I tried modifying the function "doFind(PSC ptn)" in region.java to
convert the image to grayscale before OCR processing, but I could not
see any improvement in the OCR results. I did not try further because
my Eclipse environment is not completely set up. Does Sikuli do any
pre-processing of the image before calling the OCR?

It would be nice to have the following OCR support in Sikuli:

1. An option for the user to select the language (already requested).
2. Tesseract supports training and the creation of box files. We should have an option to select user-trained files.
3. There are many commercial OCR tools that have higher accuracy and better support for other languages. If the Sikuli OCR design can be modular (as defined in the blueprint), the user should be able to use another OCR engine.
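
To make the three requests above concrete, here is one possible shape for a modular OCR layer. It is a sketch only: OcrEngine, EchoEngine, OcrRegistry and the option names language and traineddata_dir are all hypothetical and are not taken from Sikuli or from the blueprint. EchoEngine is a stand-in that returns a canned string, so the example runs without tesseract installed.

```python
from abc import ABC, abstractmethod

class OcrEngine(ABC):
    """Hypothetical common interface that every OCR backend would implement."""

    def __init__(self, language="eng", traineddata_dir=None):
        self.language = language                # request 1: selectable language
        self.traineddata_dir = traineddata_dir  # request 2: user-trained files

    @abstractmethod
    def recognize(self, image):
        """Return the text recognized in the given image."""

class EchoEngine(OcrEngine):
    """Stand-in backend for demonstration; performs no real OCR."""

    def __init__(self, canned_text, **options):
        super().__init__(**options)
        self.canned_text = canned_text

    def recognize(self, image):
        return self.canned_text

class OcrRegistry:
    """Request 3: look up a backend by name so that engines are swappable."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def create(self, name, **options):
        if name not in self._factories:
            raise KeyError("no OCR engine registered as %r" % name)
        return self._factories[name](**options)

registry = OcrRegistry()
registry.register("echo", lambda **options: EchoEngine("Search", **options))
engine = registry.create("echo", language="deu")
print(engine.recognize(None))   # the canned text, since EchoEngine does no OCR
```

A tesseract backend and a commercial backend would each be one more subclass registered under its own name; the calling code would only ever see recognize().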

Other observations on the current OCR:

1. The OCR can recognize the text, but the click fails.
    If a screen contains the text "Search" and I try click("Search"), the click fails. But when I get the text on the screen using the text() API and print it, all the strings are printed, including "Search".
    I think we need some improvement in how the text returned by the OCR is searched.
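
One way such a search could be made more tolerant is sketched below. It assumes, hypothetically, that the OCR step can deliver a list of (word, box) pairs, with box given as (x, y, width, height); that is not something the current text() API is known to return. The word comparison ignores case and stray whitespace and accepts near matches by using difflib.SequenceMatcher from the Python standard library.

```python
import difflib

def normalize(text):
    """Lower-case the text and collapse all runs of whitespace."""
    return " ".join(text.lower().split())

def find_text(ocr_words, target, min_ratio=0.8):
    """Return the box of the OCR word most similar to target, or None.

    ocr_words is a list of (word, (x, y, w, h)) pairs (hypothetical format).
    A word only counts as a match if its similarity reaches min_ratio.
    """
    wanted = normalize(target)
    best_box, best_ratio = None, 0.0
    for word, box in ocr_words:
        ratio = difflib.SequenceMatcher(None, normalize(word), wanted).ratio()
        if ratio >= min_ratio and ratio > best_ratio:
            best_box, best_ratio = box, ratio
    return best_box

def click_point(box):
    """Center of a box, i.e. where a click on the matched word would land."""
    x, y, w, h = box
    return (x + w // 2, y + h // 2)

words = [("File", (0, 0, 30, 10)),
         ("Searcn", (40, 0, 50, 10)),   # OCR misread of "Search"
         ("Help", (100, 0, 30, 10))]
box = find_text(words, "Search")
```

With an exact string comparison the misread word "Searcn" would not match "Search" at all; with a similarity threshold it does, at the cost of possible false positives when two labels differ by a single character.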

-- 
You received this bug notification because you are a member of Sikuli
Drivers, which is subscribed to Sikuli.
https://bugs.launchpad.net/bugs/710586

Title:
  X 1.0rc3: Region.text() -- known problems and needed improvements

Status in Sikuli:
  In Progress

Bug description:
  ******* this report is a summary of known problems and feature
  requests

  The text recognition feature (OCR - Region.text()), together with the
  ability to find text in an image, is still experimental and under
  development.

  These are the currently reported bugs:
  bug 777660: text recognition errors with some fonts
  bug 783082: [request] want font parameters for text recognition
  bug 735434: Text extraction from Images fails in some cases on colored backgrounds
  bug 695616: Inconsistency in text recognition and matching, especially with integers-as-text!
  bug 695650: find(text).text() does not return same text
  bug 701005: text() always returns text with trailing x'200A20'
  bug 701012: text() does not return all intervening blanks, adds others
  bug 795391: [request] OCR/tesseract: allow new training sets for other languages and more tesseract features
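
  Until the whitespace issues (bug 701005 and, in part, bug 701012) are fixed, a script can tidy the returned string itself. The helper below is a hypothetical caller-side workaround, not part of Sikuli. It reads the trailing x'200A20' as a space, a newline and a space, which is an assumption. It can only remove surplus whitespace; it cannot restore blanks that the OCR dropped.

```python
def clean_ocr_text(raw, keep_lines=False):
    """Trim the text and collapse every run of whitespace to a single space.

    With keep_lines=True the line structure is kept: each line is cleaned
    separately and lines that end up empty are dropped.
    """
    if keep_lines:
        lines = [" ".join(line.split()) for line in raw.splitlines()]
        return "\n".join(line for line in lines if line)
    return " ".join(raw.split())

# Example: a one-line result with the trailing space/newline/space removed.
cleaned = clean_ocr_text("Search \n ")
```

  In a script this would wrap the call, e.g. clean_ocr_text(region.text()), where region stands for whatever Region object the script already uses.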

  Other oddities that have been observed:
  -- text that is not in English causes problems
  -- very small and very large fonts may not work
  -- multiline text causes problems
  -- the engine tries to interpret intervening/preceding/trailing graphics and symbols as text

  Tip when using Region.text():
  Currently you get the best results when the region covers only one line of text and contains nothing but text (no graphics/symbols) in English. If you can influence it, make the text as large as possible.

  -- additional information:
  Internally, the tesseract OCR engine (http://code.google.com/p/tesseract-ocr/) is used,
  so its restrictions apply (e.g. minimum font size, ...).
  More information can be found on the tesseract wiki.


