sikuli-driver team mailing list archive

Re: [Question #271904]: OCR manually improve

To: sikuli-driver@xxxxxxxxxxxxxxxxxxx
From: "Edmundo V. Neto" <question271904@xxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 01 Oct 2015 04:33:29 -0000
Reply-to: question271904@xxxxxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Question #271904 on Sikuli changed:
https://answers.launchpad.net/sikuli/+question/271904

Edmundo V. Neto proposed the following answer:
Some time ago I needed to do something similar to what you want, as I
asked here: https://answers.launchpad.net/sikuli/+question/263287.
Eugene suggested that I more or less write my own OCR. The result is
what I posted in my last answer there: it takes 0.3 seconds to read a
phrase with 100% accuracy.

It's not simple. It was made to recognise the text of a combobox, not
an entire page, and it is intolerant to changes: the text rendered on
the screen of one configuration differs from another (for example, the
font on my screen is different from the font in the exact same system
inside VirtualBox).

I made a module and a Fireworks image with character slices (very
specific to a screenshot of my program) to ease the work of cutting out
and saving each character. Each character image is named with its
Unicode number followed by .png. The code only takes into account how
paths are composed in Linux, and its comments and variable names are
written in Brazilian Portuguese.

I did more or less what Eugene suggested. I process an area and try to
find each character inside it, building a list of every occurrence of
every character that stores the character and its center position. At
this stage I don't handle spaces. I sort the list by position and
subtract the first position from the last to get the size of the text
found, and only inside that area do I search for spaces. I add them to
the list and sort by center position again.

Then I process a .py file in a subdirectory named after the font it
holds. This file is a set of "rules" to apply to the list. For example,
with a dialog font, r and n, and r and m, share exactly the same
partial picture depending on how I cut it; an r cannot have a blank
pixel at the right with that font.
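
The collecting and ordering steps described above could be sketched
roughly like this. It is a plain-Python illustration, not the original
module: the names char_from_filename and assemble_text, the
(character, center_x) tuple layout and the decimal file naming are all
assumptions, and the image matching itself (for example with Sikuli's
findAll) is left out.

```python
import os


def char_from_filename(path):
    """Map a slice image such as '.../114.png' to the character it shows.

    Assumes the file name is the decimal Unicode code point (hypothetical
    convention; under Jython 2.x one would use unichr instead of chr).
    """
    name = os.path.splitext(os.path.basename(path))[0]
    return chr(int(name))


def assemble_text(char_hits, space_centers=()):
    """Build the text from (character, center_x) hits.

    Spaces are only accepted between the first and the last character
    found, mirroring the "search for spaces only inside that area" step.
    """
    hits = sorted(char_hits, key=lambda hit: hit[1])
    if not hits:
        return ""
    left, right = hits[0][1], hits[-1][1]
    spaces = [(" ", x) for x in space_centers if left < x < right]
    merged = sorted(hits + spaces, key=lambda hit: hit[1])
    return "".join(ch for ch, _ in merged)
```

Keeping only the center x coordinate is enough for a single-line
control such as a combobox; a multi-line area would also need the y
coordinate.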

(I work with the centers.)
If an r is found 1 pixel before an n, the r doesn't exist.
If an r is found 3 pixels before an m, the r doesn't exist.
(An n is never found inside an m because of the blank pixel at the right, so no rule is needed for that case.)
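
A per-font rule file like that might look something like the sketch
below. The function name drop_phantom_r and the rule table are
hypothetical, and it assumes "difference" means the center of the n or
m minus the center of the r, in pixels.

```python
# Hypothetical rule table for one font: (following character, center gap).
DIALOG_RULES = (("n", 1), ("m", 3))


def drop_phantom_r(hits, rules=DIALOG_RULES):
    """Remove every r that is really the left part of an n or an m.

    hits is a list of (character, center_x) tuples; an r is dropped when
    a character from the rule table sits exactly `gap` pixels to its right.
    """
    ordered = sorted(hits, key=lambda hit: hit[1])
    kept = []
    for ch, x in ordered:
        if ch == "r" and any(
            other == rule_ch and other_x - x == gap
            for other, other_x in ordered
            for rule_ch, gap in rules
        ):
            continue  # phantom r: it belongs to the n or m next to it
        kept.append((ch, x))
    return kept
```

The gaps are exact on purpose: with a fixed font and size the offsets
never vary, which is also why the rules break as soon as the rendering
changes.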

If you compare this with "region.right(xxx).text()" it seems overkill,
but with my dialog 11pt font it has 100% accuracy. Unfortunately I
don't know how to teach Tesseract to do that.

-- 
You received this question notification because your team Sikuli Drivers
is an answer contact for Sikuli.