sikuli-driver team mailing list archive

Re: [Question #660398]: [1.1.x] Changing OCR language from English to something else

To: sikuli-driver@xxxxxxxxxxxxxxxxxxx
From: RaiMan <question660398@xxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 07 Nov 2017 09:10:31 -0000
Reply-to: question660398@xxxxxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Question #660398 on Sikuli changed:
https://answers.launchpad.net/sikuli/+question/660398

    Status: Open => Answered

RaiMan proposed the following answer:
OK, thanks again for not giving up.

My bad: had I checked instead of guessing, I would have realized
that Tesseract 3.02 is the correct choice.

I have corrected the answer and FAQ 2709 accordingly.

Starting with Tesseract 3, recognized text is returned as a unicode
string, so staying with SikuliX 1.1.0 causes problems: the Jython 2.5
it contains does not recognize unicode strings.

I recommend upgrading to SikuliX 1.1.1, which has Jython 2.7 (unicode aware).
Additionally, using Java 7 or even Java 8 (not Java 9 yet!) would be a good choice.

I ran a test with the German language like this:
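import org.sikuli.script.TextRecognizer as TR
Settings.OcrReadText = True   # switch on OCR text reading
Settings.OcrLanguage = "deu"  # Tesseract language code for German
TR.reset()                    # reset the recognizer so the new language is picked up

text = selectRegion().text()  # select a screen region interactively and read its text
uprint(text) # normal print not unicode aware
popup(text) # unicode aware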

This worked as expected and printed the German umlauts ä, ü, ö.

uprint() is a SikuliX helper function that internally makes unicode
strings printable and can be used like the print statement.
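
To illustrate the general idea of "making a unicode string printable",
here is a minimal sketch. It is not SikuliX's actual uprint() code; the
function name to_printable_bytes and the choice of UTF-8 are assumptions
made only for this example. It simply encodes unicode text into bytes
before it is written out:

```python
def to_printable_bytes(text, encoding="utf-8"):
    # Hypothetical helper, NOT the real uprint() implementation.
    # Byte strings are passed through unchanged.
    if isinstance(text, bytes):
        return text
    # Unicode text is encoded explicitly (UTF-8 is an assumption here),
    # so non-ASCII characters such as umlauts survive being written out.
    return text.encode(encoding)
```

For example, to_printable_bytes(u"\u00e4") yields the two UTF-8 bytes
of "ä", which can then be written to a byte-oriented output stream.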

-- 
You received this question notification because your team Sikuli Drivers
is an answer contact for Sikuli.