sikuli-driver team mailing list archive
Message #36793
Re: [Question #285002]: Tesseract with CJK and 1.1.1
Question #285002 on Sikuli changed:
https://answers.launchpad.net/sikuli/+question/285002
Barry Janzen posted a new comment:
In Sikuli, I get the "Tesseract couldn't load any languages!" error if my script does:
Settings.OcrTextSearch = True
Settings.OcrTextRead = True
Settings.OcrLanguage = 'jpn'
TR.reset()
So I used brew to install tesseract to see if I could get it to read a
PNG image of the web page https://news.google.com/news?ned=jp. After
FAILED attempts that looked like
tesseract -l jpn /tmp/jpn.png /tmp/jpn-out
I read somewhere that the TESSDATA_PREFIX was the critical piece. So I
copied my jpn.traineddata to the right spot and ran:
export TESSDATA_PREFIX=/usr/local/Cellar/tesseract/3.04.00/share/;./tesseract -l jpn /tmp/jpn.png /tmp/jpn-out
and it WORKED! (Aside: my .bash_profile sets TESSDATA_PREFIX to Sikuli's
tesseract, so I overrode it on the command line.) So, back to Sikuli. I
ran my script a few more times, trying both the Sikuli tessdata
directory and the brew tessdata directory.
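For anyone who wants to repeat that override from a script instead of the shell, here is a minimal Python sketch. It assumes a tesseract binary is on the PATH (the command above used ./tesseract from the brew directory), and the helper names tesseract_env, tesseract_cmd and run_tesseract are made up for illustration. It sets TESSDATA_PREFIX only for the child process, so whatever .bash_profile exports is left alone.

```python
import os
import subprocess


def tesseract_env(prefix, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX overridden."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = prefix
    return env


def tesseract_cmd(lang, image, out_base, binary="tesseract"):
    """Build the command line: tesseract -l LANG IMAGE OUTBASE."""
    # "binary" defaults to a tesseract found on the PATH (an assumption).
    return [binary, "-l", lang, image, out_base]


def run_tesseract(prefix, lang, image, out_base):
    """Run tesseract with the override; return (exit code, stderr text)."""
    proc = subprocess.Popen(
        tesseract_cmd(lang, image, out_base),
        env=tesseract_env(prefix),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    _, err = proc.communicate()
    return proc.returncode, err.decode("utf-8", "replace")
```

Something like run_tesseract("/usr/local/Cellar/tesseract/3.04.00/share/", "jpn", "/tmp/jpn.png", "/tmp/jpn-out") would then mirror the shell command, and the returned stderr text is where any "Failed loading language" message would show up.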
As soon as I add the
Settings.OcrLanguage = 'jpn'
in my script, it throws the "Tesseract couldn't load any languages!"
error, which I can reproduce in the brew install if I give it an invalid
TESSDATA_PREFIX directory. For example:
export TESSDATA_PREFIX=/tmp;./tesseract -l jpn /tmp/jpn.png /tmp/jpn-out
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'jpn'
Tesseract couldn't load any languages!
Could not initialize tesseract.
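Going by that message, tesseract is looking for a "tessdata" directory directly under TESSDATA_PREFIX. A quick way to check a candidate directory before running anything is sketched below in Python. It assumes the layout is <prefix>/tessdata/<lang>.traineddata, which is what the message implies for this tesseract 3.04 install; the function names are hypothetical.

```python
import os


def traineddata_path(prefix, lang):
    """Path where a language file is expected, assuming the
    <prefix>/tessdata/<lang>.traineddata layout."""
    return os.path.join(prefix, "tessdata", lang + ".traineddata")


def missing_languages(prefix, langs):
    """Return the languages whose .traineddata file is not under prefix."""
    return [lang for lang in langs
            if not os.path.isfile(traineddata_path(prefix, lang))]
```

With TESSDATA_PREFIX=/tmp as in the example above, missing_languages("/tmp", ["eng", "jpn"]) would be expected to list both languages unless /tmp/tessdata happens to contain the files.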
Since OCR DOES work in English in Sikuli, it must mean that Sikuli is
finding the tessdata directory. For comparison, if I use the invalid
directory with brew tesseract in English, it complains:
export TESSDATA_PREFIX=/tmp;tesseract /tmp/tess-test.png /tmp/tess-out2
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
So we have the right directory. It's just that
Settings.OcrLanguage = 'jpn' is not doing the right thing.
Hope this helps.