sikuli-driver team mailing list archive

Re: [Question #254041]: tessedit_char_whitelist

 

Question #254041 on Sikuli changed:
https://answers.launchpad.net/sikuli/+question/254041

    Status: Open => Solved

John Nilson confirmed that the question is solved:
I was able to solve this problem by taking the following steps:

1) Downloaded Tesseract to get the utilities needed to unpack the eng.traineddata file.
2) Added the Tesseract directory to my executable path.
3) Switched to the Sikuli\libs\tessdata directory.
4) Copied the eng.* files into a new "Unpacked" directory I created, then ran the unpack command there:
C:\Program Files (x86)\Sikuli\libs\tessdata\Unpacked>combine_tessdata -u eng.traineddata ./eng2.
Extracting tessdata components from eng.traineddata
Wrote ./eng.config
Wrote ./eng.unicharset
Wrote ./eng2.unicharambigs
Wrote ./eng2.inttemp
Wrote ./eng.pffmtable
Wrote ./eng.normproto
Wrote ./eng.punc-dawg
Wrote ./eng.word-dawg
Wrote ./eng.number-dawg
Wrote ./eng.freq-dawg
Wrote ./eng.cube-unicharset
Wrote ./eng.cube-word-dawg
Wrote ./eng.shapetable
Wrote ./eng.bigram-dawg

5) Edited eng.config and added the line:
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

6) Created a new eng.traineddata file using the following command:
C:\Program Files (x86)\Sikuli\libs\tessdata\Unpacked>combine_tessdata eng.
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type 0 is 140
Offset for type 1 is 358
Offset for type 2 is 7643
Offset for type 3 is 8690
Offset for type 4 is 980283
Offset for type 5 is 981099
Offset for type 6 is 997382
Offset for type 7 is 1001704
Offset for type 8 is 2085898
Offset for type 9 is 2112548
Offset for type 10 is -1
Offset for type 11 is 2113958
Offset for type 12 is 2115469
Offset for type 13 is 3177575
Offset for type 14 is 3240921
Offset for type 15 is -1
Offset for type 16 is -1

7) Copied the eng.traineddata I had just created in the Unpacked
directory over the existing eng.traineddata.

8) Started the Sikuli IDE and voilà, it now reads only alphanumeric characters.
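
For anyone who wants to repeat this without typing each command, the steps above could be scripted roughly as follows. This is only a sketch, not what was actually run: it assumes combine_tessdata is already on the executable path (step 2) and that unpacking yields an eng.config file (if it does not, the script creates one). The function names and the ".bak" backup file are hypothetical additions for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Characters OCR is allowed to return (letters and digits only).
WHITELIST = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
)


def unpack_command(traineddata, prefix):
    """Step 4: command that extracts the components into files named <prefix>*."""
    return ["combine_tessdata", "-u", str(traineddata), str(prefix)]


def repack_command(prefix):
    """Step 6: command that combines the <prefix>* files into <prefix>traineddata."""
    return ["combine_tessdata", str(prefix)]


def set_whitelist(config_path, chars=WHITELIST):
    """Step 5: add (or replace) the tessedit_char_whitelist line in a config file."""
    path = Path(config_path)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    # Drop any earlier whitelist line so the setting appears exactly once.
    kept = [ln for ln in lines if not ln.startswith("tessedit_char_whitelist")]
    kept.append("tessedit_char_whitelist " + chars)
    path.write_text("\n".join(kept) + "\n", encoding="utf-8")


def rebuild(tessdata_dir, lang="eng"):
    """Steps 3 to 7: unpack, edit the config, repack, and replace the original."""
    tessdata = Path(tessdata_dir)
    work = tessdata / "Unpacked"
    work.mkdir(exist_ok=True)
    original = tessdata / (lang + ".traineddata")
    shutil.copy2(original, work / original.name)

    prefix = lang + "."
    subprocess.run(unpack_command(original.name, prefix), cwd=work, check=True)
    set_whitelist(work / (lang + ".config"))
    subprocess.run(repack_command(prefix), cwd=work, check=True)

    # Keep a copy of the untouched file (hypothetical ".bak" name) before overwriting.
    shutil.copy2(original, tessdata / (original.name + ".bak"))
    shutil.copy2(work / original.name, original)
```

With the install location shown in the prompts above, the call would be rebuild(r"C:\Program Files (x86)\Sikuli\libs\tessdata"). Writing under Program Files may need an elevated prompt, and Sikuli has to be restarted to pick up the new file.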
