sikuli-driver team mailing list archive


[Question #247833]: Tesseract implementation

To: sikuli-driver@xxxxxxxxxxxxxxxxxxx
From: sjoblomj <question247833@xxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 28 Apr 2014 13:11:43 -0000
Reply-to: question247833@xxxxxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

New question #247833 on Sikuli:
https://answers.launchpad.net/sikuli/+question/247833

Hi

I'd like to improve the OCR functionality with my own training data. I have created a new traineddata file, which works fine with Tesseract. If I rename it to eng.traineddata and replace the original file in tessdata with it, everything works fine. However, a cleaner approach would be to let Sikuli use additional languages, i.e. not hard-code it to always use a language called "eng". How would one modify the source code to allow that?
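One possible way to avoid the hard-coded "eng" is to resolve the language name at runtime and use it only if a matching `<lang>.traineddata` file exists, falling back to "eng" otherwise. The Java sketch below is hypothetical: the class `OcrLanguage`, its `resolve` method and the system properties `sikuli.ocr.language` and `sikuli.tessdata` are not part of Sikuli, and wiring the result into Sikuli's OCR code is not shown.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Sikuli): chooses the OCR language instead of hard-coding it. */
public class OcrLanguage {
    // The language Sikuli currently uses unconditionally.
    static final String DEFAULT_LANGUAGE = "eng";

    /**
     * Returns the requested language if <tessdataDir>/<requested>.traineddata exists
     * as a regular file, otherwise DEFAULT_LANGUAGE.
     */
    static String resolve(String requested, Path tessdataDir) {
        if (requested == null || requested.isEmpty()) {
            return DEFAULT_LANGUAGE;
        }
        Path trainedData = tessdataDir.resolve(requested + ".traineddata");
        return Files.isRegularFile(trainedData) ? requested : DEFAULT_LANGUAGE;
    }

    public static void main(String[] args) {
        // Both property names are assumptions made up for this sketch.
        String requested = System.getProperty("sikuli.ocr.language", DEFAULT_LANGUAGE);
        Path tessdata = Paths.get(System.getProperty("sikuli.tessdata", "tessdata"));
        System.out.println("Using OCR language: " + resolve(requested, tessdata));
    }
}
```

Falling back to "eng" keeps today's behaviour when no custom language is configured; a custom language would then be selected with something like `java -Dsikuli.ocr.language=mylang OcrLanguage`.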

I have found two places where the "eng" language is specified:
* Line 19 in Tesseract4SikuliX\src\main\java\org\sikuli\tesseract\Run.java
* Line 366 in Natives\src\main\native\Vision\tessocr.cpp

However, despite changing "eng" to the name of my new language, I still get the same result: Sikuli keeps using eng.traineddata even though I told it to use a different language. Is the language defined somewhere else that I have missed?
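When a language change seems to have no effect, one thing worth checking is which language files are actually present in the tessdata directory being read. The sketch below is a hypothetical helper, `TessdataLanguages`, not taken from Sikuli: it lists the language codes of all `*.traineddata` files in a given directory. Which directory Sikuli actually reads from is something you have to supply as the argument; the default `tessdata` is only a placeholder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic (not part of Sikuli): lists languages available in a tessdata directory. */
public class TessdataLanguages {
    static final String SUFFIX = ".traineddata";

    /** Returns the sorted language codes of all *.traineddata entries in tessdataDir. */
    static List<String> list(Path tessdataDir) throws IOException {
        List<String> languages = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tessdataDir, "*" + SUFFIX)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                // Strip the ".traineddata" suffix to get the language code, e.g. "eng".
                languages.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        Collections.sort(languages);
        return languages;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        System.out.println("Languages in " + dir.toAbsolutePath() + ": " + list(dir));
    }
}
```

If the custom language does not show up for the directory Sikuli is using, the renamed source constant alone cannot make Tesseract load it.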

In the long run, I hope to contribute a feature to Sikuli that lets users easily add their own languages/training data for OCR.

Best regards

-- 
You received this question notification because you are a member of
Sikuli Drivers, which is an answer contact for Sikuli.