sikuli-driver team mailing list archive
Message #25552
[Question #247833]: Tesseract implementation
New question #247833 on Sikuli:
https://answers.launchpad.net/sikuli/+question/247833
Hi
I'd like to improve the OCR functionality with my own training data. I have created a new traineddata file that works fine with Tesseract. When I rename it to eng.traineddata and use it to replace the original file in tessdata, everything works fine. However, a cleaner approach would be to let Sikuli use additional languages, i.e., not hardcode it to always use a language called "eng". How would one modify the source code to allow that?
I have found two places where the "eng" language is specified:
* Line 19 in Tesseract4SikuliX\src\main\java\org\sikuli\tesseract\Run.java
* Line 366 in Natives\src\main\native\Vision\tessocr.cpp
However, even after changing "eng" to the name of my new language in both places, I still get the same result. Sikuli seems to keep using eng.traineddata even though I told it to use a different language. Is the language defined somewhere else that I have missed?
In the long run, I hope to contribute a feature to Sikuli that lets users easily add their own languages/training data for OCR.
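To make the idea concrete, here is a minimal sketch of what such an option might look like. It assumes a hypothetical helper class, OcrLanguage, which is not part of SikuliX. The helper takes a requested language code, checks that the matching <lang>.traineddata file exists in the tessdata folder, and otherwise falls back to "eng". The resolved value would then still have to be passed to the places that currently use their own "eng" literal (Run.java and tessocr.cpp, per the list above).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of SikuliX): chooses the Tesseract language
// code to use instead of a hardcoded "eng".
final class OcrLanguage {
    // Tesseract's stock English model, used as the fallback.
    static final String DEFAULT_LANGUAGE = "eng";

    private OcrLanguage() {}

    // Returns `requested` if <tessdataDir>/<requested>.traineddata exists as a
    // regular file; otherwise returns DEFAULT_LANGUAGE.
    static String resolve(Path tessdataDir, String requested) {
        if (requested == null || requested.isEmpty()) {
            return DEFAULT_LANGUAGE;
        }
        Path model = tessdataDir.resolve(requested + ".traineddata");
        return Files.isRegularFile(model) ? requested : DEFAULT_LANGUAGE;
    }

    // Example: the system property name "sikuli.ocr.language" and the
    // command-line tessdata path are hypothetical, chosen for illustration.
    public static void main(String[] args) {
        Path tessdata = Paths.get(args.length > 0 ? args[0] : "tessdata");
        String lang = resolve(tessdata, System.getProperty("sikuli.ocr.language"));
        System.out.println("OCR language: " + lang);
    }
}
```

Falling back to "eng" rather than failing keeps the current behaviour for anyone who never sets the option. A stricter variant could throw an exception instead, so that a misspelled language name is not silently ignored.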
Best regards
--
You received this question notification because you are a member of
Sikuli Drivers, which is an answer contact for Sikuli.