
sikuli-driver team mailing list archive

Re: [Question #680605]: Tesseract reads most numbers correctly but not all. How to improve?

 

Question #680605 on Sikuli changed:
https://answers.launchpad.net/sikuli/+question/680605

MP gave more information on the question:
https://stackoverflow.com/questions/4944830/how-to-make-tesseract-to-recognize-only-numbers-when-they-are-mixed-with-letter

1) I found the settings that I think will solve my problem.

I want Tesseract to recognize only numbers.

But again, I have trouble getting the syntax right in SikuliX. This does not work:

tr.setoutputbase digits
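The `digits` in that attempt is most likely the name of a Tesseract config file rather than a SikuliX method argument: on the Tesseract command line it is passed after the output base name, as in `tesseract image.png outputbase digits`, and the file itself (under `tessdata/configs/`) is just a list of engine variables. The Java sketch below parses that plain-text format (one `name value` pair per line, lines starting with `#` ignored) to show what such a config actually sets. The sample content, a single `tessedit_char_whitelist 0123456789-.` line, is an assumption; check the `digits` file in your own installation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: a Tesseract config file is plain text with one "variable value"
 * pair per line. Parsing it shows which engine variables a config such as
 * "digits" sets.
 */
public class TessConfigSketch {

    // ASSUMPTION: content of the stock "digits" config; verify against your install.
    static final String DIGITS_CONFIG = "tessedit_char_whitelist 0123456789-.\n";

    /** Parses "name value" lines, skipping blank lines and '#' comment lines. */
    static Map<String, String> parse(String configText) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String raw : configText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // The variable name ends at the first run of whitespace; the rest is the value.
            String[] parts = line.split("\\s+", 2);
            vars.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return vars;
    }

    public static void main(String[] args) {
        // Print every variable the config would set.
        parse(DIGITS_CONFIG).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Read this way, "use the digits config" and "set `tessedit_char_whitelist` to the digits" are the same request, which is the form the tess-two snippet in the second option uses.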

2) Another solution is:

===============================
I did it a bit differently (with tess-two). Maybe it will be useful for somebody.

First, you need to initialize the API:

TessBaseAPI baseApi = new TessBaseAPI();
// datapath is the directory that contains the "tessdata" folder
baseApi.init(datapath, language, ocrEngineMode);

Then set the following variables:

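baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE); // treat the image as one line of text
// characters that must never be returned
baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "!?@#$%&*()<>_-+=/:;'\"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
// the only characters that may be returned
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, ".,0123456789");
// legacy-engine hint that the input is numeric
baseApi.setVariable("classify_bln_numeric_mode", "1");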
====================================
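The tess-two calls above boil down to ordinary Tesseract variables, so they can be written down without tess-two at all. The Java sketch below assumes that `VAR_CHAR_WHITELIST` stands for `tessedit_char_whitelist` and that `PSM_SINGLE_LINE` is page segmentation mode 7; the blacklist is left out because none of its characters are in the whitelist anyway. `keepWhitelisted` is a hypothetical helper for a different, weaker technique: it deletes disallowed characters from text that has already been recognized, whereas an engine whitelist restricts which characters the classifier may choose in the first place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: the tess-two settings as plain Tesseract variable names, plus a
 * string-level fallback filter. ASSUMPTIONS: VAR_CHAR_WHITELIST means
 * "tessedit_char_whitelist" and PSM_SINGLE_LINE means page segmentation mode 7.
 */
public class DigitsOnlySketch {

    // Same whitelist as in the tess-two snippet.
    static final String WHITELIST = ".,0123456789";

    // Page segmentation mode 7: treat the image as a single text line.
    static final int PSM_SINGLE_LINE = 7;

    /** Engine variables equivalent to the setVariable(...) calls (blacklist omitted). */
    static Map<String, String> digitVariables() {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("tessedit_char_whitelist", WHITELIST);
        vars.put("classify_bln_numeric_mode", "1");
        return vars;
    }

    /**
     * Hypothetical fallback: removes every character not in the whitelist from
     * already-recognized text. It can only delete characters; it cannot make
     * the engine re-read a misrecognized glyph as a digit.
     */
    static String keepWhitelisted(String ocrText, String whitelist) {
        StringBuilder out = new StringBuilder();
        for (char c : ocrText.toCharArray()) {
            if (whitelist.indexOf(c) >= 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        digitVariables().forEach((name, value) -> System.out.println(name + " = " + value));
        System.out.println("psm = " + PSM_SINGLE_LINE);
        System.out.println(keepWhitelisted("Total: 1,234.50 EUR", WHITELIST)); // prints 1,234.50
    }
}
```

How these name/value pairs reach Tesseract from a SikuliX script depends on the SikuliX version and is not shown here.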

Yet again, I can't seem to get the syntax right. How do I pass these baseApi settings to Tesseract from SikuliX?

-- 
You received this question notification because your team Sikuli Drivers
is an answer contact for Sikuli.