sikuli-driver team mailing list archive

[Question #692958]: tesseract pattern not enforced?

To: sikuli-driver@xxxxxxxxxxxxxxxxxxx
From: matteoa <question692958@xxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 16 Sep 2020 14:15:50 -0000
Reply-to: question692958@xxxxxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

New question #692958 on Sikuli:
https://answers.launchpad.net/sikuli/+question/692958

Hello,
I'm trying to OCR a text field on the target that contains codes following a fixed pattern (implemented as a pattern file, in Tesseract terms):
P\n\n\n\n
C\n\n\n\n
B\n\n\n\n
U\n\n\n\n

In practice, each code starts with one letter (P, C, B, or U) followed by 4 hex digits.
The total length is always exactly 5 characters.

So, at least as I intend this pattern file to work, correct output would be, for example:
P0123, P2EFD, C12EF, B2BCD, and so on.
When I run the script, the vast majority of the output is as expected, but I also get some results like PPB, PFF3, CC3, and so on.
Is there a way to enforce adherence to the pattern more strictly? I set it up in SikuliX (Jython) like this:
OCR.globalOptions().variable("user_patterns_file", "C:\\Sikulix\\Util\\Code_OCR.Pattern")
OCR.globalOptions().variable("tessedit_char_whitelist", "PCBU0123456789ABCDEF")
OCR.globalOptions().variable("tessedit_char_blacklist", "abcdefGgHhIiLlMmNnOopQqRrSsTtuVvZzJjYyKkWw-!|")
OCR.globalOptions().variable("load_system_dawg", "F")
OCR.globalOptions().variable("load_freq_dawg", "F") 
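
To make the intended format explicit, here is a minimal sketch in plain Python of a check applied to the recognized text after OCR. It assumes the OCR result is available as a plain string. The names is_valid_code and CODE_RE are hypothetical helpers for illustration, not part of SikuliX or Tesseract, and this only filters results after the fact; it does not change how Tesseract applies the pattern file.

```python
import re

# Hypothetical helper, not part of SikuliX or Tesseract.
# Expected format: one letter from P, C, B, U followed by exactly
# four uppercase hex digits (5 characters in total).
CODE_RE = re.compile(r"^[PCBU][0-9A-F]{4}$")


def is_valid_code(text):
    """Return True if text, after trimming whitespace, matches the format."""
    return CODE_RE.match(text.strip()) is not None


# Sample strings taken from the examples in this question.
samples = ["P0123", "P2EFD", "C12EF", "B2BCD", "PPB", "PFF3", "CC3"]
accepted = [s for s in samples if is_valid_code(s)]
rejected = [s for s in samples if not is_valid_code(s)]

print(accepted)
print(rejected)
```

With a check like this, a script could at least detect a malformed result and retry the OCR on the same region.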

Thanks in advance.
My configuration is:
2.0.4-2020-03-14_08:01/Windows10.0/Java8(64)1.8.0_251-b08


