sikuli-driver team mailing list archive

[Question #679003]: [1.1.4] IDE: OCR Tuning

To: sikuli-driver@xxxxxxxxxxxxxxxxxxx
From: Jan <question679003@xxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 06 Mar 2019 15:37:56 -0000
Reply-to: question679003@xxxxxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

New question #679003 on Sikuli:
https://answers.launchpad.net/sikuli/+question/679003

I am not a developer and am completely new to Tesseract. I tried to understand the Tesseract documentation on GitHub, but it is not clear to me which Tesseract functionality can or should be imported or used in SikuliX (e.g. "copy only the traineddata files of your language into the SikuliX AppData folder").
Now I want to read some folder names in Windows 10 Explorer when storing the output of a web app locally and, based on the OCR result, change the folder or create a new subfolder. I assume that Windows 10 uses Segoe fonts. My root folder path contains a German special character; the OCR result is: "Dieser PC > Lokaler DatentrÃ©ger(Cz) > ...". This may also be a SikuliX issue, but I can use a workaround for it.

My Issue:
When embedded in a meaningless mixture of digits and letters, a lowercase "l" (as in "Lima") is always (100%!) recognized as a pipe symbol ( | ). In addition, in the same scenario more than 70% of uppercase "O" (as in "Oscar") are recognized as "0" (zero), and vice versa.
Zooming the characters in Windows Explorer to 150% didn't help. I assume the root cause is the use of different fonts.

My questions:
1. How can I tell Tesseract OCR that it should try to recognize Segoe fonts?
2. Until now I have only added the German traineddata files to the Tesseract folder. Are there font sets to add as well?
3. Can I provide a blacklist of characters to Tesseract OCR, telling it that there will never be a pipe symbol in the text?
4. What are the standard fonts of the Tesseract version that is embedded in SikuliX 1.1.4? My idea is to switch the standard fonts of Windows Explorer to ones Tesseract handles well (and switch back afterwards).
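
To illustrate what question 3 is aiming at: Tesseract itself has a configuration variable named tessedit_char_blacklist, but how (or whether) it can be set from SikuliX 1.1.4 is exactly what is unclear here. As a fallback, the same effect could be approximated after the fact in the script. The following is only a minimal sketch under that assumption; the helper name clean_ocr_text and the table NEVER_EXPECTED are hypothetical and not part of SikuliX or Tesseract. It relies on the fact that Windows folder names cannot contain a pipe character, so every "|" in the OCR result can be treated as a misread "l". It does not help with the O/0 confusion, because both characters are valid in folder names.

```python
# Hypothetical post-processing helper; none of these names come from
# SikuliX or Tesseract. It only cleans up text that OCR already returned.

# Assumption: a pipe can never occur in a Windows folder name, so any
# "|" in the OCR result is taken to be a misrecognized lowercase "l".
NEVER_EXPECTED = {"|": "l"}


def clean_ocr_text(text, replacements=NEVER_EXPECTED):
    """Replace characters that cannot occur in the expected text."""
    for wrong, right in replacements.items():
        text = text.replace(wrong, right)
    return text


# Example with a made-up folder name as OCR might return it.
print(clean_ocr_text("fo|der_a1|"))
```

In a SikuliX script, the string passed to clean_ocr_text would be whatever the text-reading call on the Explorer region returns.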

Thanks a lot in advance!
