
sikuli-driver team mailing list archive

[Question #160874]: New (Norwegian) Tesseract training set crashes Sikuli?

 

New question #160874 on Sikuli:
https://answers.launchpad.net/sikuli/+question/160874

Hi,

I'm having trouble getting a Tesseract training set for Norwegian to work in Sikuli.

The training set was created for Tesseract 2.04 as described here:
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract2

My training set works with Tesseract itself, but when I replace the English training set in sikuli-script.jar with mine, Sikuli crashes whenever I try to capture an image or extract the text from an image. Since my training set includes non-English characters (æ, ø, å), I wonder whether that is the cause. Or is there another, "proper" way of doing this?
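To check which non-English characters the training set actually carries, something like the following might help. It is only a rough sketch: it assumes the unicharset file is UTF-8 text, and the function name non_ascii_chars is made up for illustration, not part of Tesseract or Sikuli.

```python
def non_ascii_chars(path):
    """Return the sorted list of non-ASCII characters found in a text file.

    Assumption: the file (e.g. a unicharset) is encoded as UTF-8.
    """
    with open(path, "rb") as f:
        text = f.read().decode("utf-8")
    # Keep every distinct character outside the 7-bit ASCII range.
    return sorted({ch for ch in text if ord(ch) > 127})
```

Running it on the replacement eng.unicharset should list æ, ø and å if they made it into the training set.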

The files I've replaced (with identically named files) are:
/tessdata/eng.freq-dawg
/tessdata/eng.inttemp
/tessdata/eng.normproto
/tessdata/eng.pffmtable
/tessdata/eng.unicharset
/tessdata/eng.user-words
/tessdata/eng.word-dawg
/tessdata/eng.DangAmbigs
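
For reference, the swap described above can be sketched in Python roughly as follows. This is only an illustration, assuming the jar is an ordinary zip archive that stores the files under tessdata/ with the names listed; the function name replace_tessdata and the output path are hypothetical, not part of Sikuli.

```python
import os
import zipfile

# Suffixes of the files listed in the post; the "eng." prefix is kept
# so the replacement files have identical names.
SUFFIXES = ["freq-dawg", "inttemp", "normproto", "pffmtable",
            "unicharset", "user-words", "word-dawg", "DangAmbigs"]


def replace_tessdata(jar_path, new_dir, out_path, lang="eng"):
    """Write a copy of jar_path to out_path in which each tessdata/<lang>.*
    entry is replaced by the same-named file from new_dir, if present.
    Returns the sorted names of the entries that were replaced."""
    replacements = {}
    for suffix in SUFFIXES:
        name = "%s.%s" % (lang, suffix)
        replacements["tessdata/" + name] = os.path.join(new_dir, name)

    replaced = []
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            key = item.filename.lstrip("/")
            if key in replacements and os.path.isfile(replacements[key]):
                # Swap in the new training file under the original entry name.
                with open(replacements[key], "rb") as f:
                    dst.writestr(item.filename, f.read())
                replaced.append(key)
            else:
                # Copy every other entry unchanged.
                dst.writestr(item, src.read(item.filename))
    return sorted(replaced)
```

Writing to a separate output file keeps the original sikuli-script.jar intact, so the English training set can be restored if the new one fails.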

This happens with Sikuli X.RC2 on both Ubuntu and Windows Vista.
