cuneiform team mailing list archive

[Bug 439736] Re: Access violation in rling.dll (speller?)

 

Thanks for that. I did a bit more testing and I may have found some
clues.

In spelchk.c::selectobj() there is the following code at the start of
the function that sets the start and end of a "part":

 if (!(findpart (obj, part, obj->pos_part[ib], obj->pos_part[ie], &pi)))
  {
   pi = obj->part_max; /* part not found => consider last part         */
   ib = part[pi].begi;     /* the last part's beg index in obj->pos_part[] */
   ie = part[pi].endi;     /* the last part's end index in obj->pos_part[] */
  }
 if ( (pi==0) && (!(part[pi].word)) )
   goto No_selectobj;      /* not worth part                               */
 cur_part [ib] = obj->pos_part[ib];  /* copy last part beg */  <== ACCESS VIOLATION ... index ib >>> sizeof(cur_part)
 cur_part [ie] = obj->pos_part[ie];  /* copy last part end */

I traced the problem bitmap: findpart() returns FALSE, so the body of the
if is entered. The part at index obj->part_max looks like nonsense in the
debugger (large random values that are out of range), and we end up with
ib far larger than MAX_WORD_SIZE, the size of the local cur_part[] array
where the access violation occurs.

Looking at other sections of the spell checker they all inspect part[]
elements from 0 ... part_max-1 and insert new elements at part_max, so I
suspect that the initialization of pi might be wrong and could actually
be outside the range of the part[] array.

For example, in spelfun.c::findpart() we have:

 INT pi;
 for (pi=0; pi<obj->part_max; pi++)  /* find the part in part[]            */
  {
   /* ... */
  }

And in spelset.c::setpart() we have some code that adds a new part at
position i = obj->part_max:

 i = obj->part_max;      /* new part index                                  */
 obj->part_max++;        /* one part more in part[]                         */
 memset (&(part[i]),0,sizeof(SPART)); /* part initial state        */
 /* ... */
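
To make the indexing convention concrete, here is a minimal standalone
sketch. It is not cuneiform code: demo_part, demo_obj, DEMO_MAX_PARTS and
both functions are hypothetical stand-ins I made up for illustration, and
only the "append at part_max, then increment" pattern is taken from
setpart() above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PARTS 8   /* hypothetical capacity, not the real value */

/* Hypothetical stand-ins for SPART and the object struct. */
typedef struct { short begi; short endi; short word; } demo_part;
typedef struct { short part_max; } demo_obj;

/* Append in the style of setpart(): the new part goes at index part_max,
 * then part_max is incremented. Returns the new index, or -1 if full. */
static int demo_add_part(demo_obj *obj, demo_part *part, short begi, short endi)
{
    int i;
    assert(obj != NULL && part != NULL);
    if (obj->part_max >= DEMO_MAX_PARTS)
        return -1;
    i = obj->part_max;                      /* new part index          */
    obj->part_max++;                        /* one more part in part[] */
    memset(&part[i], 0, sizeof(demo_part)); /* part initial state      */
    part[i].begi = begi;
    part[i].endi = endi;
    return i;
}

/* Index of the last initialized part, or -1 if there are none.
 * Note it is part_max - 1; part[part_max] is never initialized. */
static int demo_last_part(const demo_obj *obj)
{
    assert(obj != NULL);
    return obj->part_max > 0 ? obj->part_max - 1 : -1;
}
```

With this convention, after two appends part_max is 2 and the valid
indices are 0 and 1, so reading part[part_max] returns whatever happens to
be in memory there.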

In any event, the following change leads to a valid initialization of
ib and ie and also avoids my crash:

 if (!(findpart (obj, part, obj->pos_part[ib], obj->pos_part[ie], &pi)))
  {
   pi = obj->part_max - 1; /* part not found => consider last part         */
   ib = part[pi].begi;         /* the last part's beg index in obj->pos_part[] */
   ie = part[pi].endi;         /* the last part's end index in obj->pos_part[] */
  }

I may be wrong though :( ... It would be good to get some developer
opinion on this.
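
Independently of the off-by-one question, a range check before the two
writes into cur_part[] would turn this kind of corruption into a clean
bail-out instead of a crash. The following is only a sketch under
assumptions: DEMO_MAX_WORD_SIZE is a placeholder value (I have not checked
the real MAX_WORD_SIZE), the short element type is a guess, and
demo_copy_part_bounds() is a hypothetical helper, not an existing function
in spelchk.c.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_WORD_SIZE 32  /* placeholder for MAX_WORD_SIZE (assumed) */

/* Nonzero when idx is a valid index into an array of DEMO_MAX_WORD_SIZE. */
static int demo_index_ok(int idx)
{
    return idx >= 0 && idx < DEMO_MAX_WORD_SIZE;
}

/* Copy the begin and end entries of a part, as selectobj() does, but
 * refuse to write when either index is out of range.
 * Returns 1 on success, 0 if the caller should give up on this part. */
static int demo_copy_part_bounds(short *cur_part, const short *pos_part,
                                 int ib, int ie)
{
    assert(cur_part != NULL && pos_part != NULL);
    if (!demo_index_ok(ib) || !demo_index_ok(ie))
        return 0;                 /* corrupt indices: do not touch memory */
    cur_part[ib] = pos_part[ib];  /* copy part beg */
    cur_part[ie] = pos_part[ie];  /* copy part end */
    return 1;
}
```

In selectobj() the failure case would presumably map to the existing
"goto No_selectobj" path, but that is my assumption about the intended
control flow.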

-- 
Access violation in rling.dll (speller?)
https://bugs.launchpad.net/bugs/439736
You received this bug notification because you are a member of Cuneiform
Linux, which is the registrant for Cuneiform for Linux.

Status in Linux port of Cuneiform: New

Bug description:
We have used CuneiForm to OCR about 10 days of digital subtitles on 6 channels (100,000+ bitmaps). Recently I found one that crashes the spell checker and probably breaks subsequent recognition. I am attaching the bitmap for reference, as we can use it to crash cuneiform.exe in isolation (ironically, it says "Noooo!!!!"). I would be interested in seeing whether others can reproduce this or whether it is an issue with the environment.

Note that we compile without _USE_RVERLINE_ as that module crashes consistently in some situations.

Compilation Environment:
MS VC++ 2003 (WinXP 32-bit) [DEBUG]
#undef _USE_RVERLINE_
(+ one line header patch to compile with VC2003).

Current Workaround:
We can avoid this crash by disabling the speller through the API as follows, although the underlying problem may lie outside the speller:
Bool32 bSpeller = FALSE; /* FIXME: avoid crash on some bitmaps */
PUMA_SetImportData(PUMA_Bool32_Speller, &bSpeller);

Stack Trace:
~> cuneiform.exe -l eng -o out.txt ocr-crash.bmp
rling.dll!selectobj(objstr * obj=0x014331a0, short ibeg=9, partstr * part=0x0143f4e0)  Line 283 + 0x13
rling.dll!selectopt(objstr * obj=0x014331a0, partstr * part=0x0143f4e0)  Line 197 + 0x12
rling.dll!ed_conv(dict_state * dict=0x01431b48, user_voc * voc_array=0x014330c0, short voc_no=0)  Line 502 + 0xf
rling.dll!run_page()  Line 251 + 0x16
rling.dll!spelling(unsigned char * beg=0x021e29d0, long size=8192)  Line 201 + 0x5
rling.dll!CRLControl::CheckED(void * pEDPool=0x0183cca8, void * pEDOutPool=0x018449b0, unsigned int wEDPoolSize=406, unsigned int * pwEDOutPoolSize=0x00dae78c, int * pOut=0x00dae780)  Line 391 + 0x13
rling.dll!RLING_CheckED(void * pEDPool=0x0183cca8, void * pEDOutPool=0x018449b0, unsigned int wEDPoolSize=406, unsigned int * pwEDOutPoolSize=0x00dae78c, int * pOutCheck=0x00dae780)  Line 272 + 0x1f
rpstr.dll!rpstr_normal_spell(char * sec_wrd=0x00daeca8)  Line 1027 + 0x24
rpstr.dll!rpstr_correct_spell(void * ln=0x02340c98, strucCSTR_cell * * addbeg=0x00daf0c4, strucCSTR_cell * * addend=0x00daf0b8, int * linefrag=0x00daf0d4, int num_ln=1, int disable_new_dict=0, int disable_check_word=0)  Line 2115 + 0xc
rpstr.dll!correct_line_spell(void * line=0x02340c98, strucCSTR_cell * * re=0x00daf0b8, strucCSTR_cell * * rb=0x00daf0c4, int line_num=1, int disable_new_dict=0, int disable_check_word=0, int * rf=0x00daf0d4)  Line 362 + 0x21
rpstr.dll!RPSTR_CorrectSpell(int version=1)  Line 445 + 0x21
puma.dll!Recognize()  Line 706 + 0xa
puma.dll!PUMA_XFinalRecognition()  Line 600 + 0x5
cuneiform.exe!main(int argc=6, char * * argv=0x01d57758)  Line 378 + 0x8
cuneiform.exe!mainCRTStartup()  Line 398 + 0x11


