cuneiform team mailing list archive

Re: charset mess inside source files

 

On Mon, Aug 25, 2008 at 12:18 PM, Alex Samorukov <samm@xxxxxxxxxxx> wrote:

> So, my proposal for UTF-8 in the engine is:
> 1) Define PUMA_CODE_UTF8 inside the headers.
> 2) Add a function like from_ansi_to_utf8() inside codetables.cpp.
> Place the iconv code there, guarded with
> #if defined(HAVE_ICONV) && defined(HAVE_UTF8)
> #endif
> and return an error if iconv() is absent. Later, in the win32 engine, it
> should be very easy to add winnls code for this, so porting will not be a
> hard task. from_ansi_to_utf8() will take the defined PUMA_ constants as
> its arguments, so even the charset names will not be system dependent.
> 3) Modify ROUT_ListCodes to also return UTF-8 when HAVE_UTF8 is
> defined. Call from_ansi_to_utf8() in the rout library when
> PUMA_CODE_UTF8 is selected and working.
> 4) Modify the cli utility to call ListCodes and use PUMA_CODE_UTF8 if it is
> present.
> 5) Add a check for iconv to the cmake scripts, and define
> HAVE_UTF8 and HAVE_ICONV there. On Windows it may just set HAVE_WINNLS and
> HAVE_UTF8, so the only system-dependent code will be in the
> from_ansi_to_utf8() function.
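
For reference, steps 2 and 5 could be shaped roughly like the sketch below.
This is only an illustration under assumptions: the SketchCode enum stands in
for the real PUMA_ constants, the function signature is a guess, and the
mapping of those constants to the iconv names "CP1251" and "KOI8-R" is
hypothetical. Only from_ansi_to_utf8(), HAVE_ICONV and HAVE_UTF8 come from the
proposal itself.

```cpp
#include <cassert>
#include <string>

#if defined(HAVE_ICONV) && defined(HAVE_UTF8)
#include <iconv.h>
#endif

// Hypothetical stand-ins for the PUMA_ source-charset constants.
enum SketchCode { SKETCH_CODE_ANSI, SKETCH_CODE_KOI8 };

#if defined(HAVE_ICONV) && defined(HAVE_UTF8)
// Assumed mapping from the constants to iconv charset names.
static const char* sketch_iconv_name(SketchCode code) {
    switch (code) {
        case SKETCH_CODE_ANSI: return "CP1251";
        case SKETCH_CODE_KOI8: return "KOI8-R";
    }
    return nullptr;
}
#endif

// Converts `in` from the given single-byte charset to UTF-8.
// Returns false on failure, including when no backend was compiled in.
bool from_ansi_to_utf8(SketchCode from, const std::string& in, std::string& out) {
    out.clear();
#if defined(HAVE_ICONV) && defined(HAVE_UTF8)
    const char* name = sketch_iconv_name(from);
    if (name == nullptr) return false;
    iconv_t cd = iconv_open("UTF-8", name);   // iconv_open(tocode, fromcode)
    if (cd == (iconv_t)-1) return false;

    std::string src(in);                       // glibc's iconv takes non-const pointers
    std::string buf(in.size() * 3 + 1, '\0');  // a BMP character needs at most 3 bytes
    char* inptr = &src[0];
    char* outptr = &buf[0];
    size_t inleft = src.size();
    size_t outleft = buf.size();
    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1) return false;
    assert(outleft <= buf.size());
    out.assign(buf, 0, buf.size() - outleft);
    return true;
#else
    (void)from;
    (void)in;
    return false;   // a winnls branch could slot in here later
#endif
}
```

Built without HAVE_ICONV, the function simply reports failure, so callers such
as the rout library can fall back to the existing code pages.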

Looks good. However, in step 2 we would be using iconv only as a
look-up table. Building that table ourselves removes the dependency
on iconv and the Win32 system libraries. If you don't want to do that
yourself, I can fix that later. That way you also don't have to deal
with the iconv #ifdefs, since eventually we won't need them.
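
A minimal sketch of the look-up table idea, assuming the source text is
Windows-1251. The names Cp1251Table, append_utf8() and cp1251_to_utf8() are
made up for illustration and are not existing Cuneiform functions. The table
is deliberately partial: only ASCII, the letters at 0xC0..0xFF and the two
Yo letters are filled in, and every other byte becomes U+FFFD until the
remaining entries are copied from the published code page table.

```cpp
#include <cassert>
#include <string>

// Appends one code point as UTF-8. Single-byte code pages only map into
// the BMP, so three bytes per character is the maximum needed here.
static void append_utf8(std::string& out, unsigned cp) {
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Partial Windows-1251 to Unicode table (illustrative, not complete).
struct Cp1251Table {
    unsigned short to_unicode[256];
    Cp1251Table() {
        for (int b = 0; b < 256; ++b)
            to_unicode[b] = 0xFFFD;                           // not filled in yet
        for (int b = 0; b < 0x80; ++b)
            to_unicode[b] = static_cast<unsigned short>(b);   // ASCII is identity
        for (int b = 0xC0; b <= 0xFF; ++b)                    // U+0410..U+044F
            to_unicode[b] = static_cast<unsigned short>(0x0410 + (b - 0xC0));
        to_unicode[0xA8] = 0x0401;                            // capital Yo
        to_unicode[0xB8] = 0x0451;                            // small yo
    }
};

// Converts a Windows-1251 byte string to UTF-8 with no library dependency.
std::string cp1251_to_utf8(const std::string& in) {
    static const Cp1251Table table;
    std::string out;
    out.reserve(in.size() * 2);
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        append_utf8(out, table.to_unicode[static_cast<unsigned char>(in[i])]);
    }
    return out;
}
```

For example, cp1251_to_utf8("\xCF\xF0\xE8\xE2\xE5\xF2") gives the UTF-8 bytes
for "Привет". Other code pages would each need their own 256-entry table, but
the conversion loop stays the same.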

But before starting, you might want to see how the following thread
turns out. It seems there is already some Unicode support inside
Cuneiform.

http://openocr.org/forum/viewtopic.php?f=7&t=2829&p=3693#p3693


