kicad-developers team mailing list archive

Re: UTF8 source files

To: Kicad Developers <kicad-developers@xxxxxxxxxxxxxxxxxxx>
From: Lorenzo Marcantonio <l.marcantonio@xxxxxxxxxxxx>
Date: Fri, 3 May 2013 09:58:58 +0200
In-reply-to: <CAMYMTFj32gO6=9QNYS_riujD3zMXM703CX98_uO7bk-8e9Pftw@mail.gmail.com>
Mail-followup-to: Kicad Developers <kicad-developers@xxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, May 03, 2013 at 09:43:29AM +0200, Edwin van den Oetelaar wrote:
> Flashback : It reminds me of the problems that existed with
> web-browsers, when people pasted stuff from their text-editor (like
> Word) into html textarea boxes to publish articles... oh the old days.

More or less the same... when the 'standard' wasn't Latin-1 but some
ANSI encoding ('smart quotes' were often the culprit).

> Maybe the UTF-8 should be 'escaped' using hex notation? How do other
> folks handle this?

It's just a policy thing to decide.

The current C/C++ standard allows the \u and \U notation, but AFAIK in
a plain string the result depends on the compiler's execution character
set and is not guaranteed to be UTF8, since C is encoding agnostic. So
you'd have to spell out the encoded bytes in the string yourself, like

"This is mu:\xc2\xb5"

or, since I don't know if \x is standard or a gcc extension (need to
check)

"This is mu:\302\265"

Not exactly the most convenient thing to do, but it keeps the sources in
strict ISO646 (ASCII). That is the 'multibyte' string approach.
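
To make the multibyte approach concrete, here is a minimal sketch (the
helper bytes_of and the variable names are made up for illustration,
this is not KiCad code). It suggests that the hex and octal spellings
give the same two UTF-8 bytes, 0xC2 0xB5. For what it's worth, \x is an
ordinary escape in ISO C and C++, not a gcc extension; its one trap is
that it swallows every hex digit that follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy a narrow literal, minus its trailing NUL,
// into a std::string so the bytes can be inspected.
template <std::size_t N>
std::string bytes_of(const char (&lit)[N])
{
    return std::string(lit, N - 1);
}

// Same text, two spellings of the UTF-8 bytes for U+00B5 (0xC2 0xB5).
const std::string mu_hex   = bytes_of("This is mu:\xc2\xb5");
const std::string mu_octal = bytes_of("This is mu:\302\265");

// \x is greedy: written as "\xc2\xb5bad" the second escape would try to
// consume "b5bad", so the literal is split into two adjacent pieces.
const std::string mu_then_text = bytes_of("\xc2\xb5" "bad");

// Self-check a caller could run once at startup.
inline void check_mu_bytes()
{
    assert(mu_hex == mu_octal);
}
```

Both strings should come out 13 bytes long (11 ASCII characters plus
the two bytes of the mu), whatever the encoding of the source file.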

The official C way (C++ too) would be to use wchar_t (which is
4 bytes wide under Linux :D) and use

L"This is mu:\u00B5"
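
A small sketch of what the wide literal holds, assuming a compiler
whose wide execution character set is Unicode (gcc, clang and MSVC as
far as I know); the names are invented for the example. Keep in mind
that universal character names take exactly four hex digits after \u
and exactly eight after \U.

```cpp
#include <cassert>
#include <cstddef>

// One wide character plus the terminating L'\0'.
const wchar_t wide_mu[] = L"\u00B5";

// Number of wchar_t units in the literal, excluding the terminator.
const std::size_t wide_mu_units = sizeof(wide_mu) / sizeof(wide_mu[0]) - 1;

// Bytes per unit: implementation-defined, typically 4 on Linux and
// 2 on Windows.
const std::size_t wide_unit_bytes = sizeof(wchar_t);

// \U needs all eight digits; this names the same code point as \u00B5.
const wchar_t wide_mu_long[] = L"\U000000B5";

// Self-check a caller could run once; assumes a Unicode wide charset.
inline void check_wide_mu()
{
    assert(wide_mu[0] == 0xB5);
}
```

The point is that the mu is a single unit here, and how many bytes that
unit takes is up to the platform.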

*If* you're using C++11 then you can say (C++11 is no longer encoding
agnostic!)

u8"This is mu:\u00B5"

And have a UTF8 string.
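
As a hedged sketch of the u8 route (the names utf8_mu, utf8_mu_len and
utf8_mu_byte are hypothetical), the literal below should contain the
bytes 0xC2 0xB5 regardless of the compiler's narrow character set. One
caveat: from C++20 on the element type of a u8 literal is char8_t
rather than char, so the example goes through auto and a cast to stay
valid in both worlds.

```cpp
#include <cassert>
#include <cstddef>

// Array of char in C++11/14/17, array of char8_t from C++20 on;
// binding through const auto& accepts both.
const auto& utf8_mu = u8"\u00B5";

// Bytes in the literal, excluding the terminating NUL.
const std::size_t utf8_mu_len = sizeof(utf8_mu) - 1;

// Hypothetical accessor that hides the char/char8_t difference.
inline unsigned char utf8_mu_byte(std::size_t i)
{
    assert(i < utf8_mu_len);
    return static_cast<unsigned char>(utf8_mu[i]);
}
```

Here utf8_mu_len should be 2, with utf8_mu_byte(0) and utf8_mu_byte(1)
giving 0xC2 and 0xB5.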

So it's a choose-your-poison situation. More info here
http://en.cppreference.com/w/cpp/language/string_literal

Add to this the many ways that wx uses to handle strings depending on
version, build option and (probably) the current phase of the moon.

-- 
Lorenzo Marcantonio
Logos Srl

Follow ups

Re: UTF8 source files
From: Edwin van den Oetelaar, 2013-05-03

References

UTF8 source files
From: Dick Hollenbeck, 2013-05-03
Re: UTF8 source files
From: Edwin van den Oetelaar, 2013-05-03
Re: UTF8 source files
From: Lorenzo Marcantonio, 2013-05-03
Re: UTF8 source files
From: Edwin van den Oetelaar, 2013-05-03