kicad-developers team mailing list archive
Message #10290
Re: UTF8 source files
On Fri, May 03, 2013 at 09:43:29AM +0200, Edwin van den Oetelaar wrote:
> Flashback : It reminds me of the problems that existed with
> web-browsers, when people pasted stuff from their text-editor (like
> Word) into html textarea boxes to publish articles... oh the old days.
More or less the same... when the 'standard' wasn't Latin-1 but some
ANSI encoding ('smart quotes' often were the culprit)
> Maybe the UTF-8 should be 'escaped' using hex notation? How do other
> folks handle this?
It's just a policy thing to decide.
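The current C/C++ standard allows the \u (four hex digits) and \U (eight
hex digits) notation, but AFAIK in a plain string it is not guaranteed to
give UTF8: the bytes depend on the compiler's execution character set,
since C is encoding agnostic. So you'd have to *write* the encoded bytes
yourself in the string, like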
"This is mu:\xc2\xb5"
or, with octal escapes (\x is standard C too, not a gcc extension, but
it swallows any hex digits that follow it, while octal stops at three)
"This is mu:\302\265"
Not exactly the most convenient thing to do, but it keeps the sources in
strict ISO646 (ASCII). That is the 'multibyte' string approach.
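
A small sketch of the two escaped spellings, just to show they produce
the same bytes. The function names mu_hex and mu_octal are made up for
illustration. The literal is split before the escape so that a following
character could never be read as part of the \x sequence.

```cpp
#include <string>

// Both literals spell the UTF-8 encoding of U+00B5 (MICRO SIGN) byte by
// byte: 0xC2 0xB5 in hex, 302 265 in octal. The source stays pure ASCII.
inline std::string mu_hex()   { return "This is mu:" "\xc2\xb5"; }
inline std::string mu_octal() { return "This is mu:" "\302\265"; }
```

Both strings are 13 bytes long: 11 ASCII characters plus the two bytes
of the encoded mu.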
The official C way (C++ too) would be to use wchar_t (which is
4 bytes big under Linux :D) and use
L"This is mu:\u00B5"
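
A minimal sketch of the wide-string variant, assuming a compiler that
accepts universal character names in wide literals (\u takes exactly
four hex digits, \U eight). The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cwchar>

// Number of wchar_t units in the literal: 11 ASCII characters plus one
// unit for U+00B5, whether wchar_t is 2 or 4 bytes wide.
inline std::size_t wide_mu_length() {
    return std::wcslen(L"This is mu:\u00B5");
}

// Storage used by those units; this is the part that differs between
// platforms because sizeof(wchar_t) is implementation-defined.
inline std::size_t wide_mu_bytes() {
    return wide_mu_length() * sizeof(wchar_t);
}
```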
*If* you're using C++11 then you can say (C++11 is no longer encoding
agnostic!)
u8"This is mu:\u00B5"
and get a UTF8 string.
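
A sketch of the u8 variant, assuming a C++11 or later compiler. The
function name mu_u8 is made up. The cast is only there because C++20
changed the element type of u8 literals from char to char8_t; before
C++20 it is a no-op.

```cpp
#include <string>

// u8 literals are guaranteed to be UTF-8 encoded regardless of the
// compiler's execution character set. reinterpret_cast keeps this
// compiling both before C++20 (const char[]) and from C++20 on
// (const char8_t[]).
inline std::string mu_u8() {
    return reinterpret_cast<const char*>(u8"This is mu:\u00B5");
}
```

The result should be byte-for-byte the same as the hand-escaped
"This is mu:\xc2\xb5" string.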
So it's a choose-your-poison situation. More info here
http://en.cppreference.com/w/cpp/language/string_literal
Add to this the many ways that wx handles strings depending on
version, build option and (probably) the current phase of the moon.
--
Lorenzo Marcantonio
Logos Srl