
kicad-developers team mailing list archive

Re: UTF8 source files

 

On Fri, May 03, 2013 at 09:43:29AM +0200, Edwin van den Oetelaar wrote:
> Flashback : It reminds me of the problems that existed with
> web-browsers, when people pasted stuff from their text-editor (like
> Word) into html textarea boxes to publish articles... oh the old days.

More or less the same... when the 'standard' wasn't Latin-1 but some
ANSI encoding ('smart quotes' were often the culprit)

> Maybe the UTF-8 should be 'escaped' using hex notation? How do other
> folks handle this?

It's just a policy decision.

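The current C/C++ standards allow the \u/\U notation (universal
character names), but AFAIK in a plain string literal that gives
whatever the compiler's execution character set is, not necessarily
UTF8, since C is encoding agnostic. So you'd have to spell out the
encoding yourself in the string, like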

"This is mu:\xc2\xb5"

or with octal escapes (\x is standard C too, not a gcc extension, but
an octal escape stops after at most three digits, while \x swallows any
hex digits that follow it)

"This is mu:\302\265"

Not exactly the most convenient thing to do, but keeps the sources in
strict ISO646 (ASCII). That is the 'multibyte' string approach.

The official C way (C++ too) would be to use wchar_t (which is
4 bytes wide under Linux :D) and write

L"This is mu:\u00B5"

*If* you're using C++11 then you can write (C++11 is no longer encoding
agnostic!)

u8"This is mu:\u00B5"

and get a UTF8 string.

So it's a choose-your-poison situation. More info here
http://en.cppreference.com/w/cpp/language/string_literal

Add to this the many ways wx handles strings depending on version,
build options and (probably) the current phase of the moon.

-- 
Lorenzo Marcantonio
Logos Srl

