On Fri, May 03, 2013 at 09:43:29AM +0200, Edwin van den Oetelaar wrote:
> Flashback : It reminds me of the problems that existed with
> web-browsers, when people pasted stuff from their text-editor (like
> Word) into html textarea boxes to publish articles... oh the old days.

More or less the same... when the 'standard' wasn't Latin-1 but some
ANSI encoding ('smart quotes' were often the culprit)

> Maybe the UTF-8 should be 'escaped' using hex notation? How do other
> folks handle this?

It's just a policy thing to decide.

The current C/C++ standard allows the \u and \U notation but AFAIK it
gives wide chars, not UTF-8, since C is encoding agnostic. So you'd have
to spell out the encoded bytes yourself in the string, like

"This is mu:\xc2\xb5"

or, since I don't know if \x is standard or a gcc extension (need to

"This is mu:\302\265"

Not exactly the most convenient thing to do, but keeps the sources in
strict ISO646 (ASCII). That is the 'multibyte' string approach.
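
As a quick sanity check that the two spellings above really are the same
bytes, here is a minimal sketch. The helper name same_bytes is made up
for this mail (it is not from any library), and the loop inside a
constexpr function assumes a C++14 or later compiler.

```cpp
#include <cstddef>

// Hypothetical helper (not from any library): byte-wise comparison of
// two narrow string literals, including the terminating NUL.
template <std::size_t N, std::size_t M>
constexpr bool same_bytes(const char (&a)[N], const char (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

// Hex and octal escapes spell the same two UTF-8 bytes for U+00B5.
static_assert(same_bytes("This is mu:\xc2\xb5", "This is mu:\302\265"),
              "hex and octal escapes should give identical bytes");

// 11 ASCII characters + 2 UTF-8 bytes + the terminating NUL.
static_assert(sizeof("This is mu:\xc2\xb5") == 14,
              "unexpected literal size");
```

One practical difference: \x swallows every hex digit that follows, so
"\xb5" followed directly by a letter like 'a' has to be split into
"\xb5" "a", while an octal escape stops after three digits.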

The official C way (C++ too) would be to use wchar_t (which is
4 bytes wide under Linux :D) and use

L"This is mu:\u00B5"
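
A small sketch of what the wide literal contains, assuming the wide
execution character set is Unicode based (which, as far as I know, is
the default for gcc and clang). Note the four-digit \u form: \U needs
eight hex digits. The helper last_code_unit is a made-up name for
illustration.

```cpp
#include <cstddef>

// Hypothetical helper: value of the last real element of a wide
// literal, i.e. the one just before the terminating NUL.
template <std::size_t N>
constexpr unsigned long last_code_unit(const wchar_t (&s)[N]) {
    return static_cast<unsigned long>(s[N - 2]);
}

// The micro sign is stored as one wide element holding its code point.
static_assert(last_code_unit(L"This is mu:\u00B5") == 0xB5,
              "expected U+00B5 as a single wide element");

// 11 ASCII characters + 1 wide element for mu + the terminating NUL.
static_assert(sizeof(L"This is mu:\u00B5") / sizeof(wchar_t) == 13,
              "unexpected element count");
```

The element count is the same everywhere for this string, but the byte
size is not, since sizeof(wchar_t) differs between platforms.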

*If* you're using C++11 then you can say (C++11 is no more encoding

u8"This is mu:\u00B5"

And have a UTF-8 string.
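
To show that the u8 form ends up as the same bytes as the hand-escaped
one, here is a minimal sketch. Again same_bytes is a made-up helper, and
a C++14 or later compiler is assumed for the constexpr loop. The element
type is a template parameter because u8 literals are char in C++11
through C++17 but char8_t from C++20 on.

```cpp
#include <cstddef>

// Hypothetical helper: compare a (possibly char8_t) literal against a
// plain char literal byte by byte, including the terminating NUL.
template <typename C, std::size_t N, std::size_t M>
constexpr bool same_bytes(const C (&a)[N], const char (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i)
        if (static_cast<unsigned char>(a[i]) !=
            static_cast<unsigned char>(b[i]))
            return false;
    return true;
}

// The compiler encodes U+00B5 as the two bytes 0xC2 0xB5.
static_assert(same_bytes(u8"This is mu:\u00B5", "This is mu:\xc2\xb5"),
              "u8 literal should match the hand-encoded bytes");
```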

So it's a choose-your-poison situation. More info here

Add to this the many ways that wx handles strings depending on
version, build options and (probably) the current phase of the moon.

Lorenzo Marcantonio
Logos Srl

