On 11/21/2013 02:16 PM, Dick Hollenbeck wrote:
1) wx >= 2.9 has these constructors
wxString( const char* )
wxString( std::string )
whereas wx 2.8 does not.
Both offer:
wxString( const char*, wxConvUTF8 );
but this cannot be used in a default "type promotion" situation; this constructor must be
invoked explicitly.
2) The above type promotion constructors treat the input encoding as that of the current
locale, which is not guaranteed to be UTF8.
The type promotion constructors are important if you want to allow the compiler to promote
an 8 bit string to a wxString for you without special syntax.
3) If you decide to keep 8 bit strings in memory, encoded in the current locale, then
someday when you load a Chinese board file, you will not be able to hold those strings in
a deficient 8 bit encoding. (UTF8 is not a deficient 8 bit encoding, some others are.)
The software breaks at that point. This argues for using UTF8 always as the internal 8
bit encoding. But now the above two constructors are broken, since the current locale's
encoding cannot be assumed to be UTF8, even though it often is on Linux. You just cannot
assume it.
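To make points 1) and 2) concrete, here is a minimal sketch of the difference between a
converting constructor and the explicit two-argument form. Text, Assumed and
encodingSeenBy() are hypothetical stand-ins invented for illustration; this is not wxString
and does not reproduce its real constructors.

```cpp
#include <cassert>
#include <string>

// Which encoding a constructor assumed for its 8 bit input.
enum class Assumed { CurrentLocale, Utf8 };

// Hypothetical stand-in for a wide string class; NOT wxString itself.
class Text
{
public:
    // Converting constructor: the compiler may use it for implicit promotion,
    // but it has no way to learn the encoding, so it assumes the locale's.
    Text( const char* bytes ) : m_assumed( Assumed::CurrentLocale )
    {
        assert( bytes != nullptr );
        m_bytes = bytes;
    }

    // Two-argument form: never chosen implicitly, the caller must spell it out.
    Text( const char* bytes, Assumed encoding ) : m_assumed( encoding )
    {
        assert( bytes != nullptr );
        m_bytes = bytes;
    }

    Assumed assumed() const { return m_assumed; }
    const std::string& bytes() const { return m_bytes; }

private:
    std::string m_bytes;
    Assumed     m_assumed;
};

// A function that takes the wide type, as GUI-facing code typically does.
inline Assumed encodingSeenBy( const Text& label )
{
    return label.assumed();
}
```

Calling encodingSeenBy( "R1" ) compiles without special syntax, but the promotion goes
through the locale-assuming constructor. Getting UTF8 treatment requires writing
encodingSeenBy( Text( "R1", Assumed::Utf8 ) ) at every call site.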
In summary, I don't see any easy immediate relief from the boat anchor we know as
wxString, even with wx 3.0. But I will continue to think about it.
Dick
Attached is a patch that needs a good look. It shows off a new class, UTF8, that I wrote to
solve the problems described above: it provides conversion operators to and from
wxString, yet holds UTF8 data in what is basically a std::string.
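For readers without the patch at hand, the general shape of such a class might look like
the sketch below. It is not the code from the patch. As assumptions for illustration, it
derives from std::string, uses std::u32string (aliased as WideText) as a stand-in for
wxString so that it compiles without wx, and expects well-formed UTF8 input when decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for wxString in this sketch: a sequence of Unicode code points.
using WideText = std::u32string;

// Hypothetical sketch of a UTF8-holding string; not the class from the patch.
class UTF8 : public std::string
{
public:
    UTF8() = default;

    // 8 bit input is taken to be UTF8 already, so nothing is converted.
    UTF8( const char* txt ) : std::string( txt ) {}
    UTF8( const std::string& txt ) : std::string( txt ) {}

    // Implicit conversion from the wide type: encode every code point.
    UTF8( const WideText& wide )
    {
        for( char32_t cp : wide )
            appendCodePoint( cp );
    }

    // Implicit conversion back to the wide type: decode the stored bytes.
    operator WideText() const
    {
        WideText out;
        std::size_t i = 0;

        while( i < size() )
        {
            unsigned char lead = static_cast<unsigned char>( (*this)[i] );

            // Number of continuation bytes announced by the lead byte.
            int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
            char32_t cp = extra == 0 ? lead : lead & ( 0x3F >> extra );

            for( int k = 1; k <= extra && i + k < size(); ++k )
                cp = ( cp << 6 ) | ( static_cast<unsigned char>( (*this)[i + k] ) & 0x3F );

            out.push_back( cp );
            i += extra + 1;
        }

        return out;
    }

private:
    // Append one code point as 1 to 4 UTF8 bytes.
    void appendCodePoint( char32_t cp )
    {
        assert( cp <= 0x10FFFF );

        if( cp < 0x80 )
            push_back( char( cp ) );
        else if( cp < 0x800 )
        {
            push_back( char( 0xC0 | ( cp >> 6 ) ) );
            push_back( char( 0x80 | ( cp & 0x3F ) ) );
        }
        else if( cp < 0x10000 )
        {
            push_back( char( 0xE0 | ( cp >> 12 ) ) );
            push_back( char( 0x80 | ( ( cp >> 6 ) & 0x3F ) ) );
            push_back( char( 0x80 | ( cp & 0x3F ) ) );
        }
        else
        {
            push_back( char( 0xF0 | ( cp >> 18 ) ) );
            push_back( char( 0x80 | ( ( cp >> 12 ) & 0x3F ) ) );
            push_back( char( 0x80 | ( ( cp >> 6 ) & 0x3F ) ) );
            push_back( char( 0x80 | ( cp & 0x3F ) ) );
        }
    }
};
```

With this in place, "UTF8 narrow = wide;" and "WideText back = narrow;" both compile
without any conversion macro, and the 8 bit side is UTF8 regardless of the current locale.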
Please say how it impacts you, realizing its usage scope can be trimmed or expanded from
this sampling.
I am especially interested in:
a) how it compiles on gcc >= 4.8
b) how it compiles using clang.
c) what it does to any benchmarks of sanity and speed for stroke_font.h
Lorenzo, Marco, Orson, your feedback in particular is wanted.
class UTF8 will likely allow the removal of many more calls to TO_UTF8() and
FROM_UTF8(), though not in this patch.
Plus code size will likely be reduced, because I put the size-expensive code out of line
behind a lean call interface.
Dick