
kicad-developers team mailing list archive

Re: The problems with wxString

 

On 01/01/2014 07:58 AM, Dick Hollenbeck wrote:
On 11/21/2013 02:16 PM, Dick Hollenbeck wrote:

1) wx >= 2.9 has these constructors:


	wxString( const char* )
	wxString( std::string )

whereas wx 2.8 does not.

Both offer:

wxString( const char*, wxConvUTF8 );

but this cannot be used in a default "type promotion" situation; this constructor must be
invoked explicitly.


2) The above type promotion constructors treat the input as being encoded in the current
locale, so the input is not guaranteed to be read as UTF8.


The type promotion constructors are important if you want to allow the compiler to promote
an 8 bit string to a wxString for you without special syntax.
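To illustrate points 1 and 2, here is a minimal sketch of the difference between a constructor the compiler can use for promotion and one that must be called explicitly. Text and Encoding are hypothetical stand-ins for wxString and a converter such as wxConvUTF8; this is not the wx API, only the C++ mechanism.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins, NOT the real wx API: Encoding plays the role of a
// converter such as wxConvUTF8, Text plays the role of wxString.
struct Encoding {};

class Text
{
public:
    // Single-argument, non-explicit constructor: the compiler may use it
    // silently to promote a const char* to a Text.
    Text( const char* s ) : m_bytes( s ), m_promoted( true )
    {
        assert( s != nullptr );
    }

    // Two-argument constructor: not selected when a bare const char* is
    // passed, so the caller has to spell it out.
    Text( const char* s, const Encoding& ) : m_bytes( s ), m_promoted( false )
    {
        assert( s != nullptr );
    }

    bool promoted() const { return m_promoted; }
    const std::string& bytes() const { return m_bytes; }

private:
    std::string m_bytes;
    bool        m_promoted;
};

// Takes a Text; passing a bare string literal relies on type promotion.
bool wasPromoted( const Text& t )
{
    return t.promoted();
}
```

Calling wasPromoted( "abc" ) goes through the one-argument constructor without any special syntax, while wasPromoted( Text( "abc", Encoding() ) ) is the explicit form.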


3) If you decide to keep 8 bit strings in memory, encoded in the current locale, then
someday when you load a Chinese board file, you will not be able to hold those strings in
a deficient 8 bit encoding.  (UTF8 is not a deficient 8 bit encoding; some others are.)
The software breaks at that point.  This argues for always using UTF8 as the internal 8
bit encoding.  But then the above two constructors are broken, since the current locale's
encoding cannot be assumed to be UTF8, even though it often is on Linux.  You just cannot
assume it.
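A small sketch of what "deficient" means in point 3, using ISO-8859-1 (Latin-1) as one example of a single-byte locale encoding. The function names are made up for illustration.

```cpp
#include <cassert>

// True if the code point fits in ISO-8859-1 (Latin-1), used here as one
// example of a single-byte locale encoding.
bool fitsLatin1( char32_t cp )
{
    return cp <= 0xFF;
}

// Number of bytes UTF8 needs for a code point in U+0000..U+10FFFF.
int utf8Length( char32_t cp )
{
    assert( cp <= 0x10FFFF );

    if( cp < 0x80 )    return 1;
    if( cp < 0x800 )   return 2;
    if( cp < 0x10000 ) return 3;
    return 4;
}
```

A CJK character such as U+4E2D has no slot in Latin-1 at all, whereas UTF8 holds it as three 8 bit units.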

In summary, I don't see any easy immediate relief from the boat anchor we know as
wxString, even with wx 3.0.  But I will continue to think about it.

Dick


Attached is a patch that needs a good look. It shows off a new class UTF8 that I wrote,
which solves the problems described above by providing conversion operators to and from
wxString while holding UTF8 data in what is basically a std::string.
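For readers without the patch at hand, the general shape of such a class might look like the sketch below. This is not the code from the patch: Utf8Sketch is a hypothetical name, std::u32string stands in for wxString so that no wx headers are needed, and the decoder assumes well-formed UTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, not the class from the patch. std::u32string stands in
// for wxString.
class Utf8Sketch
{
public:
    Utf8Sketch() {}
    Utf8Sketch( const char* utf8 ) : m_s( utf8 ) {}
    Utf8Sketch( const std::string& utf8 ) : m_s( utf8 ) {}

    // Conversion from the wide string type: encode each code point as UTF8.
    Utf8Sketch( const std::u32string& wide )
    {
        for( char32_t cp : wide )
            append( cp );
    }

    // Conversion to the wide string type. Assumes m_s is well-formed UTF8.
    operator std::u32string() const
    {
        std::u32string out;
        std::size_t    i = 0;

        while( i < m_s.size() )
        {
            unsigned char lead = static_cast<unsigned char>( m_s[i] );
            int           n = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;

            assert( i + n <= m_s.size() );  // a truncated sequence is not handled

            char32_t cp = ( n == 1 ) ? lead : ( lead & ( 0xFF >> ( n + 1 ) ) );

            for( int k = 1; k < n; ++k )
                cp = ( cp << 6 ) | ( static_cast<unsigned char>( m_s[i + k] ) & 0x3F );

            out.push_back( cp );
            i += n;
        }

        return out;
    }

    const std::string& str() const { return m_s; }

private:
    // Append one code point as 1 to 4 UTF8 bytes.
    void append( char32_t cp )
    {
        if( cp < 0x80 )
        {
            m_s += static_cast<char>( cp );
        }
        else if( cp < 0x800 )
        {
            m_s += static_cast<char>( 0xC0 | ( cp >> 6 ) );
            m_s += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else if( cp < 0x10000 )
        {
            m_s += static_cast<char>( 0xE0 | ( cp >> 12 ) );
            m_s += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            m_s += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
        else
        {
            m_s += static_cast<char>( 0xF0 | ( cp >> 18 ) );
            m_s += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
            m_s += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
            m_s += static_cast<char>( 0x80 | ( cp & 0x3F ) );
        }
    }

    std::string m_s;
};
```

Because both directions are ordinary converting constructors and operators, the encoding is fixed as UTF8 at every boundary and never depends on the current locale.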


Please say how it impacts you, realizing its usage scope can be trimmed or expanded from
this sampling.

I am especially interested in:

a) how it compiles on gcc >= 4.8
b) how it compiles using clang.
c) what it does to any benchmarks of sane-ness and speed for stroke_font.h

Lorenzo, Marco, Orson, your feedback in particular is wanted.

class UTF8 will likely allow the removal of many more calls to TO_UTF8() and
FROM_UTF8(), beyond those touched in this patch.

Plus code size will likely be reduced because I put the size-expensive stuff out of line
in a lean call interface.

Dick

Everything compiles and works fine with gcc 4.8.1 and wx 3.0. As there is not much code contributed by us that works with strings, I do not see anything that I could be missing. After some simple performance tests, I confirmed my expectations (and Lorenzo's as well) that it does not affect rendering speed noticeably.

I was wondering about some modifications of the uni_iter class to make it usable with the functions available in <algorithm> in the standard library (e.g. https://gist.github.com/jeetsukumaran/307264/). It should not require a lot of changes; if you want, I can try it out.

One trap that I can see is having both iterator (from std::string) and uni_iter. It may lead to situations where someone uses std::string::iterator (just by habit, or without being aware of how it works) when what they really meant was uni_iter. In my opinion, if the class is specifically designed for UTF8, we could drop the std::string iterator.

Besides that, everything is fine with me.
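A rough sketch of the kind of change I mean, assuming C++11. CodePointIter is a hypothetical name and not the uni_iter from the patch; the point is only that supplying the five iterator typedefs plus ==, != and ++ is enough for standard algorithms to accept it. It assumes well-formed UTF8.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical code point iterator over UTF8 bytes, not the patch's uni_iter.
class CodePointIter
{
public:
    // The typedefs std::iterator_traits looks for. operator* returns by
    // value, so this is an input iterator.
    using iterator_category = std::input_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;

    explicit CodePointIter( const char* p ) :
        m_p( reinterpret_cast<const unsigned char*>( p ) )
    {
        assert( p != nullptr );
    }

    // Decode the code point starting at the current byte.
    char32_t operator*() const
    {
        int      n = length( *m_p );
        char32_t cp = ( n == 1 ) ? *m_p : ( *m_p & ( 0xFF >> ( n + 1 ) ) );

        for( int k = 1; k < n; ++k )
            cp = ( cp << 6 ) | ( m_p[k] & 0x3F );

        return cp;
    }

    // Advance by one whole code point, not by one byte.
    CodePointIter& operator++()    { m_p += length( *m_p ); return *this; }
    CodePointIter  operator++( int ) { CodePointIter t = *this; ++*this; return t; }

    bool operator==( const CodePointIter& o ) const { return m_p == o.m_p; }
    bool operator!=( const CodePointIter& o ) const { return m_p != o.m_p; }

private:
    // Sequence length implied by the lead byte.
    static int length( unsigned char lead )
    {
        return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    }

    const unsigned char* m_p;
};

// Number of code points (not bytes) in a UTF8 string.
std::ptrdiff_t countCodePoints( const std::string& s )
{
    return std::distance( CodePointIter( s.data() ), CodePointIter( s.data() + s.size() ) );
}

// True if the UTF8 string contains the given code point.
bool containsCodePoint( const std::string& s, char32_t cp )
{
    CodePointIter end( s.data() + s.size() );
    return std::find( CodePointIter( s.data() ), end, cp ) != end;
}
```

With std::string::iterator the same std::distance call would count bytes instead, which is exactly the trap described above.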

Regards,
Orson

