kicad-developers team mailing list archive

Re: The problems with wxString

To: Maciej Sumiński <maciej.suminski@xxxxxxx>
From: Dick Hollenbeck <dick@xxxxxxxxxxx>
Date: Thu, 02 Jan 2014 11:04:16 -0600
Cc: kicad-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <52C59708.8060009@cern.ch>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 01/02/2014 10:42 AM, Maciej Sumiński wrote:
> On 01/01/2014 07:58 AM, Dick Hollenbeck wrote:
>> On 11/21/2013 02:16 PM, Dick Hollenbeck wrote:
>>>
>>> 1) wx >= 2.9 has these constructors
>>>
>>>
>>> 	wxString( const char* )
>>>   	wxString( std::string )
>>>
>>> whereas wx 2.8 does not.
>>>
>>> Both offer:
>>>
>>> wxString( const char*, wxConvUTF8 );
>>>
>>> but this cannot be used in a default "type promotion" situation, this constructor must be
>>> invoked explicitly.
>>>
>>>
>>> 2) The above type promotion constructors treat the input encoding as that of the current
>>> locale, rather than UTF8 assuredly.
>>>
>>>
>>> The type promotion constructors are important if you want to allow the compiler to promote
>>> an 8 bit string to a wxString for you without special syntax.
>>>
>>>
>>> 3) If you decide to keep 8 bit strings in memory, encoded in the current locale, then
>>> someday when you load a Chinese board file, you will not be able to hold those strings in
>>> a deficient 8 bit encoding.  (UTF8 is not a deficient 8 bit encoding, some others are.)
>>> The software breaks at that point.  This argues for using UTF8 always as the internal 8
>>> bit encoding.  But now the above two constructors are broken, since the current locale's
>>> encoding cannot be assumed to be UTF8, even though it often is on Linux.  You just cannot
>>> assume it.
>>>
>>> In summary, I don't see any easy immediate relief from the boat anchor we know as
>>> wxString, even with wx 3.0.  But I will continue to think about it.
>>>
>>> Dick
>>
>>
>> Attached is a patch needing a good look, that shows off a new class UTF8 that I wrote that
>> solves the problems addressed above by providing conversion operators to and from
>> wxString, yet holding UTF8 data in what is basically a std::string.
>>
>>
>> Please say how it impacts you, realizing its usage scope can be trimmed or expanded from
>> this sampling.
>>
>> I am especially interested in:
>>
>> a) how it compiles on gcc >= 4.8
>> b) how it compiles using clang.
>> c) what it does to any benchmarks of sane-ness and speed for stroke_font.h
>>
>> Lorenzo, Marco, Orson, your feedback in particular is wanted.
>>
>> class UTF8 will likely allow the removal of many many more calls to TO_UTF8() and
>> FROM_UTF8(), not in this patch.
>>
>> Plus code size will likely be reduced because I put the size expensive stuff out of line
>> in a lean call interface.
>>
>> Dick
> 
> Everything compiles & works fine with gcc 4.8.1 and wx 3.0. As there is 
> not much code contributed by us that works with strings - I do not see 
> anything that I could be missing. After some simple performance tests, I 
> confirmed my expectations (and Lorenzo's as well) that it does not 
> affect rendering speed noticeably.
> I was wondering about some modifications of the uni_iter class to make 
> it usable with functions available in <algorithm> in the standard 
> library (e.g. https://gist.github.com/jeetsukumaran/307264/). It should 
> not require a lot of changes, if you want - I can try it out.
> One trap that I can see is having both iterator (from std::string) and 
> uni_iter. It may lead to situations where someone uses 
> std::string::iterator (just by habit, or without being aware of how it 
> works) when what they really meant is uni_iter. 


We can simply have a talk with that guy.  I like it the way it is.  uni_iter is not needed
except in one situation, and it's already in place there.  The rest of the time, using
std::string::iterator and std::string::const_iterator is OK, given a good understanding of
UTF8.

I have used both in the patch.

UTF8 can simply be treated as a string of bytes.  The fact that sometimes multiple bytes
are needed to constitute a single character rarely comes up.  This was the WHOLE POINT of
the class.
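
A minimal sketch of the byte-string point above, assuming valid UTF-8 input.  The helper
names count_code_points() and base_name() are hypothetical and not taken from the patch.
Ordinary std::string calls such as rfind() and substr() work unchanged because an ASCII
byte like '/' never occurs inside a multi-byte UTF-8 sequence.  Only an operation that
needs whole characters, such as counting them, has to look at the byte patterns.

```cpp
#include <cstddef>
#include <string>

// Count code points by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t count_code_points( const std::string& utf8 )
{
    std::size_t n = 0;

    for( std::string::const_iterator it = utf8.begin(); it != utf8.end(); ++it )
    {
        if( ( static_cast<unsigned char>( *it ) & 0xC0 ) != 0x80 )
            ++n;
    }

    return n;
}

// Return the part after the last '/' using plain byte-wise std::string calls.
// No UTF-8 awareness is needed here.
std::string base_name( const std::string& utf8_path )
{
    std::string::size_type slash = utf8_path.rfind( '/' );

    return slash == std::string::npos ? utf8_path : utf8_path.substr( slash + 1 );
}
```

For a path whose file name starts with two Chinese characters, base_name() still splits
at the right byte, while size() and count_code_points() report different numbers for the
result.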

We will probably never use uni_iter again, outside GAL Stroke.
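
For the one case where whole characters are needed, a code point decoder over a
std::string can be sketched as follows.  This is a hypothetical helper named
next_code_point(), not the uni_iter class from the patch, and it assumes the input is
valid UTF-8 (no error handling for truncated or malformed sequences).

```cpp
#include <string>

// Hypothetical helper, not the uni_iter from the patch: decode the code point
// that starts at 'it' and advance 'it' past it.  Assumes valid UTF-8 input.
unsigned next_code_point( std::string::const_iterator& it )
{
    unsigned char lead = static_cast<unsigned char>( *it++ );

    if( lead < 0x80 )           // single byte, plain ASCII
        return lead;

    // Number of continuation bytes implied by the lead byte.
    int      extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;

    // Payload bits of the lead byte: 5, 4 or 3 bits for 2, 3 or 4 byte sequences.
    unsigned cp    = lead & ( 0x3F >> extra );

    // Each continuation byte contributes its low 6 bits.
    while( extra-- > 0 )
        cp = ( cp << 6 ) | ( static_cast<unsigned char>( *it++ ) & 0x3F );

    return cp;
}
```

A caller would loop with a std::string::const_iterator until it reaches end(), calling
next_code_point() once per character, for example to look up one glyph per code point.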

So I disagree with this.  I very much want to keep it a std::string with a few extra bells,
not make it less than a std::string in *any* way.



> In my opinion, if the class is specifically designed for
> UTF8, we could drop the std::string iterator.

Definitely not.  There are few expected side effects to treating it like a std::string.
If you find one, I'd rather you work in wxStrings.


> Besides that - everything is fine with me.


> 
> Regards,
> Orson
>
