
kicad-developers team mailing list archive

Re: The problems with wxString

 

On Thu, Jan 02, 2014 at 12:04:36PM -0600, Dick Hollenbeck wrote:
> It is, but it is for strings, not UTF8 character manipulation.  Its best use is at the
> edge of the system in helping with serialization and de-serialization to disk, clipboard,
> other byte oriented peripherals TBD.  Future brain interfaces, etc.  whatever needs 8-bit
> data.

More or less what I'm thinking. Using UTF-8 in the core is somewhat painful.
However, luckily, 99% of the string usage in KiCad AFAIK is "take this
whole string here and put it there", not character surgery...

*However*, take care: even the innocuous std::string::size() is plain
wrong with UTF-8 encoded characters (remember the old BOM output, which
was always misaligned?). operator[] and at() are simply nonsense. It's
not the class's fault; it's just not designed for that. The spec says
"Strings are objects that represent sequences of characters". UTF-8 is
not a sequence of characters, it is a sequence of octets. And who said
that char is 8 bits anyway? DSP users learn that quickly (the hard way!).
std::vector<uint8_t> would be a better container for UTF-8 encoded
strings. In fact, a UTF-8 string couldn't even be specified using the
std::basic_string template (it requires a random access iterator, while
we only have a forward iterator... maybe a bidirectional one, since that
is not too inefficient). The official C++ plan would of course be to use
wstring and let the locale machinery take care of the encoding... oops,
too bad, locale doesn't handle encoding *and* iostream leaves it to the
OS :P

Maybe C++11 has something, since it at least has UTF-8 encoded literals?

That said, "industry standards" put that concept in the WC and then
pulled the chain :D so we have to keep UTF-8 around in narrow (non-wide)
strings and suffer for it. If not iostream (or stdio), it's wx that
works that way. And so does GTK, anyway.

There is no escape :D Just use std::string (or wxString if you need to)
and *don't* do anything that wouldn't make sense. Given the closed type
system of C++ (and the Liskov substitution principle), I don't think you
could define a class which IS-A std::string *but* doesn't allow these
things. You *could* define something which emulates them (at great
performance cost), but the prospect is painful... for ideas look here:
http://userguide.icu-project.org/strings/characteriterator. No, I'm not
suggesting using ICU, it's way too heavy :D Anyway, guess what? They
defined their own (incompatible) string class. Using UTF-16, by the way.
Another nightmare like wxString...

-- 
Lorenzo Marcantonio
Logos Srl

