kicad-developers team mailing list archive
-
kicad-developers team
-
Mailing list archive
-
Message #40459
Re: 6.0 string proposal
John I got this too from reading the class documentation an hour ago.
To smoke these out, a person could comment out the undesirable calls in a wx header,
perhaps one that was temporarily moved into a place at a higher priority in the INCLUDE
file search space.
Then "make -i"
perhaps on a subset of multi-threaded source files (object file make targets), or the
whole shebang for maximum pain.
On 5/3/19 10:06 AM, John Beard wrote:
> Hi Jeff,
>
> I think it is the index access operator that performs this caching, to allow you to access
> the n'th code point any number of times while only iterating the string only once.
>
> However, you can still use the iterator access safely. It is only index based access that
> is cached and thread-unsafe.
>
> This is what the wxString documention recommends. Furthermore, in any Unicode string,
> regardless of encoding (8, 16, 32), index access is almost entirely useless anyway, as
> code units/points are only indirectly related to glyphs and/or perceived characters
> anyway. If you need to parse a Unicode string, you must iterate from the start. There is
> no way around it.
>
> If we're crashing due to cross thread access by index, the bug is probably that we access
> the string by index at all. If this was accessed by iterator, cross thread, and the string
> is not changed, it's fine. If the string is changed in another thread, cached iterators
> are invalid (same as if you change an C++ container in a single thread. The standard tells
> you what iterators are invalidated for each operation on a container).
>
> I may have got the wrong end of the wxStick here (I can't check it for myself right now),
> but as far as I can tell, this is fixable by just never caching indices, as if we were
> looking at a C-style char array, and using iterators instead.
>
> We should probably also turn off the unsafe string conversions by defining
> wxNO_UNSAFE_WXSTRING_CONV, if it is not already define.
>
> Cheers,
>
> John
>
> On 3 May 2019 16:35:30 CEST, Jeff Young <jeff@xxxxxxxxx> wrote:
>
> Yes, we know exactly why it crashes: in order to speed up iterator access each
> iterator keeps a pointer into the last location accessed (so that i-1 and i+1 can be
> fast). These pointers are kept in a linked-list. Adding and removing pointers from
> this list is not thread-protected.
>
> Note that wxWidgets will add/remove a pointer even for something seemingly innocuous
> like an Empty() check. So doing mutex locks on our side for non-const iterator access
> is not sufficient.
>
> The worst part is that since two threads collide on the same string only rarely, we
> don’t even know how many of these bugs we have. We’ve fixed 3 or 4 of them (by adding
> our own mutex checking on any access), but are there 0 or 10 more? Haven’t a clue.
>
>>> It is between sad and breath taking.
>
> Indeed.
>
> Cheers,
> Jeff.
>
>> On 3 May 2019, at 15:16, Dick Hollenbeck <dick@xxxxxxxxxxx
>> <mailto:dick@xxxxxxxxxxx>> wrote:
>>
>> Thanks Jeff.
>>
>> On 5/3/19 4:22 AM, Jeff Young wrote:
>>> Hi Dick,
>>>
>>>>> h) What is the list of deficiencies with current string usage?
>>>
>>> I only have one issue with the current use of wxString, but it’s a big one: it crashes
>>> (unpredictably) when used multi-threaded in UTF8 mode.
>>
>> The fact that it is onely *One* issue is an important data point.
>>
>> Since you know it is crashing in this class, you must know approximately where, and
>> under
>> what kind of read/write activity. Of course, if read activity triggers a lazy
>> (deferred)
>> transformation, then this distinction can get blurred. But more information on source
>> file locations would be very helpful to me.
>>
>> Another important data point you brought is that the wx library designers are advising
>> against using wxString for core application. It will take a couple of hours to even
>> contemplate that, it is basically staggering to me. It is between sad and breath
>> taking.
>> Sounds like they designed themselves into a corner and are now acknowledging that what
>> they designed is more of an API commitment that they want to disavow than a real
>> solution.
>>
>> I can see where that can happen. Superior designs come from experience. Experience
>> comes
>> with usage and time, neither of which are always available up front.
>>
>>
>>
>>
>>
>>>
>>> This design document makes for fascinating
>>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support. It appears that the
>>> current wxString is at least in part modelled on QtString.
>>>
>>> There’s also a bunch of interesting info
>>> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I believe is more
>>> up-to-date than the previous link. In particular, there’s the mention that wxString
>>> handles extra-BMP characters transparently when compiled in UTF8 mode (currently
>>> used by
>>> Kicad), but does NOT when compiled in default mode (in which case the app must handle
>>> surrogate pairs). This of course directly leads to your point (d):
>>>
>>>>>> d) What does the set of characters that don't fall into UCS2 actually look like?
>>>>>> How big
>>>>>> is this set, really? (UTF16 is bigger than UCS2 and picks up the difference.)
>>>
>>> Do we really need to handle extra-BMP characters?
>>>
>>> An even more recent version of the second document
>>> (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an oblique
>>> reference
>>> to the multi-threading issue by starting with this (rather unhelpful) suggestion:
>>>
>>> Note
>>> While the use of wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> is
>>> unavoidable in wxWidgets program, you are encouraged to use the standard string
>>> classes |std::string| or |std::wstring| in your applications and convert them to and
>>> from wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> only when
>>> interacting with wxWidgets.
>>>
>>>
>>> Cheers,
>>> Jeff.
>>>
>>>
>>>> On 3 May 2019, at 02:03, Dick Hollenbeck <dick@xxxxxxxxxxx
>>>> <mailto:dick@xxxxxxxxxxx> <mailto:dick@xxxxxxxxxxx>> wrote:
>>>>
>>>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>> We had talked earlier about throwing the wxWidgets UTF8 compile switch to get rid of
>>>>>> our wxString re-entrancy problems. However, I noticed that the 6.0 work
>>>>>> packages doc
>>>>>> includes an item for std::string-ization of the BOARD. (While a lot more work, this
>>>>>> is a better solution because it also increases our gui-toolkit-choice flexibility.)
>>>>>>
>>>>>> I’d like to propose that we use std::wstring for that. UTF8 should *only* be an
>>>>>> encoding format (similar to s-expr). It should never be used internally. That’s
>>>>>> what
>>>>>> unicode wchar_t’s are for.
>>>>>>
>>>>>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and
>>>>>> LIB_ITEM.
>>>>>> (Then we can get rid of a bunch of our ugly mutex hacks.)
>>>>>
>>>>>
>>>>> I've been looking at this for a few months now. I think it is so important, that a
>>>>> sub-committee should be formed, and if that committee takes as long as 4 months
>>>>> to come to
>>>>> a recommendation, this would not be too long. This issue is simply too critical.
>>>>>
>>>>> I would like to volunteer to be on that committee. For the entire list to
>>>>> participate in
>>>>> this simply does not make sense to me. I would welcome the opportunity to study
>>>>> this with
>>>>> a team of 5-6 players. More than that probably leads to anxiety. Then, given the
>>>>> recommendations, the list would of course have an opportunity to raise questions
>>>>> and take
>>>>> shots, before a strategy is formulated, and before anything is implemented.
>>>>>
>>>>> Again, approximately:
>>>>>
>>>>> committee recommendations -> list approval -> strategy formulation -> implementation
>>>>>
>>>>>
>>>>> Up to now I have looked at many libraries and have [way *too* much] experience in
>>>>> multiple
>>>>> languages on multiple platforms, so I think I can be valuable contributor.
>>>>>
>>>>> The final work product initially would simply be a list of recommendations, that
>>>>> quickly
>>>>> transforms to a strategy thereafter. This is an enormous undertaking, so I suggest
>>>>> against racing to a solution. It could look a lot easier than it will ultimately
>>>>> be, as
>>>>> is typical in software development. But the return on investment needs to be
>>>>> near optimal
>>>>> in the end.
>>>>>
>>>>> Some questions to answer are:
>>>>>
>>>>> a) How did wxString get to its current state? Is is merely a conglomeration of after
>>>>> thought, or is is anywhere near optimal.
>>>>>
>>>>> b) Why so many forms of it? Can one form be chosen for all platforms?
>>>>>
>>>>> c) How does wxString it compare to QtString?
>>>>>
>>>>> d) What does the set of characters that don't fall into UCS2 actually look like?
>>>>> How big
>>>>> is this set, really? (UTF16 is bigger than UCS2 and picks up the difference.)
>>>>>
>>>>> e) For data files, I think UTF8 is fine. So the change is for RAM manipulation of
>>>>> strings. Aren't we talking about a RAM resident string that bridges into the GUI
>>>>> seamlessly?
>>>>>
>>>>> f) What does new C++ language support offer?
>>>>>
>>>>> g) What do C++ language designers suggest?
>>>>
>>>> h) What is the list of deficiencies with current string usage?
>>>>
>>>>
>>>>>
>>>>>
>>>>> etc.
>>>>>
>>>>> But this is best continued in a smaller group, as said.
>>>>>
>>>>>
>>>>> The other thing that I bring to this is vast familiarity with KiCad's internal
>>>>> workings,
>>>>> string use cases, and goals.
>>>>>
>>>>> Let me know if I can help.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Dick
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~kicad-developers
>>>>> Post to : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>>>> <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>> <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>> Unsubscribe : https://launchpad.net/~kicad-developers
>>>>> More help : https://help.launchpad.net/ListHelp
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~kicad-developers
>>>> Post to : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>>> <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>> <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>> Unsubscribe : https://launchpad.net/~kicad-developers
>>>> More help : https://help.launchpad.net/ListHelp
>
References