kicad-developers team mailing list archive

Thread
Date
Re: 6.0 string proposal

To: Jeff Young <jeff@xxxxxxxxx>, John Beard <john.j.beard@xxxxxxxxx>
From: Dick Hollenbeck <dick@xxxxxxxxxxx>
Date: Fri, 3 May 2019 10:24:07 -0500
Cc: kicad-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <67ACC0D2-B92F-4215-8996-94D33FD74DAB@rokeby.ie>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
Could that damage have already been done by prior concurrent access?  Maybe a subsequent
read only operation fails because your linked list is already hay wired...

Walking a linked list at a later point in time than when it was damaged can do this..



On 5/3/19 10:18 AM, Jeff Young wrote:
> I don’t believe that’s the case.  Neither of the two crashes that I tracked down involved
> direct index access (or any change to the string).  One was calling foo.IsEmpty(), and the
> other foo.Length().  Both use const iterators under the hood.
> 
> When wxUSE_UNICODE_UTF8 is off, wxWidgets gets around parsing the string by just not doing it:
> 
> Remarks
>     Note that while the behaviour of wxString
>     <https://docs.wxwidgets.org/trunk/classwx_string.html> when |wxUSE_UNICODE_WCHAR==1| resembles
>     UCS-2 encoding, it's not completely correct to refer to wxString
>     <https://docs.wxwidgets.org/trunk/classwx_string.html> as UCS-2 encoded since you can
>     encode code points outside the /BMP/ in a wxString
>     <https://docs.wxwidgets.org/trunk/classwx_string.html> as two code units (i.e. as a
>     surrogate pair; as already mentioned however wxString
>     <https://docs.wxwidgets.org/trunk/classwx_string.html> will "see" them as two
>     different code points)
> 
> 
>> On 3 May 2019, at 16:06, John Beard <john.j.beard@xxxxxxxxx
>> <mailto:john.j.beard@xxxxxxxxx>> wrote:
>>
>> Hi Jeff,
>>
>> I think it is the index access operator that performs this caching, to allow you to
>> access the n'th code point any number of times while only iterating the string only once.
>>
>> However, you can still use the iterator access safely. It is only index based access
>> that is cached and thread-unsafe.
>>
>> This is what the wxString documention recommends. Furthermore, in any Unicode string,
>> regardless of encoding (8, 16, 32), index access is almost entirely useless anyway, as
>> code units/points are only indirectly related to glyphs and/or perceived characters
>> anyway. If you need to parse a Unicode string, you must iterate from the start. There is
>> no way around it.
>>
>> If we're crashing due to cross thread access by index, the bug is probably that we
>> access the string by index at all. If this was accessed by iterator, cross thread, and
>> the string is not changed, it's fine. If the string is changed in another thread, cached
>> iterators are invalid (same as if you change an C++ container in a single thread. The
>> standard tells you what iterators are invalidated for each operation on a container).
>>
>> I may have got the wrong end of the wxStick here (I can't check it for myself right
>> now), but as far as I can tell, this is fixable by just never caching indices, as if we
>> were looking at a C-style char array, and using iterators instead.
>>
>> We should probably also turn off the unsafe string conversions by defining
>> wxNO_UNSAFE_WXSTRING_CONV, if it is not already define.
>>
>> Cheers,
>>
>> John
>>
>> On 3 May 2019 16:35:30 CEST, Jeff Young <jeff@xxxxxxxxx <mailto:jeff@xxxxxxxxx>> wrote:
>>
>>     Yes, we know exactly why it crashes: in order to speed up iterator access each
>>     iterator keeps a pointer into the last location accessed (so that i-1 and i+1 can be
>>     fast).  These pointers are kept in a linked-list.  Adding and removing pointers from
>>     this list is not thread-protected.
>>
>>     Note that wxWidgets will add/remove a pointer even for something seemingly innocuous
>>     like an Empty() check.  So doing mutex locks on our side for non-const iterator
>>     access is not sufficient.
>>
>>     The worst part is that since two threads collide on the same string only rarely, we
>>     don’t even know how many of these bugs we have.  We’ve fixed 3 or 4 of them (by
>>     adding our own mutex checking on any access), but are there 0 or 10 more?  Haven’t a
>>     clue.
>>
>>>>     It is between sad and breath taking.
>>
>>     Indeed.
>>
>>     Cheers,
>>     Jeff.
>>
>>>     On 3 May 2019, at 15:16, Dick Hollenbeck <dick@xxxxxxxxxxx
>>>     <mailto:dick@xxxxxxxxxxx>> wrote:
>>>
>>>     Thanks Jeff.
>>>
>>>     On 5/3/19 4:22 AM, Jeff Young wrote:
>>>>     Hi Dick,
>>>>
>>>>>>     h) What is the list of deficiencies with current string usage?
>>>>
>>>>     I only have one issue with the current use of wxString, but it’s a big one: it crashes
>>>>     (unpredictably) when used multi-threaded in UTF8 mode.
>>>
>>>     The fact that it is onely *One* issue is an important data point.
>>>
>>>     Since you know it is crashing in this class, you must know approximately where, and
>>>     under
>>>     what kind of read/write activity.  Of course, if read activity triggers a lazy
>>>     (deferred)
>>>     transformation, then this distinction can get blurred.  But more information on source
>>>     file locations would be very helpful to me.
>>>
>>>     Another important data point you brought is that the wx library designers are advising
>>>     against using wxString for core application.  It will take a couple of hours to even
>>>     contemplate that, it is basically staggering to me.  It is between sad and breath
>>>     taking.
>>>     Sounds like they designed themselves into a corner and are now acknowledging that what
>>>     they designed is more of an API commitment that they want to disavow than a real
>>>     solution.
>>>
>>>     I can see where that can happen.  Superior designs come from experience.
>>>      Experience comes
>>>     with usage and time, neither of which are always available up front.
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>     This design document makes for fascinating
>>>>     reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It appears that the
>>>>     current wxString is at least in part modelled on QtString.
>>>>
>>>>     There’s also a bunch of interesting info
>>>>     here: https://docs.wxwidgets.org/trunk/overview_string.html, which I believe is more
>>>>     up-to-date than the previous link.  In particular, there’s the mention that wxString
>>>>     handles extra-BMP characters transparently when compiled in UTF8 mode (currently
>>>>     used by
>>>>     Kicad), but does NOT when compiled in default mode (in which case the app must handle
>>>>     surrogate pairs).  This of course directly leads to your point (d):
>>>>
>>>>>>>     d) What does the set of characters that don't fall into UCS2 actually look
>>>>>>>     like?  How big
>>>>>>>     is this set, really?  (UTF16 is bigger than UCS2 and picks up the difference.)
>>>>
>>>>     Do we really need to handle extra-BMP characters?
>>>>
>>>>     An even more recent version of the second document
>>>>     (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an oblique
>>>>     reference
>>>>     to the multi-threading issue by starting with this (rather unhelpful) suggestion:
>>>>
>>>>     Note
>>>>        While the use of wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> is
>>>>        unavoidable in wxWidgets program, you are encouraged to use the standard string
>>>>        classes |std::string| or |std::wstring| in your applications and convert them
>>>>     to and
>>>>        from wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> only when
>>>>        interacting with wxWidgets.
>>>>
>>>>
>>>>     Cheers,
>>>>     Jeff.
>>>>
>>>>
>>>>>     On 3 May 2019, at 02:03, Dick Hollenbeck <dick@xxxxxxxxxxx
>>>>>     <mailto:dick@xxxxxxxxxxx> <mailto:dick@xxxxxxxxxxx>> wrote:
>>>>>
>>>>>     On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>>>     On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>>>     We had talked earlier about throwing the wxWidgets UTF8 compile switch to get
>>>>>>>     rid of
>>>>>>>     our wxString re-entrancy problems.  However, I noticed that the 6.0 work
>>>>>>>     packages doc
>>>>>>>     includes an item for std::string-ization of the BOARD.  (While a lot more work,
>>>>>>>     this
>>>>>>>     is a better solution because it also increases our gui-toolkit-choice flexibility.)
>>>>>>>
>>>>>>>     I’d like to propose that we use std::wstring for that.  UTF8 should *only* be an
>>>>>>>     encoding format (similar to s-expr).  It should never be used internally.
>>>>>>>     That’s what
>>>>>>>     unicode wchar_t’s are for.
>>>>>>>
>>>>>>>     And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and
>>>>>>>     LIB_ITEM.
>>>>>>>      (Then we can get rid of a bunch of our ugly mutex hacks.)
>>>>>>
>>>>>>
>>>>>>     I've been looking at this for a few months now.  I think it is so important, that a
>>>>>>     sub-committee should be formed, and if that committee takes as long as 4 months
>>>>>>     to come to
>>>>>>     a recommendation, this would not be too long.  This issue is simply too critical.
>>>>>>
>>>>>>     I would like to volunteer to be on that committee.  For the entire list to
>>>>>>     participate in
>>>>>>     this simply does not make sense to me.  I would welcome the opportunity to study
>>>>>>     this with
>>>>>>     a team of 5-6 players.  More than that probably leads to anxiety.  Then, given the
>>>>>>     recommendations, the list would of course have an opportunity to raise questions
>>>>>>     and take
>>>>>>     shots, before a strategy is formulated, and before anything is implemented.
>>>>>>
>>>>>>     Again, approximately:
>>>>>>
>>>>>>      committee recommendations -> list approval -> strategy formulation ->
>>>>>>     implementation
>>>>>>
>>>>>>
>>>>>>     Up to now I have looked at many libraries and have [way *too* much] experience
>>>>>>     in multiple
>>>>>>     languages on multiple platforms, so I think I can be valuable contributor.
>>>>>>
>>>>>>     The final work product initially would simply be a list of recommendations, that
>>>>>>     quickly
>>>>>>     transforms to a strategy thereafter.  This is an enormous undertaking, so I suggest
>>>>>>     against racing to a solution.  It could look a lot easier than it will
>>>>>>     ultimately be, as
>>>>>>     is typical in software development.  But the return on investment needs to be
>>>>>>     near optimal
>>>>>>     in the end.
>>>>>>
>>>>>>     Some questions to answer are:
>>>>>>
>>>>>>     a) How did wxString get to its current state?  Is is merely a conglomeration of
>>>>>>     after
>>>>>>     thought, or is is anywhere near optimal.
>>>>>>
>>>>>>     b) Why so many forms of it?  Can one form be chosen for all platforms?
>>>>>>
>>>>>>     c) How does wxString it compare to QtString?
>>>>>>
>>>>>>     d) What does the set of characters that don't fall into UCS2 actually look like?
>>>>>>      How big
>>>>>>     is this set, really?  (UTF16 is bigger than UCS2 and picks up the difference.)
>>>>>>
>>>>>>     e) For data files, I think UTF8 is fine.  So the change is for RAM manipulation of
>>>>>>     strings.  Aren't we talking about a RAM resident string that bridges into the GUI
>>>>>>     seamlessly?
>>>>>>
>>>>>>     f) What does new C++ language support offer?
>>>>>>
>>>>>>     g) What do C++ language designers suggest?
>>>>>
>>>>>     h) What is the list of deficiencies with current string usage?
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>     etc.
>>>>>>
>>>>>>     But this is best continued in a smaller group, as said.
>>>>>>
>>>>>>
>>>>>>     The other thing that I bring to this is vast familiarity with KiCad's internal
>>>>>>     workings,
>>>>>>     string use cases, and goals.
>>>>>>
>>>>>>     Let me know if I can help.
>>>>>>
>>>>>>     Regards,
>>>>>>
>>>>>>     Dick
>>>>>>
>>>>>>
>>>>>>     _______________________________________________
>>>>>>     Mailing list: https://launchpad.net/~kicad-developers
>>>>>>     Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>>>>>     <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>>>     <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>>>     Unsubscribe : https://launchpad.net/~kicad-developers
>>>>>>     More help   : https://help.launchpad.net/ListHelp
>>>>>>
>>>>>
>>>>>
>>>>>     _______________________________________________
>>>>>     Mailing list: https://launchpad.net/~kicad-developers
>>>>>     Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>>>>     <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>>     <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>>     Unsubscribe : https://launchpad.net/~kicad-developers
>>>>>     More help   : https://help.launchpad.net/ListHelp
>>
>
Follow ups

Re: 6.0 string proposal
From: Jeff Young, 2019-05-03
References

6.0 string proposal
From: Jeff Young, 2019-04-30
Re: 6.0 string proposal
From: Dick Hollenbeck, 2019-05-02
Re: 6.0 string proposal
From: Dick Hollenbeck, 2019-05-03
Re: 6.0 string proposal
From: Jeff Young, 2019-05-03
Re: 6.0 string proposal
From: Dick Hollenbeck, 2019-05-03
Re: 6.0 string proposal
From: Jeff Young, 2019-05-03
Re: 6.0 string proposal
From: John Beard, 2019-05-03
Re: 6.0 string proposal
From: Jeff Young, 2019-05-03