kicad-developers team mailing list archive
-
kicad-developers team
-
Mailing list archive
-
Message #40462
Re: 6.0 string proposal
Could that damage have already been done by prior concurrent access? Maybe a subsequent
read only operation fails because your linked list is already hay wired...
Walking a linked list at a later point in time than when it was damaged can do this..
On 5/3/19 10:18 AM, Jeff Young wrote:
> I don’t believe that’s the case. Neither of the two crashes that I tracked down involved
> direct index access (or any change to the string). One was calling foo.IsEmpty(), and the
> other foo.Length(). Both use const iterators under the hood.
>
> When wxUSE_UNICODE_UTF8 is off, wxWidgets gets around parsing the string by just not doing it:
>
> Remarks
> Note that while the behaviour of wxString
> <https://docs.wxwidgets.org/trunk/classwx_string.html> when |wxUSE_UNICODE_WCHAR==1| resembles
> UCS-2 encoding, it's not completely correct to refer to wxString
> <https://docs.wxwidgets.org/trunk/classwx_string.html> as UCS-2 encoded since you can
> encode code points outside the /BMP/ in a wxString
> <https://docs.wxwidgets.org/trunk/classwx_string.html> as two code units (i.e. as a
> surrogate pair; as already mentioned however wxString
> <https://docs.wxwidgets.org/trunk/classwx_string.html> will "see" them as two
> different code points)
>
>
>> On 3 May 2019, at 16:06, John Beard <john.j.beard@xxxxxxxxx
>> <mailto:john.j.beard@xxxxxxxxx>> wrote:
>>
>> Hi Jeff,
>>
>> I think it is the index access operator that performs this caching, to allow you to
>> access the n'th code point any number of times while only iterating the string only once.
>>
>> However, you can still use the iterator access safely. It is only index based access
>> that is cached and thread-unsafe.
>>
>> This is what the wxString documention recommends. Furthermore, in any Unicode string,
>> regardless of encoding (8, 16, 32), index access is almost entirely useless anyway, as
>> code units/points are only indirectly related to glyphs and/or perceived characters
>> anyway. If you need to parse a Unicode string, you must iterate from the start. There is
>> no way around it.
>>
>> If we're crashing due to cross thread access by index, the bug is probably that we
>> access the string by index at all. If this was accessed by iterator, cross thread, and
>> the string is not changed, it's fine. If the string is changed in another thread, cached
>> iterators are invalid (same as if you change an C++ container in a single thread. The
>> standard tells you what iterators are invalidated for each operation on a container).
>>
>> I may have got the wrong end of the wxStick here (I can't check it for myself right
>> now), but as far as I can tell, this is fixable by just never caching indices, as if we
>> were looking at a C-style char array, and using iterators instead.
>>
>> We should probably also turn off the unsafe string conversions by defining
>> wxNO_UNSAFE_WXSTRING_CONV, if it is not already define.
>>
>> Cheers,
>>
>> John
>>
>> On 3 May 2019 16:35:30 CEST, Jeff Young <jeff@xxxxxxxxx <mailto:jeff@xxxxxxxxx>> wrote:
>>
>> Yes, we know exactly why it crashes: in order to speed up iterator access each
>> iterator keeps a pointer into the last location accessed (so that i-1 and i+1 can be
>> fast). These pointers are kept in a linked-list. Adding and removing pointers from
>> this list is not thread-protected.
>>
>> Note that wxWidgets will add/remove a pointer even for something seemingly innocuous
>> like an Empty() check. So doing mutex locks on our side for non-const iterator
>> access is not sufficient.
>>
>> The worst part is that since two threads collide on the same string only rarely, we
>> don’t even know how many of these bugs we have. We’ve fixed 3 or 4 of them (by
>> adding our own mutex checking on any access), but are there 0 or 10 more? Haven’t a
>> clue.
>>
>>>> It is between sad and breath taking.
>>
>> Indeed.
>>
>> Cheers,
>> Jeff.
>>
>>> On 3 May 2019, at 15:16, Dick Hollenbeck <dick@xxxxxxxxxxx
>>> <mailto:dick@xxxxxxxxxxx>> wrote:
>>>
>>> Thanks Jeff.
>>>
>>> On 5/3/19 4:22 AM, Jeff Young wrote:
>>>> Hi Dick,
>>>>
>>>>>> h) What is the list of deficiencies with current string usage?
>>>>
>>>> I only have one issue with the current use of wxString, but it’s a big one: it crashes
>>>> (unpredictably) when used multi-threaded in UTF8 mode.
>>>
>>> The fact that it is onely *One* issue is an important data point.
>>>
>>> Since you know it is crashing in this class, you must know approximately where, and
>>> under
>>> what kind of read/write activity. Of course, if read activity triggers a lazy
>>> (deferred)
>>> transformation, then this distinction can get blurred. But more information on source
>>> file locations would be very helpful to me.
>>>
>>> Another important data point you brought is that the wx library designers are advising
>>> against using wxString for core application. It will take a couple of hours to even
>>> contemplate that, it is basically staggering to me. It is between sad and breath
>>> taking.
>>> Sounds like they designed themselves into a corner and are now acknowledging that what
>>> they designed is more of an API commitment that they want to disavow than a real
>>> solution.
>>>
>>> I can see where that can happen. Superior designs come from experience.
>>> Experience comes
>>> with usage and time, neither of which are always available up front.
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> This design document makes for fascinating
>>>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support. It appears that the
>>>> current wxString is at least in part modelled on QtString.
>>>>
>>>> There’s also a bunch of interesting info
>>>> here: https://docs.wxwidgets.org/trunk/overview_string.html, which I believe is more
>>>> up-to-date than the previous link. In particular, there’s the mention that wxString
>>>> handles extra-BMP characters transparently when compiled in UTF8 mode (currently
>>>> used by
>>>> Kicad), but does NOT when compiled in default mode (in which case the app must handle
>>>> surrogate pairs). This of course directly leads to your point (d):
>>>>
>>>>>>> d) What does the set of characters that don't fall into UCS2 actually look
>>>>>>> like? How big
>>>>>>> is this set, really? (UTF16 is bigger than UCS2 and picks up the difference.)
>>>>
>>>> Do we really need to handle extra-BMP characters?
>>>>
>>>> An even more recent version of the second document
>>>> (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an oblique
>>>> reference
>>>> to the multi-threading issue by starting with this (rather unhelpful) suggestion:
>>>>
>>>> Note
>>>> While the use of wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> is
>>>> unavoidable in wxWidgets program, you are encouraged to use the standard string
>>>> classes |std::string| or |std::wstring| in your applications and convert them
>>>> to and
>>>> from wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> only when
>>>> interacting with wxWidgets.
>>>>
>>>>
>>>> Cheers,
>>>> Jeff.
>>>>
>>>>
>>>>> On 3 May 2019, at 02:03, Dick Hollenbeck <dick@xxxxxxxxxxx
>>>>> <mailto:dick@xxxxxxxxxxx> <mailto:dick@xxxxxxxxxxx>> wrote:
>>>>>
>>>>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>>> We had talked earlier about throwing the wxWidgets UTF8 compile switch to get
>>>>>>> rid of
>>>>>>> our wxString re-entrancy problems. However, I noticed that the 6.0 work
>>>>>>> packages doc
>>>>>>> includes an item for std::string-ization of the BOARD. (While a lot more work,
>>>>>>> this
>>>>>>> is a better solution because it also increases our gui-toolkit-choice flexibility.)
>>>>>>>
>>>>>>> I’d like to propose that we use std::wstring for that. UTF8 should *only* be an
>>>>>>> encoding format (similar to s-expr). It should never be used internally.
>>>>>>> That’s what
>>>>>>> unicode wchar_t’s are for.
>>>>>>>
>>>>>>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and
>>>>>>> LIB_ITEM.
>>>>>>> (Then we can get rid of a bunch of our ugly mutex hacks.)
>>>>>>
>>>>>>
>>>>>> I've been looking at this for a few months now. I think it is so important, that a
>>>>>> sub-committee should be formed, and if that committee takes as long as 4 months
>>>>>> to come to
>>>>>> a recommendation, this would not be too long. This issue is simply too critical.
>>>>>>
>>>>>> I would like to volunteer to be on that committee. For the entire list to
>>>>>> participate in
>>>>>> this simply does not make sense to me. I would welcome the opportunity to study
>>>>>> this with
>>>>>> a team of 5-6 players. More than that probably leads to anxiety. Then, given the
>>>>>> recommendations, the list would of course have an opportunity to raise questions
>>>>>> and take
>>>>>> shots, before a strategy is formulated, and before anything is implemented.
>>>>>>
>>>>>> Again, approximately:
>>>>>>
>>>>>> committee recommendations -> list approval -> strategy formulation ->
>>>>>> implementation
>>>>>>
>>>>>>
>>>>>> Up to now I have looked at many libraries and have [way *too* much] experience
>>>>>> in multiple
>>>>>> languages on multiple platforms, so I think I can be valuable contributor.
>>>>>>
>>>>>> The final work product initially would simply be a list of recommendations, that
>>>>>> quickly
>>>>>> transforms to a strategy thereafter. This is an enormous undertaking, so I suggest
>>>>>> against racing to a solution. It could look a lot easier than it will
>>>>>> ultimately be, as
>>>>>> is typical in software development. But the return on investment needs to be
>>>>>> near optimal
>>>>>> in the end.
>>>>>>
>>>>>> Some questions to answer are:
>>>>>>
>>>>>> a) How did wxString get to its current state? Is is merely a conglomeration of
>>>>>> after
>>>>>> thought, or is is anywhere near optimal.
>>>>>>
>>>>>> b) Why so many forms of it? Can one form be chosen for all platforms?
>>>>>>
>>>>>> c) How does wxString it compare to QtString?
>>>>>>
>>>>>> d) What does the set of characters that don't fall into UCS2 actually look like?
>>>>>> How big
>>>>>> is this set, really? (UTF16 is bigger than UCS2 and picks up the difference.)
>>>>>>
>>>>>> e) For data files, I think UTF8 is fine. So the change is for RAM manipulation of
>>>>>> strings. Aren't we talking about a RAM resident string that bridges into the GUI
>>>>>> seamlessly?
>>>>>>
>>>>>> f) What does new C++ language support offer?
>>>>>>
>>>>>> g) What do C++ language designers suggest?
>>>>>
>>>>> h) What is the list of deficiencies with current string usage?
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> etc.
>>>>>>
>>>>>> But this is best continued in a smaller group, as said.
>>>>>>
>>>>>>
>>>>>> The other thing that I bring to this is vast familiarity with KiCad's internal
>>>>>> workings,
>>>>>> string use cases, and goals.
>>>>>>
>>>>>> Let me know if I can help.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Dick
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list: https://launchpad.net/~kicad-developers
>>>>>> Post to : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>>>>> <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>>> <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>>> Unsubscribe : https://launchpad.net/~kicad-developers
>>>>>> More help : https://help.launchpad.net/ListHelp
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~kicad-developers
>>>>> Post to : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>>>> <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>> <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>> Unsubscribe : https://launchpad.net/~kicad-developers
>>>>> More help : https://help.launchpad.net/ListHelp
>>
>
Follow ups
References