← Back to team overview

kicad-developers team mailing list archive

Re: 6.0 string proposal

 

John I got this too from reading the class documentation an hour ago.

To smoke these out, a person could comment out the undesirable calls in a wx header,
perhaps one that was temporarily moved into a place at a higher priority in the INCLUDE
file search space.

Then "make -i"

perhaps on a subset of multi-threaded source files (object file make targets), or the
whole shebang for maximum pain.



On 5/3/19 10:06 AM, John Beard wrote:
> Hi Jeff,
> 
> I think it is the index access operator that performs this caching, to allow you to access
> the n'th code point any number of times while only iterating the string only once.
> 
> However, you can still use the iterator access safely. It is only index based access that
> is cached and thread-unsafe.
> 
> This is what the wxString documention recommends. Furthermore, in any Unicode string,
> regardless of encoding (8, 16, 32), index access is almost entirely useless anyway, as
> code units/points are only indirectly related to glyphs and/or perceived characters
> anyway. If you need to parse a Unicode string, you must iterate from the start. There is
> no way around it.
> 
> If we're crashing due to cross thread access by index, the bug is probably that we access
> the string by index at all. If this was accessed by iterator, cross thread, and the string
> is not changed, it's fine. If the string is changed in another thread, cached iterators
> are invalid (same as if you change an C++ container in a single thread. The standard tells
> you what iterators are invalidated for each operation on a container).
> 
> I may have got the wrong end of the wxStick here (I can't check it for myself right now),
> but as far as I can tell, this is fixable by just never caching indices, as if we were
> looking at a C-style char array, and using iterators instead.
> 
> We should probably also turn off the unsafe string conversions by defining
> wxNO_UNSAFE_WXSTRING_CONV, if it is not already define.
> 
> Cheers,
> 
> John
> 
> On 3 May 2019 16:35:30 CEST, Jeff Young <jeff@xxxxxxxxx> wrote:
> 
>     Yes, we know exactly why it crashes: in order to speed up iterator access each
>     iterator keeps a pointer into the last location accessed (so that i-1 and i+1 can be
>     fast).  These pointers are kept in a linked-list.  Adding and removing pointers from
>     this list is not thread-protected.
> 
>     Note that wxWidgets will add/remove a pointer even for something seemingly innocuous
>     like an Empty() check.  So doing mutex locks on our side for non-const iterator access
>     is not sufficient.
> 
>     The worst part is that since two threads collide on the same string only rarely, we
>     don’t even know how many of these bugs we have.  We’ve fixed 3 or 4 of them (by adding
>     our own mutex checking on any access), but are there 0 or 10 more?  Haven’t a clue.
> 
>>>     It is between sad and breath taking.
> 
>     Indeed.
> 
>     Cheers,
>     Jeff.
> 
>>     On 3 May 2019, at 15:16, Dick Hollenbeck <dick@xxxxxxxxxxx
>>     <mailto:dick@xxxxxxxxxxx>> wrote:
>>
>>     Thanks Jeff.
>>
>>     On 5/3/19 4:22 AM, Jeff Young wrote:
>>>     Hi Dick,
>>>
>>>>>     h) What is the list of deficiencies with current string usage?
>>>
>>>     I only have one issue with the current use of wxString, but it’s a big one: it crashes
>>>     (unpredictably) when used multi-threaded in UTF8 mode.
>>
>>     The fact that it is onely *One* issue is an important data point.
>>
>>     Since you know it is crashing in this class, you must know approximately where, and
>>     under
>>     what kind of read/write activity.  Of course, if read activity triggers a lazy
>>     (deferred)
>>     transformation, then this distinction can get blurred.  But more information on source
>>     file locations would be very helpful to me.
>>
>>     Another important data point you brought is that the wx library designers are advising
>>     against using wxString for core application.  It will take a couple of hours to even
>>     contemplate that, it is basically staggering to me.  It is between sad and breath
>>     taking.
>>     Sounds like they designed themselves into a corner and are now acknowledging that what
>>     they designed is more of an API commitment that they want to disavow than a real
>>     solution.
>>
>>     I can see where that can happen.  Superior designs come from experience.  Experience
>>     comes
>>     with usage and time, neither of which are always available up front.
>>
>>
>>
>>
>>
>>>
>>>     This design document makes for fascinating
>>>     reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support.  It appears that the
>>>     current wxString is at least in part modelled on QtString.
>>>
>>>     There’s also a bunch of interesting info
>>>     here: https://docs.wxwidgets.org/trunk/overview_string.html, which I believe is more
>>>     up-to-date than the previous link.  In particular, there’s the mention that wxString
>>>     handles extra-BMP characters transparently when compiled in UTF8 mode (currently
>>>     used by
>>>     Kicad), but does NOT when compiled in default mode (in which case the app must handle
>>>     surrogate pairs).  This of course directly leads to your point (d):
>>>
>>>>>>     d) What does the set of characters that don't fall into UCS2 actually look like?
>>>>>>      How big
>>>>>>     is this set, really?  (UTF16 is bigger than UCS2 and picks up the difference.)
>>>
>>>     Do we really need to handle extra-BMP characters?
>>>
>>>     An even more recent version of the second document
>>>     (https://docs.wxwidgets.org/trunk/classwx_string.html) finally makes an oblique
>>>     reference
>>>     to the multi-threading issue by starting with this (rather unhelpful) suggestion:
>>>
>>>     Note
>>>        While the use of wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> is
>>>        unavoidable in wxWidgets program, you are encouraged to use the standard string
>>>        classes |std::string| or |std::wstring| in your applications and convert them to and
>>>        from wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> only when
>>>        interacting with wxWidgets.
>>>
>>>
>>>     Cheers,
>>>     Jeff.
>>>
>>>
>>>>     On 3 May 2019, at 02:03, Dick Hollenbeck <dick@xxxxxxxxxxx
>>>>     <mailto:dick@xxxxxxxxxxx> <mailto:dick@xxxxxxxxxxx>> wrote:
>>>>
>>>>     On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>>     On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>>     We had talked earlier about throwing the wxWidgets UTF8 compile switch to get rid of
>>>>>>     our wxString re-entrancy problems.  However, I noticed that the 6.0 work
>>>>>>     packages doc
>>>>>>     includes an item for std::string-ization of the BOARD.  (While a lot more work, this
>>>>>>     is a better solution because it also increases our gui-toolkit-choice flexibility.)
>>>>>>
>>>>>>     I’d like to propose that we use std::wstring for that.  UTF8 should *only* be an
>>>>>>     encoding format (similar to s-expr).  It should never be used internally. That’s
>>>>>>     what
>>>>>>     unicode wchar_t’s are for.
>>>>>>
>>>>>>     And I’d like to propose that we extend std::wstring-ization to SCH_ITEM and
>>>>>>     LIB_ITEM.
>>>>>>      (Then we can get rid of a bunch of our ugly mutex hacks.)
>>>>>
>>>>>
>>>>>     I've been looking at this for a few months now.  I think it is so important, that a
>>>>>     sub-committee should be formed, and if that committee takes as long as 4 months
>>>>>     to come to
>>>>>     a recommendation, this would not be too long.  This issue is simply too critical.
>>>>>
>>>>>     I would like to volunteer to be on that committee.  For the entire list to
>>>>>     participate in
>>>>>     this simply does not make sense to me.  I would welcome the opportunity to study
>>>>>     this with
>>>>>     a team of 5-6 players.  More than that probably leads to anxiety.  Then, given the
>>>>>     recommendations, the list would of course have an opportunity to raise questions
>>>>>     and take
>>>>>     shots, before a strategy is formulated, and before anything is implemented.
>>>>>
>>>>>     Again, approximately:
>>>>>
>>>>>      committee recommendations -> list approval -> strategy formulation -> implementation
>>>>>
>>>>>
>>>>>     Up to now I have looked at many libraries and have [way *too* much] experience in
>>>>>     multiple
>>>>>     languages on multiple platforms, so I think I can be valuable contributor.
>>>>>
>>>>>     The final work product initially would simply be a list of recommendations, that
>>>>>     quickly
>>>>>     transforms to a strategy thereafter.  This is an enormous undertaking, so I suggest
>>>>>     against racing to a solution.  It could look a lot easier than it will ultimately
>>>>>     be, as
>>>>>     is typical in software development.  But the return on investment needs to be
>>>>>     near optimal
>>>>>     in the end.
>>>>>
>>>>>     Some questions to answer are:
>>>>>
>>>>>     a) How did wxString get to its current state?  Is is merely a conglomeration of after
>>>>>     thought, or is is anywhere near optimal.
>>>>>
>>>>>     b) Why so many forms of it?  Can one form be chosen for all platforms?
>>>>>
>>>>>     c) How does wxString it compare to QtString?
>>>>>
>>>>>     d) What does the set of characters that don't fall into UCS2 actually look like?
>>>>>      How big
>>>>>     is this set, really?  (UTF16 is bigger than UCS2 and picks up the difference.)
>>>>>
>>>>>     e) For data files, I think UTF8 is fine.  So the change is for RAM manipulation of
>>>>>     strings.  Aren't we talking about a RAM resident string that bridges into the GUI
>>>>>     seamlessly?
>>>>>
>>>>>     f) What does new C++ language support offer?
>>>>>
>>>>>     g) What do C++ language designers suggest?
>>>>
>>>>     h) What is the list of deficiencies with current string usage?
>>>>
>>>>
>>>>>
>>>>>
>>>>>     etc.
>>>>>
>>>>>     But this is best continued in a smaller group, as said.
>>>>>
>>>>>
>>>>>     The other thing that I bring to this is vast familiarity with KiCad's internal
>>>>>     workings,
>>>>>     string use cases, and goals.
>>>>>
>>>>>     Let me know if I can help.
>>>>>
>>>>>     Regards,
>>>>>
>>>>>     Dick
>>>>>
>>>>>
>>>>>     _______________________________________________
>>>>>     Mailing list: https://launchpad.net/~kicad-developers
>>>>>     Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>>>>     <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>>     <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>>     Unsubscribe : https://launchpad.net/~kicad-developers
>>>>>     More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>
>>>>
>>>>     _______________________________________________
>>>>     Mailing list: https://launchpad.net/~kicad-developers
>>>>     Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>>>     <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>     <mailto:kicad-developers@xxxxxxxxxxxxxxxxxxx>
>>>>     Unsubscribe : https://launchpad.net/~kicad-developers
>>>>     More help   : https://help.launchpad.net/ListHelp
> 



References