kicad-developers team mailing list archive

Re: 6.0 string proposal

 

String access is a factor in the performance of the new real-time
connectivity algorithm in eeschema, since all connectivity is established
by parsing labels and pin names.  I have not done benchmarks comparing
various options for string storage, but we would need to watch that space
too if we change how strings work.

-Jon

On Tue, Apr 30, 2019 at 8:41 PM John Beard <john.j.beard@xxxxxxxxx> wrote:

> On 30/04/2019 16:01, Jeff Young wrote:
> > Primarily for performance reasons.
>
> WRT performance, I did a few benchmarks for reference (on Linux)
>
> Loading this large CIAA PCB[1] allocates, out of a peak usage of 467MB
> of heap with a 0.01% threshold:
>
> * 9.6MB of std::basic_string<wchar_t>::_M_assign
>     * 9.4MB of this is from wxString operator= assignments
> * ~600kB of std::basic_string<wchar_t>::_M_construct, (wxString ctor)
>
> So I'm not sure memory usage is a major factor to worry about (strings
> allocate storage on the heap, so we should see basically all the
> interesting things in the heap profile). UTF-8 could be as little as 1/4
> the size of UTF-32 (if all strings are ASCII), but even then, it's only a
> few MB saved.
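
To make the 1/4 figure concrete, here is a minimal sketch comparing the character payload of the same ASCII text stored as UTF-8 and as UTF-32. The helper name payloadBytes and the sample net name are hypothetical, chosen only for illustration; the sketch ignores allocator overhead and small-string optimisation, so it says nothing about real heap usage in KiCad.

```cpp
#include <cstddef>
#include <string>

// Bytes of character payload held by a string: number of code units times
// the size of one code unit. Ignores allocator overhead and any
// small-string optimisation, so this is a lower bound, not a heap figure.
template <typename S>
std::size_t payloadBytes( const S& s )
{
    return s.size() * sizeof( typename S::value_type );
}

// Hypothetical ASCII-only net name, once as UTF-8 and once as UTF-32.
const std::string    kNameUtf8  = "Net-(R1-Pad1)";
const std::u32string kNameUtf32 = U"Net-(R1-Pad1)";
```

Calling payloadBytes( kNameUtf8 ) and payloadBytes( kNameUtf32 ) shows the UTF-32 copy taking four times the bytes of the UTF-8 copy for ASCII-only text. On Linux, wchar_t is 4 bytes, which is why the wchar_t strings in the profile above behave like UTF-32.
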
>
> Now, in terms of performance, opening Pcbnew with no file gives:
>
> #4      3.36%   __gconv_transform_utf8_internal
> #5      2.51%   __mbsrtowcs_l
> #6      2.50%   wxMBConv::ToWChar
> #8      2.07%   std::basic_string<wchar_t>::_M_assign
> #9      1.88%   wxMBConvStrictUTF8::ToWChar
> #14     1.27%   EscapeString (kicad function)
> #17     0.85%   __GI___strlen_sse2
>
> #18     0.85%   wxUniChar::From8bit
> #19     0.84%   wxUniChar::operator==
>
> And plenty more string-y things in the top 50 or so lines. So it seems
> the biggest cost for strings is converting them from UTF-8 to wchar_t
> strings in WX (this is probably not the same on Windows). But it's not
> really a stunning cost.
>
> However, when loading the CIAA board, there are basically no string
> operations above 0.5%, and only a handful even above 0.25%. When doing
> DRC, strings don't break 0.1%: nearly all the significant work is
> looking things up in std::maps and geometry.
>
> So string performance doesn't seem to be *that* critical, as it's
> quickly drowned out under real workloads. It looks to me (and I'm happy
> to be corrected, I'm not a perf expert), like string operations in KiCad
> are not much of a bottleneck.
>
>  > Because characters are different lengths, you have to scan the string
>  > to find the n’th character.
>
> Even with UTF-32, you can only do an O(1) lookup of the n'th *code
> point* or *code unit* (the same in UTF-32, not in UTF-8), not the n'th
> *encoded character*.
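
The difference in lookup cost can be sketched as follows. This is an illustration only, not code from KiCad or wxWidgets: the function names are hypothetical, and the UTF-8 scan assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Byte offset of the n-th code point (0-based) in a UTF-8 string, or
// std::string::npos if the string has fewer code points.
// Assumes the input is valid UTF-8. Cost is O(length): every byte before
// the target has to be inspected.
std::size_t utf8OffsetOfCodePoint( const std::string& s, std::size_t n )
{
    std::size_t seen = 0;

    for( std::size_t i = 0; i < s.size(); ++i )
    {
        // Continuation bytes have the form 10xxxxxx; any other byte
        // starts a new code point.
        const bool isLead = ( static_cast<unsigned char>( s[i] ) & 0xC0 ) != 0x80;

        if( isLead )
        {
            if( seen == n )
                return i;

            ++seen;
        }
    }

    return std::string::npos;
}

// In UTF-32 each code point is exactly one code unit, so indexing is O(1).
// at() throws std::out_of_range if n is past the end.
char32_t utf32CodePointAt( const std::u32string& s, std::size_t n )
{
    return s.at( n );
}
```

Both functions locate a code point, not a user-visible character, which is the distinction drawn above.
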
>
> That's true even if you normalise the strings first. Not all code points
> map one-to-one to an encoded character (it can be one-to-none,
> one-to-one, many-to-one). And that's even without considering grapheme
> clustering.
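
A small sketch of the many-to-one case: "é" can be stored as the single code point U+00E9 or as "e" followed by the combining acute accent U+0301. The counter below is a deliberate simplification and an assumption of this example: it only treats the Combining Diacritical Marks block (U+0300 to U+036F) as non-starting, which is far from full grapheme clustering. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Simplification: only the Combining Diacritical Marks block is recognised.
// Real text segmentation needs the full Unicode rules.
bool isCombiningMark( char32_t c )
{
    return c >= 0x0300 && c <= 0x036F;
}

// Counts code points that start a new user-perceived character under the
// simplified rule above. Note this is O(length) even though the input is
// UTF-32: fixed-width code units do not give fixed-width characters.
std::size_t countPerceivedCharacters( const std::u32string& s )
{
    std::size_t count = 0;

    for( char32_t c : s )
    {
        if( !isCombiningMark( c ) )
            ++count;
    }

    return count;
}
```

With this, U"e\u0301" has two code points but counts as one character, the same count as the precomposed U"\u00E9", so indexing by code point does not index by character.
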
>
> Cheers,
>
> John
>
> PS / OT: If we had to optimise one thing,
> PolygonTriangulation::Vertex::inTriangle is the single hungriest
> function, chewing 6.19% of all CPU time, double that of each of the next
> 3: __gnu_cxx::__exchange_and_add (2.76%),  PolygonTriangulation::isEar
> (2.73%) and even malloc (2.27%).
>
> Other than that fairly mundane 6%-er, there are no eye-popping
> performance hogs simply on loading a PCB. Which is nice.
>
> [1]:
>
> https://github.com/ciaa/Hardware/blob/master/PCB/ACC/CIAA_ACC/ciaa_acc.kicad_pcb
>
