
kicad-developers team mailing list archive

Re: 6.0 string proposal

 

On 30/04/2019 16:01, Jeff Young wrote:
> Primarily for performance reasons.

WRT performance, I ran a few benchmarks for reference (on Linux).

Loading this large CIAA PCB [1] peaks at 467MB of heap. Profiling with a 0.01% threshold, the string allocations within that are:

* 9.6MB of std::basic_string<wchar_t>::_M_assign
   * 9.4MB of this is from wxString operator= assignments
* ~600kB of std::basic_string<wchar_t>::_M_construct (wxString constructors)

So I'm not sure memory usage is a major factor to worry about (strings store their characters on the heap, so basically everything interesting should show up in the heap profile). UTF-8 could take as little as 1/4 the space of UTF-32 (if all strings are ASCII), but even then, that only saves a few MB.

Now, in terms of performance, opening Pcbnew with no file gives:

#4   3.36%  __gconv_transform_utf8_internal
#5   2.51%  __mbsrtowcs_l
#6   2.50%  wxMBConv::ToWChar
#8   2.07%  std::basic_string<wchar_t>::_M_assign
#9   1.88%  wxMBConvStrictUTF8::ToWChar
#14  1.27%  EscapeString (KiCad function)
#17  0.85%  __GI___strlen_sse2
#18  0.85%  wxUniChar::From8bit
#19  0.84%  wxUniChar::operator==

There are plenty more string-related entries in the top 50 or so. So the biggest cost for strings seems to be converting them from UTF-8 to wchar_t strings in wx (this is probably different on Windows). But it's not really a stunning cost.

However, when loading the CIAA board, there are basically no string operations above 0.5%, and only a handful even above 0.25%. During DRC, strings don't break 0.1%: nearly all the significant work is std::map lookups and geometry.

So string performance doesn't seem to be *that* critical, as it's quickly drowned out by real workloads. It looks to me (and I'm happy to be corrected, as I'm not a perf expert) like string operations in KiCad are not much of a bottleneck.

> Because characters are different lengths, you have to scan the string
> to find the n’th character.

Even with UTF-32, you can only do an O(1) lookup of the n'th *code point* or *code unit* (the same in UTF-32, not in UTF-8), not the n'th *encoded character*.

That's true even if you normalise the strings first. Not all code points map one-to-one to an encoded character (it can be one-to-none, one-to-one, many-to-one). And that's even without considering grapheme clustering.

Cheers,

John

PS / OT: If we had to optimise one thing, PolygonTriangulation::Vertex::inTriangle is the single hungriest function, chewing 6.19% of all CPU time, more than double each of the next three: __gnu_cxx::__exchange_and_add (2.76%), PolygonTriangulation::isEar (2.73%) and even malloc (2.27%).

Other than that fairly mundane 6%-er, there are no eye-popping performance hogs when simply loading a PCB, which is nice.

[1]: https://github.com/ciaa/Hardware/blob/master/PCB/ACC/CIAA_ACC/ciaa_acc.kicad_pcb

