kicad-developers team mailing list archive


Re: 6.0 string proposal

To: Jeff Young <jeff@xxxxxxxxx>, Dmitry Salychev <darkness.bsd@xxxxxxxxx>
From: John Beard <john.j.beard@xxxxxxxxx>
Date: Tue, 30 Apr 2019 17:22:04 +0100
Cc: kicad-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <119AB1E2-2C1A-408B-94B5-654BD283DE1F@rokeby.ie>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0

On 30/04/2019 16:01, Jeff Young wrote:

> Primarily for performance reasons.


WRT performance, I did a few benchmarks for reference (on Linux)

Loading this large CIAA PCB[1] allocates, out of a peak usage of 467MB of heap with a 0.01% threshold:


* 9.6MB of std::basic_string<wchar_t>::_M_assign
   * 9.4MB of this is from wxString operator= assignments
* ~600kB of std::basic_string<wchar_t>::_M_construct (wxString ctor)

So I'm not sure memory usage is a major factor to worry about (strings allocate storage on the heap, so we should see basically all the interesting things in the heap profile). UTF-8 could be as little as 1/4 of UTF-32 (if all strings are ASCII), but even then, it's a few MB saved.


Now, in terms of performance, opening Pcbnew with no file gives:

#4      3.36%   __gconv_transform_utf8_internal
#5      2.51%   __mbsrtowcs_l
#6      2.50%   wxMBConv::ToWChar
#8      2.07%   std::basic_string<wchar_t>::_M_assign
#9      1.88%   wxMBConvStrictUTF8::ToWChar
#14     1.27%   EscapeString (kicad function)
#17     0.85%   __GI___strlen_sse2
#18     0.85%   wxUniChar::From8bit
#19     0.84%   wxUniChar::operator==

And plenty more string-y things in the top 50 or so lines. So it seems the biggest cost for strings is converting them from UTF-8 to wchar_t strings in WX (this is probably not the same on Windows). But it's not really a stunning cost.

However, when loading the CIAA board, there are basically no string operations above 0.5%, and only a handful even above 0.25%. When doing DRC, strings don't break 0.1%: nearly all the significant work is looking things up in std::maps and geometry.

So string performance doesn't seem to be *that* critical, as it's quickly drowned out under real workloads. It looks to me (and I'm happy to be corrected, I'm not a perf expert), like string operations in KiCad are not much of a bottleneck.


> Because characters are different lengths, you have to scan the string
> to find the n’th character.

Even with UTF-32, you can only do an O(1) lookup of the n'th *code point* or *code unit* (the same in UTF-32, not in UTF-8), not the n'th *encoded character*.

That's true even if you normalise the strings first. Not all code points map one-to-one to an encoded character (it can be one-to-none, one-to-one, many-to-one). And that's even without considering grapheme clustering.


Cheers,

John

PS / OT: If we had to optimise one thing, PolygonTriangulation::Vertex::inTriangle is the single hungriest function, chewing 6.19% of all CPU time, double that of each of the next 3: __gnu_cxx::__exchange_and_add (2.76%), PolygonTriangulation::isEar (2.73%) and even malloc (2.27%).

Other than that fairly mundane 6%-er, there are no eye-popping performance hogs simply on loading a PCB. Which is nice.

[1]: https://github.com/ciaa/Hardware/blob/master/PCB/ACC/CIAA_ACC/ciaa_acc.kicad_pcb
