kicad-developers team mailing list archive
Re: 6.0 string proposal
On 30/04/2019 16:01, Jeff Young wrote:
> Primarily for performance reasons.
WRT performance, I did a few benchmarks for reference (on Linux)
Loading this large CIAA PCB allocates, out of a peak usage of 467MB
of heap with a 0.01% threshold:
* 9.6MB of std::basic_string<wchar_t>::_M_assign
* 9.4MB of this is from wxString operator= assignments
* ~600kB of std::basic_string<wchar_t>::_M_construct, (wxString ctor)
So I'm not sure memory usage is a major factor to worry about (strings
allocate storage on the heap, so we should see basically all the
interesting things in the heap profile). UTF-8 could be as little as 1/4
the size of UTF-32 (if all strings are ASCII), but even then, it's only
a few MB saved.
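To put a number on the 1/4 figure, here is a minimal sketch comparing the payload size of the same ASCII text held as UTF-8 and as UTF-32. The helper names (payloadBytes, utf32ToUtf8Ratio) are made up for illustration and are not KiCad or wx functions, and the count ignores allocator overhead and small-string optimisation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed for the code units of a string, ignoring allocator
// bookkeeping and any small-string optimisation.
template <typename CharT>
std::size_t payloadBytes( const std::basic_string<CharT>& aStr )
{
    return aStr.size() * sizeof( CharT );
}

// Hypothetical helper: how many times larger the UTF-32 payload is than
// the UTF-8 payload for the same text. The UTF-8 string must not be empty.
inline double utf32ToUtf8Ratio( const std::string& aUtf8, const std::u32string& aUtf32 )
{
    assert( !aUtf8.empty() );
    return static_cast<double>( payloadBytes( aUtf32 ) ) / payloadBytes( aUtf8 );
}
```

For a pure-ASCII reference like "R101" this gives 4 bytes against 16, so the ratio is 4. Applied to the 9.6MB measured above, the best case would be roughly 7.2MB saved, and less as soon as any non-ASCII text is involved.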
Now, in terms of performance, opening Pcbnew with no file gives:
#4 3.36% __gconv_transform_utf8_internal
#5 2.51% __mbsrtowcs_l
#6 2.50% wxMBConv::ToWChar
#8 2.07% std::basic_string<wchar_t>::_M_assign
#9 1.88% wxMBConvStrictUTF8::ToWChar
#14 1.27% EscapeString (kicad function)
#17 0.85% __GI___strlen_sse2
#18 0.85% wxUniChar::From8bit
#19 0.84% wxUniChar::operator==
And plenty more string-y things in the top 50 or so lines. So it seems
the biggest cost for strings is converting them from UTF-8 to wchar_t
strings in WX (this is probably not the same on Windows). But it's not
really a stunning cost.
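For a feel of what that conversion involves, here is a rough sketch of a UTF-8 to UTF-32 decoder. This is not what wx or glibc actually do (they go through wxMBConv and the gconv machinery seen in the profile); decodeUtf8 is a made-up name, and the sketch assumes well-formed input with no validation at all.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-32 code points. Assumes well-formed input: no
// checks for bad continuation bytes, overlong forms or surrogates.
std::u32string decodeUtf8( const std::string& aIn )
{
    std::u32string out;
    out.reserve( aIn.size() ); // never more code points than bytes

    for( std::size_t i = 0; i < aIn.size(); )
    {
        unsigned char lead = static_cast<unsigned char>( aIn[i] );

        // Sequence length is encoded in the high bits of the lead byte.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;

        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & ( 0xFF >> ( len + 1 ) );

        // Each continuation byte contributes its low 6 bits.
        for( std::size_t k = 1; k < len && i + k < aIn.size(); ++k )
            cp = ( cp << 6 ) | ( static_cast<unsigned char>( aIn[i + k] ) & 0x3F );

        out.push_back( cp );
        i += len;
    }

    return out;
}
```

Every byte has to be looked at once and the output is a fresh allocation, which fits with the conversion and assignment functions showing up together in the profile.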
However, when loading the CIAA board, there are basically no string
operations above 0.5%, and only a handful even above 0.25%. When doing
DRC, strings don't break 0.1%: nearly all the significant work is
looking things up in std::maps and geometry.
So string performance doesn't seem to be *that* critical, as it's
quickly drowned out under real workloads. It looks to me (and I'm happy
to be corrected, I'm not a perf expert), like string operations in KiCad
are not much of a bottleneck.
> Because characters are different lengths, you have to scan the string
> to find the n’th character.
Even with UTF-32, you can only do an O(1) lookup of the n'th *code
point* or *code unit* (the same in UTF-32, not in UTF-8), not the n'th
character.
That's true even if you normalise the strings first. Not all code points
map one-to-one to an encoded character (it can be one-to-none,
one-to-one, many-to-one). And that's even without considering grapheme
clusters.
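A small sketch of the distinction, with made-up helper names (utf8OffsetOfCodePoint, decomposedEAcute) and assuming well-formed UTF-8: finding the n'th code point in UTF-8 needs a scan, while UTF-32 indexing is O(1) but still only hands back a code point.

```cpp
#include <cstddef>
#include <string>

// Byte offset of the n'th code point (0-based) in a UTF-8 string.
// This is an O(n) scan. Returns npos if there are fewer code points.
std::size_t utf8OffsetOfCodePoint( const std::string& aUtf8, std::size_t aN )
{
    std::size_t seen = 0;

    for( std::size_t i = 0; i < aUtf8.size(); ++i )
    {
        // Continuation bytes look like 10xxxxxx; any other byte starts
        // a new code point.
        if( ( static_cast<unsigned char>( aUtf8[i] ) & 0xC0 ) != 0x80 )
        {
            if( seen == aN )
                return i;

            ++seen;
        }
    }

    return std::string::npos;
}

// "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
// one character on screen, two code points in UTF-32.
inline std::u32string decomposedEAcute()
{
    return U"e\u0301";
}
```

Indexing decomposedEAcute()[0] is constant time, but it returns a bare 'e', not the accented letter the user sees. So fixed-width code units make code point lookup cheap without making "n'th character" lookup correct.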
PS / OT: If we had to optimise one thing,
PolygonTriangulation::Vertex::inTriangle is the single hungriest
function, chewing 6.19% of all CPU time, double that of each of the next
3: __gnu_cxx::__exchange_and_add (2.76%), PolygonTriangulation::isEar
(2.73%) and even malloc (2.27%).
Other than that fairly mundane 6%-er, there are no eye-popping
performance hogs simply on loading a PCB. Which is nice.