maria-developers team mailing list archive
Re: Next steps in improving single-threaded performance
Sergey Vojtovich <svoj@xxxxxxxxxxx> writes:
> Still a question, mostly to educate myself. According to /proc, the mysqld executable
> size is something like:
> VmExe: 12228 kB
> VmLib: 6272 kB
> I assume the above refers to overall instructions. Level 1 instruction cache
> size is something like 32 KB, right?
> When you say that we're executing too much code per request, did you mean the
No. I was referring to the actual code that is touched by the given load.
In my sysbench read-only benchmarks, we run around 40000 instructions per
query. But some of those are in loops, so it is unknown how many distinct
instructions need to be fetched (maybe cachegrind could help determine this).
If the live set (that is, the instructions actually executed under a given
load) fit in the L1 instruction cache, we would see a large gain in
performance. That might not be possible to achieve, though.
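One way to make the live-set idea concrete, sketched under assumptions: take a trace of executed instruction addresses (the trace, and how it is collected, e.g. with cachegrind or perf, are hypothetical here), map each address to its cache line, and count distinct lines. Repeats from loops collapse, which is why the per-query instruction count alone does not give the live-set size. The 64-byte line and 32 KB L1i below are typical x86 values, assumed rather than measured, and associativity conflicts are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache geometry: 64-byte lines, 32 KiB L1 instruction cache. */
#define LINE_SIZE 64u
#define L1I_SIZE (32u * 1024u)

static int cmp_u64(const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/* Count distinct cache lines touched by a trace of instruction addresses.
   Repeated addresses (loops) count once. Returns 0 for an empty trace or
   on allocation failure. */
size_t live_set_lines(const uint64_t *pcs, size_t n)
{
  assert(pcs != NULL || n == 0);
  if (n == 0)
    return 0;
  uint64_t *lines = malloc(n * sizeof *lines);
  if (!lines)
    return 0;
  for (size_t i = 0; i < n; i++)
    lines[i] = pcs[i] / LINE_SIZE;   /* address -> cache-line index */
  qsort(lines, n, sizeof *lines, cmp_u64);
  size_t distinct = 1;
  for (size_t i = 1; i < n; i++)
    if (lines[i] != lines[i - 1])
      distinct++;
  free(lines);
  return distinct;
}

/* Would a live set of this many lines fit in L1i? (Capacity only.) */
int fits_in_l1i(size_t lines)
{
  return lines * LINE_SIZE <= L1I_SIZE;
}
```

As a rough bound, assuming an average x86-64 instruction length of about 4 bytes (an assumption), 40000 instructions that were all distinct would occupy about 160000 bytes, roughly five times a 32 KB L1i; loops make the real distinct count smaller by an unknown amount.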
My hypothesis is that the reduction in icache misses from PGO comes from the
compiler being able to re-arrange the basic blocks of the code so that the
actual benchmark load ends up with fewer and longer straight-line code
execution paths. This would reduce the number of half-used cache lines in
the icache, and also help the hardware prefetcher reduce the
impact of icache misses.
The actual size of the executable does not matter much, only the parts that
are actually executed during a given load.
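A toy model of the layout hypothesis (not how GCC actually places code): basic blocks are laid out back to back, and we count the cache lines that contain at least one byte of a hot (executed) block. The block sizes, the hot/cold flags and the 64-byte line size are made-up assumptions. Interleaving hot and cold blocks leaves lines partly filled with cold code; grouping the hot blocks, the kind of reordering that profile data makes possible, packs the same hot bytes into fewer lines.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line size (typical for x86). */
#define LINE_SIZE 64u

/* Hypothetical basic block: size in bytes, and whether the load executes it. */
struct block {
  unsigned size;
  int hot;
};

/* Place blocks back to back, in the given order, starting at offset 0, and
   count the distinct cache lines holding at least one hot byte. */
unsigned hot_lines(const struct block *b, const size_t *order, size_t n)
{
  assert(n == 0 || (b != NULL && order != NULL));
  unsigned offset = 0, count = 0;
  long last_line = -1;            /* last cache line already counted */
  for (size_t i = 0; i < n; i++) {
    const struct block *blk = &b[order[i]];
    if (blk->hot && blk->size > 0) {
      long first = (long)(offset / LINE_SIZE);
      long last = (long)((offset + blk->size - 1) / LINE_SIZE);
      /* Offsets only grow, so only this block's first line can already
         have been counted (shared with the previous hot block). */
      if (first == last_line)
        first++;
      if (last >= first)
        count += (unsigned)(last - first + 1);
      if (last > last_line)
        last_line = last;
    }
    offset += blk->size;
  }
  return count;
}
```

With three 40-byte hot blocks separated by 40-byte cold blocks, the hot path touches 4 lines in source order but only 2 when the hot blocks are placed first, which is the minimum for 120 hot bytes.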
> Do you think we can get similar speedup by putting compiler hints (e.g.
> likely/unlikely) and code optimizations?
I do not know for sure, but I think it is unlikely. We may be able to get some
of the speedup with such hints. But if I remember the GCC documentation
correctly, a number of optimisations are only enabled when profile-guided
optimisation is actually used. It is hard to say for sure, though...
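For comparison, static hints in GCC and Clang are usually written as macros around `__builtin_expect`. The macro definitions and the `sum_row` function below are a common convention and a hypothetical example, not code from the MariaDB tree. A hint states only the expected direction of one branch; it carries no execution counts, whereas a profile gives the compiler counts for every branch and function under the real load.

```c
#include <assert.h>
#include <stddef.h>

/* Common convention (assumed here, not copied from MariaDB's headers). */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical hot-path function: sum a row buffer, treating a NULL buffer
   as a rare error. The hint may let the compiler move the error path out of
   the straight-line code that the common case falls through. */
long sum_row(const int *vals, size_t n, int *err)
{
  assert(err != NULL);
  if (unlikely(vals == NULL)) {
    *err = 1;                 /* cold path: expected to be rare */
    return 0;
  }
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += vals[i];
  *err = 0;
  return total;
}
```

A profile-guided build, by contrast, is typically compiled with `-fprofile-generate`, run under a representative load (such as the sysbench read-only test), and recompiled with `-fprofile-use`.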