maria-developers team mailing list archive
Message #06704
Re: Next steps in improving single-threaded performance
Hi Kristian,
just out of curiosity: is it possible to find out which functions cause the
highest number of icache misses? Could it have anything to do with branch
misprediction?
Regards,
Sergey
On Fri, Jan 24, 2014 at 03:51:25PM +0100, Kristian Nielsen wrote:
> I have been analysing CPU bottlenecks in single-threaded sysbench read-only
> load. I found that icache misses are the main bottleneck, and that
> profile-guided compiler optimisation (PGO) with GCC gives a large speedup, 25%
> or more.
>
> (More details in my blog posts:
>
> http://kristiannielsen.livejournal.com/17676.html
> http://kristiannielsen.livejournal.com/18168.html
> )
>
> Now I would like to ask for some discussions/help in how to get this
> implemented in practice. It involves changing the build process for our
> binaries: First compile with gcc --coverage, then run some profile workload,
> then recompile with -fprofile-use.
>
> I implemented a simple program to generate some profile load:
>
> https://github.com/knielsen/gen_profile_load
>
> It runs a bunch of simple insert/select/update/delete, with different
> combinations of storage engine, binlog format, and client API. It is designed
> to run inside the build tree and handle starting and stopping the server being
> tested, so it is pretty close to a working setup. These commands work to
> generate a binary that is faster due to PGO:
>
> mkdir bld
> cd bld
> cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 --coverage" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 --coverage" ..
> make
>
> tests/gen_profile_load
>
> cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 -fprofile-use -fprofile-correction" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 -fprofile-use -fprofile-correction" ..
> make
>
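
As a rough sketch, the two-pass sequence quoted above could be wrapped in a
small script along the following lines. The flags, the bld directory and the
tests/gen_profile_load path are taken from the quoted commands; the script
name (pgo_build.sh), the function names and the "run" argument are
hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Sketch of the two-pass PGO build, run from the top of the source tree.
# Names below (pgo_flags, pgo_build_pass, the "run" argument) are made up
# for this example; they are not part of the MariaDB build system.

BASE_FLAGS="-Wno-maybe-uninitialized -g -O3"

# Print the compiler flags for one pass:
#   instrument -> build that records profile data (--coverage)
#   use        -> build that consumes the recorded profile
pgo_flags() {
    case "$1" in
        instrument) echo "$BASE_FLAGS --coverage" ;;
        use)        echo "$BASE_FLAGS -fprofile-use -fprofile-correction" ;;
        *)          echo "unknown pass: $1" >&2; return 1 ;;
    esac
}

# Configure and build one pass; expects to be called inside the build dir.
pgo_build_pass() {
    flags=$(pgo_flags "$1") || return 1
    cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 \
          -DCMAKE_BUILD_TYPE=RelWithDebInfo \
          "-DCMAKE_C_FLAGS_RELWITHDEBINFO=$flags" \
          "-DCMAKE_CXX_FLAGS_RELWITHDEBINFO=$flags" \
          .. &&
    make
}

# Only start the full build when asked explicitly: sh pgo_build.sh run
if [ "${1:-}" = "run" ]; then
    mkdir -p bld && cd bld &&
    pgo_build_pass instrument &&
    tests/gen_profile_load &&
    pgo_build_pass use
fi
```

Keeping the flag selection in its own function means a packaging script could
reuse it for the .deb, .rpm and bintar builds without duplicating the long
cmake command lines.
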
> So all the pieces really are there, and it should be possible to implement
> it. But we need to find a good way to integrate it into our build system.
>
> The best would be to integrate it into our cmake files.
>
> The gen_profile_load.c could go into tests/; ideally we would build both a
> static and dynamically linked version (so we get PGO for both libmysqlclient.a
> and libmysqlclient.so). Can anyone help me get cmake to do that?
>
> And it would be cool if we could get the above procedure to work completely
> within cmake, so that the user could just do:
>
> cmake -DWITH_PGO=1 ... ; make
>
> and cmake would itself handle first building with --coverage, then running
> gen_profile_load.static and gen_profile_load.dynamic, then rebuilding with
> -fprofile-use. Does anyone know if this is possible with cmake, and if so,
> could you help implement it?
>
> But alternatively, we could integrate a double build, like the commands above,
> into the buildbot scripts (.deb, .rpm, bintar).
>
> Any comments? Here are some more points:
>
> - I tested that gen_profile_load gives a good speedup of sysbench read-only
> (around 30%, so still very significant even though it generates a different
> and more varied load).
>
> - As another test, I removed all SELECT from gen_profile_load, and ran the
> resulting PGO binary with sysbench read-only. This still gave a fair
> speedup, despite the PGO load being completely different from the benchmark
> load. This gives me confidence that the PGO should not cause performance
> regressions in cases not covered well by gen_profile_load.
>
> - More tests would be nice, of course. Axel, would you be able to build some
> binaries following the above procedure, and test some different random
> benchmarks? Anything that is easy to run could be interesting, both to test
> for improvement, and to check against regressions.
>
> - We probably need a recent GCC version to get good results. I used GCC
> version 4.7.2. Maybe we should install this GCC version in all the VMs we
> use to build binaries?
>
> - Should we do this in 5.5? I think we might want to. The speedup is quite
> significant, and it seems very safe - no code modifications are involved,
> only different compiler options.
>
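
For the benchmark comparisons requested in the list above, a small helper
like the following could turn two throughput figures into a relative change,
so that improvements and regressions are reported the same way. This is only
a sketch: the function name speedup_pct is hypothetical, and the numbers in
the usage comment are made up for illustration, not measurements.

```shell
# speedup_pct BASELINE PGO
# Print the relative throughput change of the PGO binary over the baseline
# binary in percent; a negative value indicates a regression.
speedup_pct() {
    awk -v base="$1" -v pgo="$2" 'BEGIN {
        if (base <= 0) {
            print "baseline must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.1f\n", (pgo - base) / base * 100
    }'
}

# Illustrative call with invented throughput values (e.g. transactions/sec):
#   speedup_pct 1000 1250
```
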
> Any thoughts? Volunteers for helping with the cmake or buildbot parts?
>
> - Kristian.