yade-dev team mailing list archive
Message #10498
parallel collider - testing needed
Hi there,
I implemented a parallel version of the InsertionSortCollider. It is
almost ready but not yet pushed to the main trunk, as I have a few
things to check before that.
It would be helpful if some of you could 1/ test that your scripts still
work correctly and 2/ benchmark this for N>100k particles and j>4 threads.
If you run benchmarks, please remember to always activate timing and
report the result of timing.stats(). It gives much more interesting data
than the wall clock time.
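For anyone who has not used the timing module before, here is a minimal
sketch of how timing might be switched on around a run. The helper name
run_with_timing and the default step count are illustrative assumptions,
not part of the branch, and the helper assumes a scene has already been
set up. The imports sit inside the function so that the file can also be
loaded outside of a yade session.

```python
def run_with_timing(n_steps=200):
    """Run n_steps iterations with per-engine timing enabled, then print stats.

    Hypothetical helper: assumes it is called from a yade session in which
    the bodies and engines of the scene have already been defined.
    """
    # Imported here so that merely defining the helper does not require yade.
    from yade import timing
    from yade.wrapper import Omega

    O = Omega()             # handle to the running simulation
    O.timingEnabled = True  # engines only record their time when this is set
    timing.reset()          # discard counters left over from earlier runs
    O.run(n_steps, True)    # second argument: block until the run has finished
    timing.stats()          # print the per-engine table (count, time, rel. time)
```

Calling run_with_timing() at the end of a script should print a table of
the same kind as the ones shown at the end of this message.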
Preliminary benchmark results are below (from my laptop...), showing a
speedup of about a factor of 2 on the total computation time for j4/200k
particles (compared to the sequential collider).
The speedup on the collider alone is in fact of the order of x3.68 for 4
threads, i.e. nearly linear, at least for such a small number of threads.
My expectation is that it should change almost nothing for a small number
of particles (say, N<10k), where colliding is an inexpensive step.
For 1 million particles, on the other hand, there could be a significant
speedup, since the collider takes most of the time.
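The reasoning above can be cross-checked with a few lines of Python. This
is only a sketch: the times are copied from the two timing tables at the
end of this message, while the amdahl() helper and the variable names are
introduced here for illustration. It computes the collider and total
speedups, then uses Amdahl's law to estimate the total speedup expected
when only the collider gets faster.

```python
# Times in microseconds, copied from the timing tables in this message.
collider_seq, total_seq = 18816625, 29391910  # current trunk, -j4
collider_par, total_par = 5145114, 15393110   # "pc" branch, -j4


def amdahl(fraction, speedup):
    """Overall speedup when only `fraction` of the run time is sped up."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


collider_speedup = collider_seq / collider_par   # collider alone
total_speedup = total_seq / total_par            # whole time step
collider_fraction = collider_seq / total_seq     # collider share on trunk
predicted = amdahl(collider_fraction, collider_speedup)
ceiling = 1.0 / (1.0 - collider_fraction)        # limit for a free collider

print(f"collider speedup         : x{collider_speedup:.2f}")
print(f"total speedup (measured) : x{total_speedup:.2f}")
print(f"total speedup (Amdahl)   : x{predicted:.2f}")
print(f"upper bound at this N    : x{ceiling:.2f}")
```

With these numbers the collider ratio comes out at about x3.66 (close to
the figure quoted above), the measured total speedup at about x1.91 and
the Amdahl estimate at about x1.87. The measured value is slightly higher
because ForceResetter and NewtonIntegrator also ran a little faster in the
second run. The upper bound of about x2.78 only holds for this particle
count: the larger the collider's share of the time, the higher it goes.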
You can get the "pc" branch at my github repo:
git clone -b pc https://github.com/bchareyre/trunk.git
Results of yade -j4 --performance are below (i7 quad-core with
hyperthreading enabled, lightly loaded by background tasks; j>4 is not
reported, as hyperthreading probably does no good).
Happy benchmarking. :)
Bruno
====================
./yade-trunk -j4 --performance (the current trunk)
.......
number of bodies 200813
Elapsed 29.4102840424 sec
Performance 6.80034234664 iter/sec
Extrapolation on 1e5 iters 4.08476167255 hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                           Count            Time   Rel. time
----------------------------------------------------------------
ForceResetter                    200        700881us       2.38%
InsertionSortCollider              7      18816625us      64.02%
InteractionLoop                  200       6581283us      22.39%
NewtonIntegrator                 200       3293119us      11.20%
TOTAL                                     29391910us     100.00%
Common time 597.731503963 s
5037 spheres, velocity= 327.689688709 +- 5.13604387635 %
25103 spheres, velocity= 81.2726909754 +- 1.0105334405 %
50250 spheres, velocity= 45.4114521341 +- 3.02333274436 %
100467 spheres, velocity= 19.0287424005 +- 2.26073439157 %
200813 spheres, velocity= 6.51664351023 +- 4.03351515402 %
SCORE: 13777
Number of threads 4
========================
./yade-parallel -j4 --performance (my "pc" branch)
....
number of bodies 200813
Elapsed 15.4320101738 sec
Performance 12.9600744004 iter/sec
Extrapolation on 1e5 iters 2.14333474636 hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                           Count            Time   Rel. time
----------------------------------------------------------------
ForceResetter                    200        671157us       4.36%
InsertionSortCollider              7       5145114us      33.42%
  boundDispatcher                  7         93186us       1.81%
  bound                            7            12us       0.00%
  copy                             7        160891us       3.13%
  erase                            7         66932us       1.30%
  sort&collide                     7       4824071us      93.76%
  TOTAL                           35       5145095us     100.00%
InteractionLoop                  200       6545848us      42.52%
NewtonIntegrator                 200       3030989us      19.69%
TOTAL                                     15393110us     100.00%
Common time 460.37680912 s
5037 spheres, velocity= 365.599773471 +- 8.02397068512 %
25103 spheres, velocity= 92.0077536966 +- 3.81069496509 %
50250 spheres, velocity= 54.1683980588 +- 0.528288534811 %
100467 spheres, velocity= 25.7134767981 +- 1.0796373464 %
200813 spheres, velocity= 12.6488486429 +- 4.66276699319 %
SCORE: 18800
Number of threads 4