
yade-dev team mailing list archive

parallel collider - testing needed

 

Hi there,
I implemented a parallel version of the InsertionSortCollider. It is
almost ready but not yet pushed to the main trunk, as I have a few
things to check before that.
It would be helpful if some of you could 1/ test that your scripts still
work correctly and 2/ benchmark this for N>100k and j>4.
If you run benchmarks, please remember to always activate timing and
report the result of timing.stats(). It gives much more interesting data
than the wall clock time.

Preliminary benchmark results are below (from my laptop...), showing a
speedup by a factor of 2 on the total computation time with -j4 and 200k
particles (compared to the sequential collider).
The speedup on the collider alone is in fact of the order of x3.68 for 4
threads, i.e. nearly linear, at least for such a small number of threads.
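
For anyone who wants to redo the arithmetic on their own runs, here is a minimal sketch that derives the speedups from the timing tables. The numbers are copied from the two --performance outputs further down in this message; the dictionary names are only illustrative.

```python
# Times in microseconds, copied from the two timing tables in this message.
trunk = {"collider": 18816625, "total": 29391910}  # current trunk, -j4
pc = {"collider": 5145114, "total": 15393110}      # "pc" branch, -j4

# Speedup = time with the sequential collider / time with the parallel one.
collider_speedup = trunk["collider"] / pc["collider"]
total_speedup = trunk["total"] / pc["total"]

print("collider speedup: %.2f" % collider_speedup)
print("total speedup:    %.2f" % total_speedup)
```

With the figures from these particular tables this gives roughly 3.7 for the collider and 1.9 for the whole loop, in line with the numbers quoted above.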

My expectation is that it should change almost nothing for a small number
of particles (say, N<10k), where colliding is an inexpensive step.
For 1 million particles, on the other hand, there could be a significant
speedup, since the collider takes most of the time.
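
The reasoning here is essentially Amdahl's law: only the fraction of time spent in the collider gets faster. A rough sketch, assuming everything outside the collider is unchanged (the function name and the 10% fraction for a small problem are hypothetical, just for illustration):

```python
def overall_speedup(collider_fraction, collider_speedup):
    """Amdahl-style estimate: only the collider share of the time is sped up."""
    rest = 1.0 - collider_fraction
    return 1.0 / (rest + collider_fraction / collider_speedup)

# Collider share of 64.02% (trunk, 200k bodies) and the x3.68 quoted above.
large = overall_speedup(0.6402, 3.68)

# Hypothetical small problem where the collider takes only 10% of the time.
small = overall_speedup(0.10, 3.68)

print("estimated overall speedup, 200k bodies: %.2f" % large)
print("estimated overall speedup, small case:  %.2f" % small)
```

The estimate for 200k bodies comes out a bit below 2, consistent with the measured total, while the small case stays close to 1.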

You can get the "pc" branch at my github repo:
git clone -b pc https://github.com/bchareyre/trunk.git

Results of yade -j4 --performance are below (i7 quad-core with
hyperthreading enabled, lightly loaded by background tasks; j>4 is not
reported, as hyperthreading is probably doing no good).

Happy benchmarking. :)

Bruno


====================
./yade-trunk -j4 --performance  (the current trunk)
.......
number of bodies 200813

Elapsed  29.4102840424  sec
Performance  6.80034234664  iter/sec
Extrapolation on 1e5 iters  4.08476167255  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                         Count          Time    Rel. time
-------------------------------------------------------------
ForceResetter                  200      700881us        2.38%
InsertionSortCollider            7    18816625us       64.02%
InteractionLoop                200     6581283us       22.39%
NewtonIntegrator               200     3293119us       11.20%
TOTAL                                 29391910us      100.00%

Common time  597.731503963 s


5037  spheres, velocity= 327.689688709 +- 5.13604387635 %
25103  spheres, velocity= 81.2726909754 +- 1.0105334405 %
50250  spheres, velocity= 45.4114521341 +- 3.02333274436 %
100467  spheres, velocity= 19.0287424005 +- 2.26073439157 %
200813  spheres, velocity= 6.51664351023 +- 4.03351515402 %


SCORE: 13777
Number of threads  4


========================
./yade-parallel -j4 --performance  (my "pc" branch)
....

number of bodies 200813

Elapsed  15.4320101738  sec
Performance  12.9600744004  iter/sec
Extrapolation on 1e5 iters  2.14333474636  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                         Count          Time    Rel. time
-------------------------------------------------------------
ForceResetter                  200      671157us        4.36%
InsertionSortCollider            7     5145114us       33.42%
  boundDispatcher                7       93186us        1.81%
  bound                          7          12us        0.00%
  copy                           7      160891us        3.13%
  erase                          7       66932us        1.30%
  sort&collide                   7     4824071us       93.76%
  TOTAL                         35     5145095us      100.00%
InteractionLoop                200     6545848us       42.52%
NewtonIntegrator               200     3030989us       19.69%
TOTAL                                 15393110us      100.00%

Common time  460.37680912 s


5037  spheres, velocity= 365.599773471 +- 8.02397068512 %
25103  spheres, velocity= 92.0077536966 +- 3.81069496509 %
50250  spheres, velocity= 54.1683980588 +- 0.528288534811 %
100467  spheres, velocity= 25.7134767981 +- 1.0796373464 %
200813  spheres, velocity= 12.6488486429 +- 4.66276699319 %


SCORE: 18800
Number of threads  4


