yade-dev team mailing list archive

Re: parallel collider - testing needed

 

There is apparently a problem with your computer, your compilation
options, or something else.
If you run an ordinary simulation with -j4 and many particles, do you see
4 cores in use?
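
If it helps, here is a minimal sketch for checking whether the process is
even allowed to run on several cores. It assumes Linux and plain Python (no
Yade calls); affinity_report is a hypothetical helper name, and a restricted
CPU affinity mask is only one possible cause, not a diagnosis:

```python
import os


def usable_cpu_count():
    """Number of CPUs the current process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux only: size of the affinity mask of the calling process (pid 0 = self)
        return len(os.sched_getaffinity(0))
    # Fallback for other platforms: all CPUs the OS reports
    return os.cpu_count() or 1


def affinity_report(requested_threads):
    """Compare the requested thread count (e.g. 4 for -j4) with the usable CPUs."""
    usable = usable_cpu_count()
    return {
        "requested": requested_threads,
        "usable": usable,
        "total": os.cpu_count() or usable,
        # True means the threads would have to share fewer cores than requested
        "restricted": usable < requested_threads,
    }


if __name__ == "__main__":
    print(affinity_report(4))
```

If this is run from the same Python session as the simulation and
"restricted" comes out True, the affinity mask would be worth looking at
before the collider itself.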

Bruno



On 25/02/14 16:26, Christian Jakob wrote:
> Hi Bruno,
>
> I did some tests with your new collider:
>
> My "old" machine (2 cpu sockets with 4 cores each, Intel(R) Xeon(R)
> CPU X5460  @ 3.16GHz) says:
>
>
> yade-trunk -j4 --performance
>
> Welcome to Yade 2014-02-18.git-af75797
> .....
> number of bodies 200813
>
> Elapsed  74.6882498264  sec
> Performance  2.67779738399  iter/sec
> Extrapolation on 1e5 iters  10.3733680314  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                         Count            Time   Rel. time
> --------------------------------------------------------------
> ForceResetter                  200       2625848us       3.52%
> InsertionSortCollider            7      21494603us      28.79%
> InteractionLoop                200      32631323us      43.70%
> NewtonIntegrator               200      17913859us      23.99%
> TOTAL                                   74665635us     100.00%
>
> Common time  3845.09048295 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037  spheres, velocity= 44.7832284176 +- 60.1189421161 %
> 25103  spheres, velocity= 17.4121076601 +- 0.99355345037 %
> 50250  spheres, velocity= 10.0714940216 +- 1.53896666769 %
> 100467  spheres, velocity= 5.05891811219 +- 0.434738330959 %
> 200813  spheres, velocity= 2.65826879857 +- 0.933088603948 %
>
>
> SCORE: 3479
> Number of threads  4
>
> ....
>
> ###########################################################
>
> yade-parallel -j4 --performance (your pc branch)
>
> Welcome to Yade 2014-02-24.git-b60d388
> .....
> number of bodies 200813
>
> Elapsed  75.6688189507  sec
> Performance  2.64309662518  iter/sec
> Extrapolation on 1e5 iters  10.5095581876  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                         Count            Time   Rel. time
> --------------------------------------------------------------
> ForceResetter                  200       2600100us       3.44%
> InsertionSortCollider            7      20746020us      27.43%
> InteractionLoop                200      34455725us      45.55%
> NewtonIntegrator               200      17838205us      23.58%
> TOTAL                                   75640051us     100.00%
>
> Common time  4093.34840894 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037  spheres, velocity= 44.3999135517 +- 61.0812025756 %
> 25103  spheres, velocity= 16.8531534243 +- 1.32470154863 %
> 50250  spheres, velocity= 9.61504490252 +- 0.670186229301 %
> 100467  spheres, velocity= 4.86679881913 +- 0.487840014886 %
> 200813  spheres, velocity= 2.64490152313 +- 0.285084118261 %
>
>
> SCORE: 3402
> Number of threads  4
>
> ######################################################
>
>
> On my computer there seems to be nearly no speedup ...
>
> htop tells me that -j4 --performance is using 4 threads,
> but on just 1 core ...
>
> Regards,
>
> Christian
>
>
>
> Quoting Bruno Chareyre <bruno.chareyre@xxxxxxxxxxx>:
>
>> Hi there,
>> I implemented a parallel version of the InsertionSortCollider. It is
>> almost ready but not yet pushed to the main trunk, as I have a few
>> things to check before that.
>> It would be helpful if some of you could 1/ test that your scripts work
>> correctly and 2/ benchmark this for N>100k and j>4.
>> If you run benchmarks, please remember to always activate timing and
>> report the result of timing.stats(). It gives much more interesting data
>> than the wall clock time.
>>
>> Preliminary benchmark results are below (from my laptop...), showing a
>> speedup by a factor of 2 on the total computation time for j4/200k
>> particles (compared to the sequential collider).
>> The speedup on the collider alone is in fact of the order of x3.68 for 4
>> threads, i.e. nearly linear, at least for such a small number of threads.
>>
>> My expectation is that it should change almost nothing for a small number
>> of particles (say, N<10k), where colliding is an inexpensive step.
>> For 1 million particles, on the other hand, there could be a significant
>> speedup, since the collider takes most of the time.
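
To make the reasoning in the quoted paragraph concrete: the overall gain
depends on the share of the run spent in the collider, as in Amdahl's law.
A rough sketch in plain Python, using the -j4 timing figures reported
further down in this message for 200813 bodies. The 10% collider share used
for the small-N case is a made-up illustration, not a measurement:

```python
def overall_speedup(fraction, section_speedup):
    """Amdahl's law: overall speedup when `fraction` of the run time
    is accelerated by a factor `section_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / section_speedup)


# Timings in microseconds, copied from the timing tables (200813 bodies, -j4).
TRUNK_TOTAL = 29391910
TRUNK_COLLIDER = 18816625
PC_TOTAL = 15393110
PC_COLLIDER = 5145114

# Share of the trunk run time spent in InsertionSortCollider
collider_fraction = TRUNK_COLLIDER / TRUNK_TOTAL
# Speedup of the collider alone, trunk vs. "pc" branch
collider_speedup = TRUNK_COLLIDER / PC_COLLIDER

predicted = overall_speedup(collider_fraction, collider_speedup)
measured = TRUNK_TOTAL / PC_TOTAL

# Hypothetical small-N case: collider takes only 10% of the run (assumed value).
small_n = overall_speedup(0.10, collider_speedup)

print(f"collider share {collider_fraction:.2%}, collider speedup x{collider_speedup:.2f}")
print(f"overall: predicted x{predicted:.2f}, measured x{measured:.2f}")
print(f"hypothetical 10% collider share: overall x{small_n:.2f}")
```

With these inputs the collider ratio comes out slightly below the x3.68
quoted above (the tables may come from a different run), the predicted
overall gain lands close to the measured factor of about 2, and the
hypothetical 10% case gains less than 10% overall.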
>>
>> You can get the "pc" branch at my github repo:
>> git clone -b pc https://github.com/bchareyre/trunk.git
>>
>> Results of yade -j4 --performance are below (i7 quad-core with
>> hyperthreading enabled, lightly loaded by background tasks; j>4 is not
>> reported, as hyperthreading probably does no good).
>>
>> Happy benchmarking. :)
>>
>> Bruno
>>
>>
>> ====================
>> ./yade-trunk -j4 --performance  (the current trunk)
>> .......
>> number of bodies 200813
>>
>> Elapsed  29.4102840424  sec
>> Performance  6.80034234664  iter/sec
>> Extrapolation on 1e5 iters  4.08476167255  hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                         Count            Time   Rel. time
>> --------------------------------------------------------------
>> ForceResetter                  200        700881us       2.38%
>> InsertionSortCollider            7      18816625us      64.02%
>> InteractionLoop                200       6581283us      22.39%
>> NewtonIntegrator               200       3293119us      11.20%
>> TOTAL                                   29391910us     100.00%
>>
>> Common time  597.731503963 s
>>
>>
>> 5037  spheres, velocity= 327.689688709 +- 5.13604387635 %
>> 25103  spheres, velocity= 81.2726909754 +- 1.0105334405 %
>> 50250  spheres, velocity= 45.4114521341 +- 3.02333274436 %
>> 100467  spheres, velocity= 19.0287424005 +- 2.26073439157 %
>> 200813  spheres, velocity= 6.51664351023 +- 4.03351515402 %
>>
>>
>> SCORE: 13777
>> Number of threads  4
>>
>>
>> ========================
>> ./yade-parallel -j4 --performance  (my "pc" branch)
>> ....
>>
>> number of bodies 200813
>>
>> Elapsed  15.4320101738  sec
>> Performance  12.9600744004  iter/sec
>> Extrapolation on 1e5 iters  2.14333474636  hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                         Count            Time   Rel. time
>> --------------------------------------------------------------
>> ForceResetter                  200        671157us       4.36%
>> InsertionSortCollider            7       5145114us      33.42%
>>   boundDispatcher                7         93186us       1.81%
>>   bound                          7            12us       0.00%
>>   copy                           7        160891us       3.13%
>>   erase                          7         66932us       1.30%
>>   sort&collide                   7       4824071us      93.76%
>>   TOTAL                         35       5145095us     100.00%
>> InteractionLoop                200       6545848us      42.52%
>> NewtonIntegrator               200       3030989us      19.69%
>> TOTAL                                   15393110us     100.00%
>>
>> Common time  460.37680912 s
>>
>>
>> 5037  spheres, velocity= 365.599773471 +- 8.02397068512 %
>> 25103  spheres, velocity= 92.0077536966 +- 3.81069496509 %
>> 50250  spheres, velocity= 54.1683980588 +- 0.528288534811 %
>> 100467  spheres, velocity= 25.7134767981 +- 1.0796373464 %
>> 200813  spheres, velocity= 12.6488486429 +- 4.66276699319 %
>>
>>
>> SCORE: 18800
>> Number of threads  4
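
The two velocity lists above can also be compared size by size, which shows
how the gain depends on the number of spheres. A small sketch in plain
Python; the numbers are copied from the two -j4 runs quoted above, and
speedup_by_size is a hypothetical helper name:

```python
# "velocity" values per number of spheres, copied from the two -j4 runs
TRUNK = {
    5037: 327.689688709,
    25103: 81.2726909754,
    50250: 45.4114521341,
    100467: 19.0287424005,
    200813: 6.51664351023,
}
PC_BRANCH = {
    5037: 365.599773471,
    25103: 92.0077536966,
    50250: 54.1683980588,
    100467: 25.7134767981,
    200813: 12.6488486429,
}


def speedup_by_size(baseline, candidate):
    """Ratio candidate/baseline for each problem size present in both runs,
    ordered by increasing number of spheres."""
    return {n: candidate[n] / baseline[n] for n in sorted(baseline) if n in candidate}


ratios = speedup_by_size(TRUNK, PC_BRANCH)
for n, ratio in ratios.items():
    print(f"{n:>7} spheres: x{ratio:.2f}")
```

The ratio grows steadily with the number of spheres, from roughly 10% at
5037 spheres to nearly a factor of 2 at 200813. The reported error bars are
a few percent, so the small-N ratios should be read loosely.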
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~yade-dev
>> Post to     : yade-dev@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~yade-dev
>> More help   : https://help.launchpad.net/ListHelp
>>


-- 
_______________
Bruno Chareyre
Associate Professor
ENSE³ - Grenoble INP
Lab. 3SR
BP 53
38041 Grenoble cedex 9
Tél : +33 4 56 52 86 21
Fax : +33 4 76 82 70 43
________________


