yade-dev team mailing list archive

Re: parallel collider - testing needed

 

Is there any difference at all on this machine, between -j1 and -j4?

B
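The htop observation quoted below (four threads, but only one core busy) can be cross-checked from Python. The following is a minimal, Linux-oriented sketch that uses only the standard library; the helper name `usable_cores` is hypothetical and not part of Yade. It reports how many cores the current process is allowed to run on. A result of 1 on a multi-core machine would be consistent with all threads being scheduled on a single core, although it would not by itself identify the cause.

```python
import os

def usable_cores():
    """Return the number of CPU cores this process may be scheduled on.

    os.sched_getaffinity is only available on some platforms (e.g. Linux);
    elsewhere we fall back to the total core count reported by the OS.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))  # 0 means "the current process"
    return os.cpu_count() or 1

if __name__ == "__main__":
    print("cores visible to the OS:     ", os.cpu_count())
    print("cores usable by this process:", usable_cores())
    # OpenMP runtimes commonly honour this variable; shown for information only.
    print("OMP_NUM_THREADS:             ", os.environ.get("OMP_NUM_THREADS", "(unset)"))
```

Run from the same Python session as the simulation (an assumption about how it would be used), the second number should match the core count expected for `-j4`.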

On 25/02/14 18:56, Bruno Chareyre wrote:
> There is apparently a problem with your computer, your compilation
> options, or something else. If you run an ordinary simulation with -j4
> and many particles, do you see 4 cores used?
>
> Bruno
>
>
>
> On 25/02/14 16:26, Christian Jakob wrote:
>> Hi Bruno,
>>
>> I did some tests with your new collider:
>>
>> My "old" machine (2 cpu sockets with 4 cores each, Intel(R) Xeon(R)
>> CPU X5460  @ 3.16GHz) says:
>>
>>
>> yade-trunk -j4 --performance
>>
>> Welcome to Yade 2014-02-18.git-af75797
>> .....
>> number of bodies 200813
>>
>> Elapsed  74.6882498264  sec
>> Performance  2.67779738399  iter/sec
>> Extrapolation on 1e5 iters  10.3733680314  hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                                       Count            Time   Rel. time
>> ----------------------------------------------------------------------------
>> ForceResetter                                200       2625848us       3.52%
>> InsertionSortCollider                          7      21494603us      28.79%
>> InteractionLoop                              200      32631323us      43.70%
>> NewtonIntegrator                             200      17913859us      23.99%
>> TOTAL                                                 74665635us     100.00%
>>
>> Common time  3845.09048295 s
>>
>>
>> Calculation velocity is unstable, try to close all programs and start
>> performance tests again
>> 5037  spheres, velocity= 44.7832284176 +- 60.1189421161 %
>> 25103  spheres, velocity= 17.4121076601 +- 0.99355345037 %
>> 50250  spheres, velocity= 10.0714940216 +- 1.53896666769 %
>> 100467  spheres, velocity= 5.05891811219 +- 0.434738330959 %
>> 200813  spheres, velocity= 2.65826879857 +- 0.933088603948 %
>>
>>
>> SCORE: 3479
>> Number of threads  4
>>
>> ....
>>
>> ###########################################################
>>
>> yade-parallel -j4 --performance (your pc branch)
>>
>> Welcome to Yade 2014-02-24.git-b60d388
>> .....
>> number of bodies 200813
>>
>> Elapsed  75.6688189507  sec
>> Performance  2.64309662518  iter/sec
>> Extrapolation on 1e5 iters  10.5095581876  hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                                       Count            Time   Rel. time
>> ----------------------------------------------------------------------------
>> ForceResetter                                200       2600100us       3.44%
>> InsertionSortCollider                          7      20746020us      27.43%
>> InteractionLoop                              200      34455725us      45.55%
>> NewtonIntegrator                             200      17838205us      23.58%
>> TOTAL                                                 75640051us     100.00%
>>
>> Common time  4093.34840894 s
>>
>>
>> Calculation velocity is unstable, try to close all programs and start
>> performance tests again
>> 5037  spheres, velocity= 44.3999135517 +- 61.0812025756 %
>> 25103  spheres, velocity= 16.8531534243 +- 1.32470154863 %
>> 50250  spheres, velocity= 9.61504490252 +- 0.670186229301 %
>> 100467  spheres, velocity= 4.86679881913 +- 0.487840014886 %
>> 200813  spheres, velocity= 2.64490152313 +- 0.285084118261 %
>>
>>
>> SCORE: 3402
>> Number of threads  4
>>
>> ######################################################
>>
>>
>> On my computer there seems to be nearly no speedup ...
>>
>> Looking at htop tells me that -j4 --performance is using 4 threads,
>> but only on 1 core ...
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> Quoting Bruno Chareyre <bruno.chareyre@xxxxxxxxxxx>:
>>
>>> Hi there,
>>> I implemented a parallel version of the InsertionSortCollider. It is
>>> almost ready but not yet pushed to the main trunk, as I have a few
>>> things to check before that.
>>> It would be helpful if some of you could 1/ test that your scripts work
>>> correctly and 2/ benchmark this for N>100k and j>4.
>>> If you run benchmarks, please remember to always activate timing and
>>> report the result of timing.stats(). It gives much more interesting data
>>> than the wall clock time.
>>>
>>> Preliminary benchmark results are below (from my laptop...), showing a
>>> speedup by a factor 2 on the total computation time for j4/200k
>>> particles (compared to the sequential collider).
>>> The speedup on the collider alone is in fact of the order of x3.68 for 4
>>> threads: nearly linear, at least for such a small number of threads.
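The speedup figures can be recomputed from the timing tables further down in this message. The sketch below simply divides the trunk timings by the "pc" branch timings; the dictionary and function names are illustrative only. With the numbers as pasted, it gives roughly x3.66 for the collider and x1.91 for the total, close to the figures quoted above.

```python
# Timings in microseconds, copied from the two timing tables in this message.
trunk = {"InsertionSortCollider": 18816625, "TOTAL": 29391910}
pc_branch = {"InsertionSortCollider": 5145114, "TOTAL": 15393110}

def speedup(before_us, after_us):
    """Ratio of the old time to the new time (values > 1 mean faster)."""
    return before_us / after_us

collider_speedup = speedup(trunk["InsertionSortCollider"],
                           pc_branch["InsertionSortCollider"])
total_speedup = speedup(trunk["TOTAL"], pc_branch["TOTAL"])

print(f"collider speedup: x{collider_speedup:.2f}")
print(f"total speedup:    x{total_speedup:.2f}")
```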
>>>
>>> My expectation is that it should change almost nothing for a small number
>>> of particles (say, N<10k), where colliding is an inexpensive step.
>>> For 1 million particles, on the other hand, there could be a significant
>>> speedup, since the collider takes most of the time.
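This expectation can be illustrated with Amdahl's law, which is not named in the original message and is used here only as a rough model: if the collider accounts for a fraction f of the run time and is sped up by a factor s while everything else is unchanged, the overall speedup is 1 / ((1 - f) + f / s). In the sketch below, the function name is hypothetical, 0.6402 is the collider share from the trunk timing table further down, and the 5% and 90% shares are made-up values standing in for "small N" and "very large N".

```python
def predicted_total_speedup(collider_fraction, collider_speedup):
    """Amdahl's-law estimate of the overall speedup.

    collider_fraction: share of total run time spent in the collider (0..1)
    collider_speedup:  factor by which the collider alone gets faster (> 0)
    """
    if not 0.0 <= collider_fraction <= 1.0:
        raise ValueError("collider_fraction must be between 0 and 1")
    if collider_speedup <= 0.0:
        raise ValueError("collider_speedup must be positive")
    remaining = (1.0 - collider_fraction) + collider_fraction / collider_speedup
    return 1.0 / remaining

if __name__ == "__main__":
    # 0.05 and 0.90 are hypothetical shares; 0.6402 comes from the trunk table.
    for fraction in (0.05, 0.6402, 0.90):
        estimate = predicted_total_speedup(fraction, 4.0)
        print(f"collider share {fraction:.2%}: overall speedup ~ x{estimate:.2f}")
```

The model ignores any change in the other engines, so it should be read as an order-of-magnitude estimate only.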
>>>
>>> You can get the "pc" branch at my github repo:
>>> git clone -b pc https://github.com/bchareyre/trunk.git
>>>
>>> Results of yade -j4 --performance are below (I7 quad-core with
>>> hyperthreading enabled, lightly loaded by background tasks -  j>4 not
>>> reported as hyperthreading is probably doing no good).
>>>
>>> Happy benchmarking. :)
>>>
>>> Bruno
>>>
>>>
>>> ====================
>>> ./yade-trunk -j4 --performance  (the current trunk)
>>> .......
>>> number of bodies 200813
>>>
>>> Elapsed  29.4102840424  sec
>>> Performance  6.80034234664  iter/sec
>>> Extrapolation on 1e5 iters  4.08476167255  hours
>>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>>> Name                                       Count            Time   Rel. time
>>> ----------------------------------------------------------------------------
>>> ForceResetter                                200        700881us       2.38%
>>> InsertionSortCollider                          7      18816625us      64.02%
>>> InteractionLoop                              200       6581283us      22.39%
>>> NewtonIntegrator                             200       3293119us      11.20%
>>> TOTAL                                                 29391910us     100.00%
>>>
>>> Common time  597.731503963 s
>>>
>>>
>>> 5037  spheres, velocity= 327.689688709 +- 5.13604387635 %
>>> 25103  spheres, velocity= 81.2726909754 +- 1.0105334405 %
>>> 50250  spheres, velocity= 45.4114521341 +- 3.02333274436 %
>>> 100467  spheres, velocity= 19.0287424005 +- 2.26073439157 %
>>> 200813  spheres, velocity= 6.51664351023 +- 4.03351515402 %
>>>
>>>
>>> SCORE: 13777
>>> Number of threads  4
>>>
>>>
>>> ========================
>>> ./yade-parallel -j4 --performance  (my "pc" branch)
>>> ....
>>>
>>> number of bodies 200813
>>>
>>> Elapsed  15.4320101738  sec
>>> Performance  12.9600744004  iter/sec
>>> Extrapolation on 1e5 iters  2.14333474636  hours
>>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>>> Name                                       Count            Time   Rel. time
>>> ----------------------------------------------------------------------------
>>> ForceResetter                                200        671157us       4.36%
>>> InsertionSortCollider                          7       5145114us      33.42%
>>>   boundDispatcher                              7         93186us       1.81%
>>>   bound                                        7            12us       0.00%
>>>   copy                                         7        160891us       3.13%
>>>   erase                                        7         66932us       1.30%
>>>   sort&collide                                 7       4824071us      93.76%
>>>   TOTAL                                       35       5145095us     100.00%
>>> InteractionLoop                              200       6545848us      42.52%
>>> NewtonIntegrator                             200       3030989us      19.69%
>>> TOTAL                                                 15393110us     100.00%
>>>
>>> Common time  460.37680912 s
>>>
>>>
>>> 5037  spheres, velocity= 365.599773471 +- 8.02397068512 %
>>> 25103  spheres, velocity= 92.0077536966 +- 3.81069496509 %
>>> 50250  spheres, velocity= 54.1683980588 +- 0.528288534811 %
>>> 100467  spheres, velocity= 25.7134767981 +- 1.0796373464 %
>>> 200813  spheres, velocity= 12.6488486429 +- 4.66276699319 %
>>>
>>>
>>> SCORE: 18800
>>> Number of threads  4
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~yade-dev
>>> Post to     : yade-dev@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~yade-dev
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>
>>
>>
>>
>>
>>
>


-- 
_______________
Bruno Chareyre
Associate Professor
ENSE³ - Grenoble INP
Lab. 3SR
BP 53
38041 Grenoble cedex 9
Tél : +33 4 56 52 86 21
Fax : +33 4 76 82 70 43
________________


