yade-dev team mailing list archive
Message #10514
Re: parallel collider - testing needed
Is there any difference at all on this machine, between -j1 and -j4?
B
On 25/02/14 18:56, Bruno Chareyre wrote:
> There is apparently a problem with your computer, your compilation
> options, or something else. If you run an ordinary simulation with -j4
> and many particles, do you see 4 cores used?
>
> Bruno
>
>
>
> On 25/02/14 16:26, Christian Jakob wrote:
>> Hi Bruno,
>>
>> I did some tests with your new collider:
>>
>> My "old" machine (2 cpu sockets with 4 cores each, Intel(R) Xeon(R)
>> CPU X5460 @ 3.16GHz) says:
>>
>>
>> yade-trunk -j4 --performance
>>
>> Welcome to Yade 2014-02-18.git-af75797
>> .....
>> number of bodies 200813
>>
>> Elapsed 74.6882498264 sec
>> Performance 2.67779738399 iter/sec
>> Extrapolation on 1e5 iters 10.3733680314 hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                       Count          Time   Rel. time
>> ----------------------------------------------------------
>> ForceResetter                200     2625848us       3.52%
>> InsertionSortCollider          7    21494603us      28.79%
>> InteractionLoop              200    32631323us      43.70%
>> NewtonIntegrator             200    17913859us      23.99%
>> TOTAL                               74665635us     100.00%
>>
>> Common time 3845.09048295 s
>>
>>
>> Calculation velocity is unstable, try to close all programs and start
>> performance tests again
>> 5037 spheres, velocity= 44.7832284176 +- 60.1189421161 %
>> 25103 spheres, velocity= 17.4121076601 +- 0.99355345037 %
>> 50250 spheres, velocity= 10.0714940216 +- 1.53896666769 %
>> 100467 spheres, velocity= 5.05891811219 +- 0.434738330959 %
>> 200813 spheres, velocity= 2.65826879857 +- 0.933088603948 %
>>
>>
>> SCORE: 3479
>> Number of threads 4
>>
>> ....
>>
>> ###########################################################
>>
>> yade-parallel -j4 --performance (your pc branch)
>>
>> Welcome to Yade 2014-02-24.git-b60d388
>> .....
>> number of bodies 200813
>>
>> Elapsed 75.6688189507 sec
>> Performance 2.64309662518 iter/sec
>> Extrapolation on 1e5 iters 10.5095581876 hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                       Count          Time   Rel. time
>> ----------------------------------------------------------
>> ForceResetter                200     2600100us       3.44%
>> InsertionSortCollider          7    20746020us      27.43%
>> InteractionLoop              200    34455725us      45.55%
>> NewtonIntegrator             200    17838205us      23.58%
>> TOTAL                               75640051us     100.00%
>>
>> Common time 4093.34840894 s
>>
>>
>> Calculation velocity is unstable, try to close all programs and start
>> performance tests again
>> 5037 spheres, velocity= 44.3999135517 +- 61.0812025756 %
>> 25103 spheres, velocity= 16.8531534243 +- 1.32470154863 %
>> 50250 spheres, velocity= 9.61504490252 +- 0.670186229301 %
>> 100467 spheres, velocity= 4.86679881913 +- 0.487840014886 %
>> 200813 spheres, velocity= 2.64490152313 +- 0.285084118261 %
>>
>>
>> SCORE: 3402
>> Number of threads 4
>>
>> ######################################################
>>
>>
>> On my computer there seems to be nearly no speedup ...
>>
>> Looking at htop tells me that -j4 --performance is using 4 threads,
>> but all on just 1 core ...
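
One possible cause of several threads sharing a single core is a restricted CPU affinity mask for the process. This is only a guess, not a diagnosis of the machine above. A minimal Linux-only sketch using the Python standard library; `allowed_cpus` is an illustrative helper name, not part of Yade:

```python
import os

def allowed_cpus():
    """Return the sorted CPU indices this process may run on, or None if unsupported."""
    if hasattr(os, "sched_getaffinity"):  # present on Linux, absent on macOS/Windows
        return sorted(os.sched_getaffinity(0))
    return None

cpus = allowed_cpus()
print("CPUs visible to the OS:", os.cpu_count())
print("CPUs this process may use:", cpus)
if cpus is not None and len(cpus) < 4:
    # With fewer than 4 allowed cores, 4 threads necessarily share cores.
    print("Affinity mask allows fewer than 4 cores; -j4 threads would have to share them.")
```

If the second list is shorter than the first, the launching shell or environment is probably pinning the process, which would match what htop shows.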
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> Quoting Bruno Chareyre <bruno.chareyre@xxxxxxxxxxx>:
>>
>>> Hi there,
>>> I implemented a parallel version of the InsertionSortCollider. It is
>>> almost ready but not yet pushed to the main trunk, as I have a few
>>> things to check before that.
>>> It would be helpful if some of you could 1/ test that your scripts work
>>> correctly and 2/ benchmark this for N>100k and j>4.
>>> If you run benchmarks, please remember to always activate timing and
>>> report the result of timing.stats(). It gives much more interesting data
>>> than the wall clock time.
>>>
>>> Preliminary benchmark results are below (from my laptop...), showing a
>>> speedup by a factor of 2 on the total computation time for j4/200k
>>> particles (compared to the sequential collider).
>>> The speedup on the collider alone is in fact of the order of x3.68 for 4
>>> threads. That is nearly linear, at least for such a small number of threads.
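
As a cross-check of the figures quoted above, the per-engine speedup can be recomputed from the two timing tables further down in this message. A minimal sketch in plain Python (no Yade needed); the dictionary names and the helper `speedup` are illustrative only, and the microsecond values are copied from those tables:

```python
# Engine times in microseconds, copied from the two timing tables in this message.
trunk_us = {
    "ForceResetter": 700881,
    "InsertionSortCollider": 18816625,
    "InteractionLoop": 6581283,
    "NewtonIntegrator": 3293119,
}
pc_us = {
    "ForceResetter": 671157,
    "InsertionSortCollider": 5145114,
    "InteractionLoop": 6545848,
    "NewtonIntegrator": 3030989,
}

def speedup(before_us, after_us):
    """Ratio of old time to new time; values above 1 mean the new run is faster."""
    return before_us / after_us

per_engine = {name: speedup(trunk_us[name], pc_us[name]) for name in trunk_us}
total = speedup(sum(trunk_us.values()), sum(pc_us.values()))

for name, s in per_engine.items():
    print(f"{name:24s} x{s:.2f}")
print(f"{'TOTAL':24s} x{total:.2f}")
```

With these numbers the collider ratio comes out at about 3.66 and the total at about 1.91, consistent with the rounded figures quoted above; the other engines stay close to 1.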
>>>
>>> My expectation is that it should change almost nothing for a small number
>>> of particles (say, N<10k), where colliding is an inexpensive step.
>>> For 1 million particles, OTOH, there could be a significant speedup,
>>> since the collider takes most of the time.
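
The expectation above is essentially Amdahl's law: the overall gain is bounded by the fraction of run time spent in the collider. A rough sketch, assuming only the collider is accelerated and everything else is unchanged. `amdahl_speedup` is a hypothetical helper name, and the 5% and 90% shares are made-up illustrations rather than measurements (64% is the collider share in the trunk table below):

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when a `fraction` of run time is accelerated by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)

collider_factor = 3.68  # collider-only speedup quoted above for 4 threads

# Collider share of total run time: two assumed values and one from the trunk table.
cases = [
    ("small N (assumed 5%)", 0.05),
    ("200k spheres (64% in trunk table)", 0.64),
    ("very large N (assumed 90%)", 0.90),
]
for label, share in cases:
    print(f"{label:36s} overall x{amdahl_speedup(share, collider_factor):.2f}")
```

For the 64% case this gives roughly x1.87, close to the measured factor of about 2, while a 5% collider share yields almost no overall gain.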
>>>
>>> You can get the "pc" branch at my github repo:
>>> git clone -b pc https://github.com/bchareyre/trunk.git
>>>
>>> Results of yade -j4 --performance are below (I7 quad-core with
>>> hyperthreading enabled, lightly loaded by background tasks - j>4 not
>>> reported as hyperthreading is probably doing no good).
>>>
>>> Happy benchmarking. :)
>>>
>>> Bruno
>>>
>>>
>>> ====================
>>> ./yade-trunk -j4 --performance (the current trunk)
>>> .......
>>> number of bodies 200813
>>>
>>> Elapsed 29.4102840424 sec
>>> Performance 6.80034234664 iter/sec
>>> Extrapolation on 1e5 iters 4.08476167255 hours
>>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>>> Name                       Count          Time   Rel. time
>>> ----------------------------------------------------------
>>> ForceResetter                200      700881us       2.38%
>>> InsertionSortCollider          7    18816625us      64.02%
>>> InteractionLoop              200     6581283us      22.39%
>>> NewtonIntegrator             200     3293119us      11.20%
>>> TOTAL                               29391910us     100.00%
>>>
>>> Common time 597.731503963 s
>>>
>>>
>>> 5037 spheres, velocity= 327.689688709 +- 5.13604387635 %
>>> 25103 spheres, velocity= 81.2726909754 +- 1.0105334405 %
>>> 50250 spheres, velocity= 45.4114521341 +- 3.02333274436 %
>>> 100467 spheres, velocity= 19.0287424005 +- 2.26073439157 %
>>> 200813 spheres, velocity= 6.51664351023 +- 4.03351515402 %
>>>
>>>
>>> SCORE: 13777
>>> Number of threads 4
>>>
>>>
>>> ========================
>>> ./yade-parallel -j4 --performance (my "pc" branch)
>>> ....
>>>
>>> number of bodies 200813
>>>
>>> Elapsed 15.4320101738 sec
>>> Performance 12.9600744004 iter/sec
>>> Extrapolation on 1e5 iters 2.14333474636 hours
>>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>>> Name                       Count          Time   Rel. time
>>> ----------------------------------------------------------
>>> ForceResetter                200      671157us       4.36%
>>> InsertionSortCollider          7     5145114us      33.42%
>>>   boundDispatcher              7       93186us       1.81%
>>>   bound                        7          12us       0.00%
>>>   copy                         7      160891us       3.13%
>>>   erase                        7       66932us       1.30%
>>>   sort&collide                 7     4824071us      93.76%
>>>   TOTAL                       35     5145095us     100.00%
>>> InteractionLoop              200     6545848us      42.52%
>>> NewtonIntegrator             200     3030989us      19.69%
>>> TOTAL                               15393110us     100.00%
>>>
>>> Common time 460.37680912 s
>>>
>>>
>>> 5037 spheres, velocity= 365.599773471 +- 8.02397068512 %
>>> 25103 spheres, velocity= 92.0077536966 +- 3.81069496509 %
>>> 50250 spheres, velocity= 54.1683980588 +- 0.528288534811 %
>>> 100467 spheres, velocity= 25.7134767981 +- 1.0796373464 %
>>> 200813 spheres, velocity= 12.6488486429 +- 4.66276699319 %
>>>
>>>
>>> SCORE: 18800
>>> Number of threads 4
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~yade-dev
>>> Post to : yade-dev@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~yade-dev
>>> More help : https://help.launchpad.net/ListHelp
>>>
>>
>>
>>
>>
>>
>>
>
--
_______________
Bruno Chareyre
Associate Professor
ENSE³ - Grenoble INP
Lab. 3SR
BP 53
38041 Grenoble cedex 9
Tél : +33 4 56 52 86 21
Fax : +33 4 76 82 70 43
________________