yade-dev team mailing list archive
Message #10513
Re: parallel collider - testing needed
There is apparently a problem with your computer, compilation options, or
something else. If you run an ordinary simulation with -j4 and many
particles, do you see 4 cores being used?
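One possible explanation for "4 threads, but just on 1 core" (an assumption, not confirmed in this thread) is that the process is restricted to a single CPU by its affinity mask, for example through `taskset` or an OpenMP thread-binding variable. A minimal standard-library Python sketch to check this from the same shell that launches yade (`os.sched_getaffinity` exists on Linux only, hence the fallback):

```python
import os

def usable_cores():
    """Number of CPUs this process may be scheduled on (affinity-aware on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))  # 0 = the current process
    return os.cpu_count() or 1  # fallback: total logical CPUs, ignores affinity

def report():
    print("logical CPUs in machine:", os.cpu_count())
    print("CPUs usable by this process:", usable_cores())
    # OpenMP settings that can limit or pin threads ("<unset>" = library default)
    for var in ("OMP_NUM_THREADS", "OMP_PROC_BIND", "GOMP_CPU_AFFINITY"):
        print(var, "=", os.environ.get(var, "<unset>"))

if __name__ == "__main__":
    report()
```

A child process normally inherits the affinity mask of its parent shell, so if the second line prints 1 here, the -j4 threads would all share one core, which would match the htop observation.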
Bruno
On 25/02/14 16:26, Christian Jakob wrote:
> Hi Bruno,
>
> I did some tests with your new collider:
>
> My "old" machine (2 cpu sockets with 4 cores each, Intel(R) Xeon(R)
> CPU X5460 @ 3.16GHz) says:
>
>
> yade-trunk -j4 --performance
>
> Welcome to Yade 2014-02-18.git-af75797
> .....
> number of bodies 200813
>
> Elapsed 74.6882498264 sec
> Performance 2.67779738399 iter/sec
> Extrapolation on 1e5 iters 10.3733680314 hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                       Count         Time  Rel. time
> -------------------------------------------------------------------------------------------------------
>
> ForceResetter                200    2625848us      3.52%
> InsertionSortCollider          7   21494603us     28.79%
> InteractionLoop              200   32631323us     43.70%
> NewtonIntegrator             200   17913859us     23.99%
> TOTAL                              74665635us    100.00%
>
> Common time 3845.09048295 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037 spheres, velocity= 44.7832284176 +- 60.1189421161 %
> 25103 spheres, velocity= 17.4121076601 +- 0.99355345037 %
> 50250 spheres, velocity= 10.0714940216 +- 1.53896666769 %
> 100467 spheres, velocity= 5.05891811219 +- 0.434738330959 %
> 200813 spheres, velocity= 2.65826879857 +- 0.933088603948 %
>
>
> SCORE: 3479
> Number of threads 4
>
> ....
>
> ###########################################################
>
> yade-parallel -j4 --performance (your pc branch)
>
> Welcome to Yade 2014-02-24.git-b60d388
> .....
> number of bodies 200813
>
> Elapsed 75.6688189507 sec
> Performance 2.64309662518 iter/sec
> Extrapolation on 1e5 iters 10.5095581876 hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                       Count         Time  Rel. time
> -------------------------------------------------------------------------------------------------------
>
> ForceResetter                200    2600100us      3.44%
> InsertionSortCollider          7   20746020us     27.43%
> InteractionLoop              200   34455725us     45.55%
> NewtonIntegrator             200   17838205us     23.58%
> TOTAL                              75640051us    100.00%
>
> Common time 4093.34840894 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037 spheres, velocity= 44.3999135517 +- 61.0812025756 %
> 25103 spheres, velocity= 16.8531534243 +- 1.32470154863 %
> 50250 spheres, velocity= 9.61504490252 +- 0.670186229301 %
> 100467 spheres, velocity= 4.86679881913 +- 0.487840014886 %
> 200813 spheres, velocity= 2.64490152313 +- 0.285084118261 %
>
>
> SCORE: 3402
> Number of threads 4
>
> ######################################################
>
>
> On my computer there seems to be nearly no speedup...
>
> Looking at htop tells me that -j4 --performance is using 4 threads,
> but only on 1 core...
>
> Regards,
>
> Christian
>
>
>
> Quoting Bruno Chareyre <bruno.chareyre@xxxxxxxxxxx>:
>
>> Hi there,
>> I implemented a parallel version of the InsertionSortCollider. It is
>> almost ready but not yet pushed to the main trunk, as I have a few
>> things to check before that.
>> It would be helpful if some of you could 1/ test that your scripts work
>> correctly and 2/ benchmark this for N>100k and j>4.
>> If you run benchmarks, please remember to always activate timing and
>> report the result of timing.stats(). It gives much more interesting data
>> than the wall clock time.
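For the requested sweep over thread counts, a small Python driver can rerun the benchmark and keep each full log, including the timing table. This is a sketch under assumptions: the executable name `yade-parallel` is taken from the runs in this thread, the log-file naming is hypothetical, only the `-jN` and `--performance` options shown in these messages are used, and each run is assumed to exit on its own when the performance test finishes.

```python
import subprocess

def benchmark_cmd(executable, threads):
    """Command line for one performance run with the given number of threads."""
    return [executable, "-j%d" % threads, "--performance"]

def run_sweep(executable="yade-parallel", thread_counts=(1, 2, 4, 8)):
    for j in thread_counts:
        cmd = benchmark_cmd(executable, j)
        log_name = "perf_%s_j%d.log" % (executable, j)  # hypothetical naming scheme
        with open(log_name, "w") as log:
            # stdout and stderr both go to the log so the timing table is kept
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)
        print("finished", " ".join(cmd), "->", log_name)

if __name__ == "__main__":
    run_sweep()
```

Running the runs one after another, rather than in parallel, avoids the runs competing for cores and distorting each other's timings.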
>>
>> Preliminary benchmark results are below (from my laptop...), showing a
>> speedup by a factor of 2 on the total computation time for j4/200k
>> particles (compared to the sequential collider).
>> The speedup on the collider alone is in fact of the order of x3.68 for 4
>> threads, i.e. nearly linear, at least for such a small number of threads.
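These speedups can be recomputed from the InsertionSortCollider and TOTAL rows of the two -j4 timing tables at the end of this message. A short Python check; "parallel efficiency" is taken here as speedup divided by thread count, which is meaningful for the collider because its baseline is the sequential version:

```python
# Times in microseconds, copied from the two timing.stats() tables of this message
TRUNK = {"InsertionSortCollider": 18816625, "TOTAL": 29391910}  # sequential collider, -j4
PC = {"InsertionSortCollider": 5145114, "TOTAL": 15393110}      # parallel collider, -j4
THREADS = 4

def speedup(t_before, t_after):
    """Ratio of wall time before and after the change (> 1 means faster)."""
    return t_before / t_after

collider = speedup(TRUNK["InsertionSortCollider"], PC["InsertionSortCollider"])
total = speedup(TRUNK["TOTAL"], PC["TOTAL"])
print("collider: x%.2f (efficiency %.0f%% on %d threads)" % (collider, 100 * collider / THREADS, THREADS))
print("whole loop: x%.2f" % total)
```

This prints about x3.66 for the collider (roughly 91% efficiency) and x1.91 for the whole loop, consistent with the figures quoted above.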
>>
>> My expectation is that it should change almost nothing for a small number
>> of particles (say, N<10k), where colliding is an inexpensive step.
>> For 1 million particles OTOH, there could be a significant speedup,
>> since the collider takes most of the time.
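This expectation matches Amdahl's law, which is brought in here for illustration and is not used in the original message: if the collider takes a fraction f of the step time and is sped up by a factor s, the whole step is sped up by 1 / ((1 - f) + f / s). A Python sketch using the j4/200k values from the tables below (f ≈ 0.64, s ≈ 3.66), assuming the rest of the loop is unchanged and, as a simplification, that s does not depend on N:

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the run time is accelerated by a factor s
    and the remaining fraction (1 - f) is left unchanged (Amdahl's law)."""
    return 1.0 / ((1.0 - f) + f / s)

S_COLLIDER = 3.66  # collider speedup measured for 200k spheres on 4 threads

# f = 0.64 is the collider share in the j4/200k trunk run; 0.10 and 0.90 are
# hypothetical shares for a small and a very large (e.g. ~1M) simulation.
for f in (0.10, 0.64, 0.90):
    print("collider share %2.0f%% -> overall speedup x%.2f" % (100 * f, amdahl_speedup(f, S_COLLIDER)))
```

For f = 0.64 this gives about x1.87, close to the measured x1.91; as f approaches 1 the overall speedup approaches the collider's own speedup, while for small f it stays close to 1.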
>>
>> You can get the "pc" branch at my github repo:
>> git clone -b pc https://github.com/bchareyre/trunk.git
>>
>> Results of yade -j4 --performance are below (Intel i7 quad-core with
>> hyperthreading enabled, lightly loaded by background tasks; j>4 is not
>> reported, as hyperthreading probably does no good).
>>
>> Happy benchmarking. :)
>>
>> Bruno
>>
>>
>> ====================
>> ./yade-trunk -j4 --performance (the current trunk)
>> .......
>> number of bodies 200813
>>
>> Elapsed 29.4102840424 sec
>> Performance 6.80034234664 iter/sec
>> Extrapolation on 1e5 iters 4.08476167255 hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                       Count         Time  Rel. time
>> -------------------------------------------------------------------------------------------------------
>>
>> ForceResetter                200     700881us      2.38%
>> InsertionSortCollider          7   18816625us     64.02%
>> InteractionLoop              200    6581283us     22.39%
>> NewtonIntegrator             200    3293119us     11.20%
>> TOTAL                              29391910us    100.00%
>>
>> Common time 597.731503963 s
>>
>>
>> 5037 spheres, velocity= 327.689688709 +- 5.13604387635 %
>> 25103 spheres, velocity= 81.2726909754 +- 1.0105334405 %
>> 50250 spheres, velocity= 45.4114521341 +- 3.02333274436 %
>> 100467 spheres, velocity= 19.0287424005 +- 2.26073439157 %
>> 200813 spheres, velocity= 6.51664351023 +- 4.03351515402 %
>>
>>
>> SCORE: 13777
>> Number of threads 4
>>
>>
>> ========================
>> ./yade-parallel -j4 --performance (my "pc" branch)
>> ....
>>
>> number of bodies 200813
>>
>> Elapsed 15.4320101738 sec
>> Performance 12.9600744004 iter/sec
>> Extrapolation on 1e5 iters 2.14333474636 hours
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>> Name                       Count         Time  Rel. time
>> -------------------------------------------------------------------------------------------------------
>>
>> ForceResetter                200     671157us      4.36%
>> InsertionSortCollider          7    5145114us     33.42%
>>   boundDispatcher              7      93186us      1.81%
>>   bound                        7         12us      0.00%
>>   copy                         7     160891us      3.13%
>>   erase                        7      66932us      1.30%
>>   sort&collide                 7    4824071us     93.76%
>>   TOTAL                       35    5145095us    100.00%
>> InteractionLoop              200    6545848us     42.52%
>> NewtonIntegrator             200    3030989us     19.69%
>> TOTAL                              15393110us    100.00%
>>
>> Common time 460.37680912 s
>>
>>
>> 5037 spheres, velocity= 365.599773471 +- 8.02397068512 %
>> 25103 spheres, velocity= 92.0077536966 +- 3.81069496509 %
>> 50250 spheres, velocity= 54.1683980588 +- 0.528288534811 %
>> 100467 spheres, velocity= 25.7134767981 +- 1.0796373464 %
>> 200813 spheres, velocity= 12.6488486429 +- 4.66276699319 %
>>
>>
>> SCORE: 18800
>> Number of threads 4
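To compare two runs engine by engine, the rows printed by timing.stats() can be read back into Python. A minimal sketch, assuming the one-row-per-line layout used in these messages (name, optional count, time in `us`, relative time in `%`); the function names are hypothetical, and the sample rows are a subset copied from the two tables above:

```python
import re

# One timing row: name, optional execution count, time in microseconds, relative time in %
ROW = re.compile(r"^\s*(\S+)\s+(?:(\d+)\s+)?(\d+)us\s+([\d.]+)%\s*$")

def parse_timing_rows(text):
    """Return (name, count or None, time_us, rel_percent) for every recognised row."""
    rows = []
    for line in text.splitlines():
        line = line.lstrip(">").strip()  # tolerate e-mail quote markers
        m = ROW.match(line)
        if m:
            name, count, time_us, rel = m.groups()
            rows.append((name, int(count) if count else None, int(time_us), float(rel)))
    return rows

def compare(before, after):
    """Speedup per name present in both runs; a later row with the same name
    (e.g. the outer TOTAL after a nested TOTAL) overrides an earlier one."""
    t_before = {name: t for name, _, t, _ in before}
    t_after = {name: t for name, _, t, _ in after}
    return {n: t_before[n] / t_after[n] for n in t_before if n in t_after and t_after[n] > 0}

trunk = """\
ForceResetter 200 700881us 2.38%
InsertionSortCollider 7 18816625us 64.02%
TOTAL 29391910us 100.00%"""
pc = """\
>> ForceResetter 200 671157us 4.36%
>> InsertionSortCollider 7 5145114us 33.42%
>>   sort&collide 7 4824071us 93.76%
>>   TOTAL 35 5145095us 100.00%
>> TOTAL 15393110us 100.00%"""

for name, s in compare(parse_timing_rows(trunk), parse_timing_rows(pc)).items():
    print("%-22s x%.2f" % (name, s))
```

Nested sub-timers such as `sort&collide` only exist in the parallel run, so they are skipped in the comparison but remain available in the parsed rows.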
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~yade-dev
>> Post to : yade-dev@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~yade-dev
>> More help : https://help.launchpad.net/ListHelp
>>
>
--
_______________
Bruno Chareyre
Associate Professor
ENSE³ - Grenoble INP
Lab. 3SR
BP 53
38041 Grenoble cedex 9
Tel : +33 4 56 52 86 21
Fax : +33 4 76 82 70 43
________________