yade-dev team mailing list archive

Re: parallel collider - testing needed

 

Thanks Matthias,
Actually I don't understand your benchmark results. You are the first
one to find no speedup in the colliding part.
It seems the results below were not obtained with the parallel collider,
since InsertionSortCollider takes practically the same time (about 15.7 s)
for every number of threads.
What version is that (displayed at yade startup)?
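
As a rough check of this reading, the per-engine timings from the tables quoted below can be turned into speedups relative to the single-thread run. This is only a minimal sketch: the dictionary layout and the helper name `speedups` are illustrative, and only the numbers themselves come from the quoted profiling output.

```python
# Engine timings in microseconds, copied from the profiling tables quoted below.
# Keys of the outer dict are the number of OpenMP threads.
timings_us = {
    1:  {"ForceResetter": 594120,  "InsertionSortCollider": 15686671,
         "InteractionLoop": 21787610, "NewtonIntegrator": 9541243},
    4:  {"ForceResetter": 2919976, "InsertionSortCollider": 15675024,
         "InteractionLoop": 5309648,  "NewtonIntegrator": 5730646},
    8:  {"ForceResetter": 4760739, "InsertionSortCollider": 15682352,
         "InteractionLoop": 3398981,  "NewtonIntegrator": 4997676},
    12: {"ForceResetter": 7943958, "InsertionSortCollider": 15713441,
         "InteractionLoop": 2522508,  "NewtonIntegrator": 3055652},
    16: {"ForceResetter": 5765158, "InsertionSortCollider": 15685115,
         "InteractionLoop": 2002979,  "NewtonIntegrator": 3665586},
}


def speedups(timings, engine, baseline=1):
    """Return {threads: T(baseline) / T(threads)} for one engine."""
    base = timings[baseline][engine]
    return {n: base / t[engine] for n, t in sorted(timings.items())}


# Print one line per engine with the speedup for each thread count.
for engine in timings_us[1]:
    row = "  ".join(f"j{n}: {s:5.2f}x"
                    for n, s in speedups(timings_us, engine).items())
    print(f"{engine:<22} {row}")
```

With these numbers the InsertionSortCollider speedup stays within about 1% of 1.0 for every thread count, while InteractionLoop reaches roughly 10.9x at 16 threads, which is what suggests the collider was running serially.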
Bruno

On 16/04/14 17:14, Matthias Frank wrote:
> hi bruno,
>
> i have been using your first version of the parallel collider for quite
> a while during model development and also calibration. i saw no
> differences between yade-1.07 and your version.
>
> i did some benchmarks with 4 to 16 sandy bridge cores on our bull
> cluster. getting more than 16 cores for openmp applications is quite
> difficult.
> all runs were done on an exclusively used 16-core node
>
> =============== 1 threads =============================
> number of bodies 200813
>
> Elapsed  47.6222550869  sec
> Performance  4.19971712039  iter/sec
> Extrapolation on 1e5 iters  6.6142020954  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                      Count            Time   Rel. time
> -----------------------------------------------------------
> ForceResetter               200        594120us       1.25%
> InsertionSortCollider         7      15686671us      32.95%
> InteractionLoop             200      21787610us      45.76%
> NewtonIntegrator            200       9541243us      20.04%
> TOTAL                                47609645us     100.00%
>
> Common time  1383.60180092 s
>
>
> 5037  spheres, velocity= 103.875852973 +- 6.56561134015 %
> 25103  spheres, velocity= 31.681069095 +- 3.69992939292 %
> 50250  spheres, velocity= 15.6112167455 +- 0.651579666153 %
> 100467  spheres, velocity= 7.65955209926 +- 0.740064173207 %
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 200813  spheres, velocity= 4.52368811131 +- 12.3907756519 %
>
>
> SCORE: 6055
> Number of threads  1
> =============== 4 threads =============================
> number of bodies 200813
>
> Elapsed  29.6409780979  sec
> Performance  6.7474156669  iter/sec
> Extrapolation on 1e5 iters  4.1168025136  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                      Count            Time   Rel. time
> -----------------------------------------------------------
> ForceResetter               200       2919976us       9.85%
> InsertionSortCollider         7      15675024us      52.89%
> InteractionLoop             200       5309648us      17.92%
> NewtonIntegrator            200       5730646us      19.34%
> TOTAL                                29635295us     100.00%
>
> Common time  641.693111897 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037  spheres, velocity= 232.725838879 +- 14.3014472878 %
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 25103  spheres, velocity= 72.3475644141 +- 12.8106054968 %
> 50250  spheres, velocity= 50.2926096116 +- 3.01250915287 %
> 100467  spheres, velocity= 18.9664279425 +- 1.40241049531 %
> 200813  spheres, velocity= 6.95879166249 +- 2.72955035307 %
>
>
> SCORE: 13080
> Number of threads  4
> =============== 8 threads =============================
> number of bodies 200813
>
> Elapsed  28.8497908115  sec
> Performance  6.9324592787  iter/sec
> Extrapolation on 1e5 iters  4.00691539049  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                      Count            Time   Rel. time
> -----------------------------------------------------------
> ForceResetter               200       4760739us      16.51%
> InsertionSortCollider         7      15682352us      54.38%
> InteractionLoop             200       3398981us      11.79%
> NewtonIntegrator            200       4997676us      17.33%
> TOTAL                                28839750us     100.00%
>
> Common time  629.34264183 s
>
>
> Calculation velocity is unstable, try to close all programs and start
> performance tests again
> 5037  spheres, velocity= 242.232297207 +- 18.7054194438 %
> 25103  spheres, velocity= 78.2112705997 +- 4.19360243937 %
> 50250  spheres, velocity= 46.6877664726 +- 2.81481812835 %
> 100467  spheres, velocity= 19.9932164704 +- 3.06039659404 %
> 200813  spheres, velocity= 6.92396036557 +- 0.361116951928 %
>
>
> SCORE: 13272
> Number of threads  8
> =============== 12 threads =============================
> number of bodies 200813
>
> Elapsed  29.2484679222  sec
> Performance  6.83796500151  iter/sec
> Extrapolation on 1e5 iters  4.06228721142  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                      Count            Time   Rel. time
> -----------------------------------------------------------
> ForceResetter               200       7943958us      27.17%
> InsertionSortCollider         7      15713441us      53.75%
> InteractionLoop             200       2522508us       8.63%
> NewtonIntegrator            200       3055652us      10.45%
> TOTAL                                29235560us     100.00%
>
> Common time  667.634572983 s
>
>
> 5037  spheres, velocity= 189.874951285 +- 9.74398679139 %
> 25103  spheres, velocity= 79.4292831485 +- 6.59393629842 %
> 50250  spheres, velocity= 48.2684323576 +- 4.29336410346 %
> 100467  spheres, velocity= 19.2778991779 +- 6.87288661534 %
> 200813  spheres, velocity= 7.05669848487 +- 2.29609774368 %
>
>
> SCORE: 12914
> Number of threads  12
>
> =============== 16 threads =============================
> number of bodies 200813
>
> Elapsed  27.1387059689  sec
> Performance  7.36954813651  iter/sec
> Extrapolation on 1e5 iters  3.7692647179  hours
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> Name                      Count            Time   Rel. time
> -----------------------------------------------------------
> ForceResetter               200       5765158us      21.26%
> InsertionSortCollider         7      15685115us      57.84%
> InteractionLoop             200       2002979us       7.39%
> NewtonIntegrator            200       3665586us      13.52%
> TOTAL                                27118839us     100.00%
>
> Common time  781.653450966 s
>
>
> 5037  spheres, velocity= 155.295128456 +- 5.31523351848 %
> 25103  spheres, velocity= 58.9500296071 +- 7.67003146996 %
> 50250  spheres, velocity= 38.5475112683 +- 2.84583454585 %
> 100467  spheres, velocity= 17.2375970816 +- 6.15206324777 %
> 200813  spheres, velocity= 6.87034005987 +- 7.15657372906 %
>
> SCORE: 11009
> Number of threads  16
>
>
> matthias
>
> On 10.04.2014 12:58, Bruno Chareyre wrote:
>> On 10/04/14 02:01, Klaus Thoeni wrote:
>>> just to clarify, Test 2 is done by increasing the number of
>>> iterations (1x, 3x and 12x the number of iterations specified in
>>> checkPerf.py). This means the number of interactions should increase
>>> as well and, hence, particle velocities should decrease because of
>>> more interactions.
>> That is what I was thinking. And more interactions means less (relative)
>> time spent in collider.
>>
>>> I added a table with the collider scaling factor for 1 million
>>> particles and iter x 12.
>> Thanks! So there is still an optimum near 12-14. It may be possible to
>> improve (choosing appropriate chunk sizes internally), but it needs
>> serious testing.
>>
>>> Note your T(j8)=T(j1)/5.8 is actually T(j8)=T(j1)/4.8. Where did you
>>> get the number from? You must look into the uploaded files in order
>>> to get these numbers
>> I used the x1 line since I was not expecting any influence of the number
>> of steps on the collider's performance:
>> 187/20=5.8
>> Now I see it is different with other lines. Weird.
>>
>> Bruno
>>
>>
>
>


-- 
_______________
Bruno Chareyre
Associate Professor
ENSE³ - Grenoble INP
Lab. 3SR
BP 53
38041 Grenoble cedex 9
Tél : +33 4 56 52 86 21
Fax : +33 4 76 82 70 43
________________


