
yade-dev team mailing list archive

Re: parallel collider - testing needed

 

hi bruno,

i have been using your first version of the parallel collider for quite a while, during model development and also calibration. i saw no differences between yade-1.07 and your version.

i did some benchmarks with 4 to 16 sandy bridge cores on our bull cluster. getting more than 16 cores for openmp applications is quite difficult.
all runs were done on an exclusively used 16-core node.

=============== 1 threads =============================
number of bodies 200813

Elapsed  47.6222550869  sec
Performance  4.19971712039  iter/sec
Extrapolation on 1e5 iters  6.6142020954  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name Count                 Time            Rel. time
-------------------------------------------------------------------------------------------------------
ForceResetter 200             594120us                1.25%
InsertionSortCollider 7     15686671us               32.95%
InteractionLoop 200         21787610us               45.76%
NewtonIntegrator 200         9541243us               20.04%
TOTAL 47609645us              100.00%

Common time  1383.60180092 s


5037  spheres, velocity= 103.875852973 +- 6.56561134015 %
25103  spheres, velocity= 31.681069095 +- 3.69992939292 %
50250  spheres, velocity= 15.6112167455 +- 0.651579666153 %
100467  spheres, velocity= 7.65955209926 +- 0.740064173207 %
Calculation velocity is unstable, try to close all programs and start performance tests again
200813  spheres, velocity= 4.52368811131 +- 12.3907756519 %


SCORE: 6055
Number of threads  1
=============== 4 threads =============================
number of bodies 200813

Elapsed  29.6409780979  sec
Performance  6.7474156669  iter/sec
Extrapolation on 1e5 iters  4.1168025136  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name Count                 Time            Rel. time
-------------------------------------------------------------------------------------------------------
ForceResetter 200            2919976us                9.85%
InsertionSortCollider 7     15675024us               52.89%
InteractionLoop 200          5309648us               17.92%
NewtonIntegrator 200         5730646us               19.34%
TOTAL 29635295us              100.00%

Common time  641.693111897 s


Calculation velocity is unstable, try to close all programs and start performance tests again
5037  spheres, velocity= 232.725838879 +- 14.3014472878 %
Calculation velocity is unstable, try to close all programs and start performance tests again
25103  spheres, velocity= 72.3475644141 +- 12.8106054968 %
50250  spheres, velocity= 50.2926096116 +- 3.01250915287 %
100467  spheres, velocity= 18.9664279425 +- 1.40241049531 %
200813  spheres, velocity= 6.95879166249 +- 2.72955035307 %


SCORE: 13080
Number of threads  4
=============== 8 threads =============================
number of bodies 200813

Elapsed  28.8497908115  sec
Performance  6.9324592787  iter/sec
Extrapolation on 1e5 iters  4.00691539049  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name Count                 Time            Rel. time
-------------------------------------------------------------------------------------------------------
ForceResetter 200            4760739us               16.51%
InsertionSortCollider 7     15682352us               54.38%
InteractionLoop 200          3398981us               11.79%
NewtonIntegrator 200         4997676us               17.33%
TOTAL 28839750us              100.00%

Common time  629.34264183 s


Calculation velocity is unstable, try to close all programs and start performance tests again
5037  spheres, velocity= 242.232297207 +- 18.7054194438 %
25103  spheres, velocity= 78.2112705997 +- 4.19360243937 %
50250  spheres, velocity= 46.6877664726 +- 2.81481812835 %
100467  spheres, velocity= 19.9932164704 +- 3.06039659404 %
200813  spheres, velocity= 6.92396036557 +- 0.361116951928 %


SCORE: 13272
Number of threads  8
=============== 12 threads =============================
number of bodies 200813

Elapsed  29.2484679222  sec
Performance  6.83796500151  iter/sec
Extrapolation on 1e5 iters  4.06228721142  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name Count                 Time            Rel. time
-------------------------------------------------------------------------------------------------------
ForceResetter 200            7943958us               27.17%
InsertionSortCollider 7     15713441us               53.75%
InteractionLoop 200          2522508us                8.63%
NewtonIntegrator 200         3055652us               10.45%
TOTAL 29235560us              100.00%

Common time  667.634572983 s


5037  spheres, velocity= 189.874951285 +- 9.74398679139 %
25103  spheres, velocity= 79.4292831485 +- 6.59393629842 %
50250  spheres, velocity= 48.2684323576 +- 4.29336410346 %
100467  spheres, velocity= 19.2778991779 +- 6.87288661534 %
200813  spheres, velocity= 7.05669848487 +- 2.29609774368 %


SCORE: 12914
Number of threads  12

=============== 16 threads =============================
number of bodies 200813

Elapsed  27.1387059689  sec
Performance  7.36954813651  iter/sec
Extrapolation on 1e5 iters  3.7692647179  hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name Count                 Time            Rel. time
-------------------------------------------------------------------------------------------------------
ForceResetter 200            5765158us               21.26%
InsertionSortCollider 7     15685115us               57.84%
InteractionLoop 200          2002979us                7.39%
NewtonIntegrator 200         3665586us               13.52%
TOTAL 27118839us              100.00%

Common time  781.653450966 s


5037  spheres, velocity= 155.295128456 +- 5.31523351848 %
25103  spheres, velocity= 58.9500296071 +- 7.67003146996 %
50250  spheres, velocity= 38.5475112683 +- 2.84583454585 %
100467  spheres, velocity= 17.2375970816 +- 6.15206324777 %
200813  spheres, velocity= 6.87034005987 +- 7.15657372906 %

SCORE: 11009
Number of threads  16
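
To make the scaling easier to read, here is a minimal sketch that turns the "Elapsed" values above into speedup and parallel efficiency relative to the 1-thread run. It also computes an Amdahl-style upper bound, under the assumption (suggested by the timing tables, where InsertionSortCollider stays near 15.7 s in every run) that the collider behaves as the serial part. The function names are illustrative only and are not part of yade.

```python
# Elapsed wall time in seconds per thread count, copied from the logs above.
elapsed = {
    1: 47.6222550869,
    4: 29.6409780979,
    8: 28.8497908115,
    12: 29.2484679222,
    16: 27.1387059689,
}

# 1-thread timing table: InsertionSortCollider and TOTAL, in microseconds.
collider_us = 15686671
total_us = 47609645


def speedups(times):
    """Speedup of each run relative to the 1-thread run."""
    base = times[1]
    return {n: base / t for n, t in times.items()}


def efficiencies(times):
    """Parallel efficiency: speedup divided by the number of threads."""
    return {n: s / n for n, s in speedups(times).items()}


def amdahl_speedup(serial_fraction, n_threads):
    """Amdahl's law: best speedup if only the non-serial part scales."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)


# Assumption: the collider is treated as fully serial in these runs.
serial_fraction = collider_us / total_us
amdahl_limit = 1.0 / serial_fraction  # limit for infinitely many threads

if __name__ == "__main__":
    sp = speedups(elapsed)
    eff = efficiencies(elapsed)
    for n in sorted(elapsed):
        bound = amdahl_speedup(serial_fraction, n)
        print(f"{n:2d} threads: speedup {sp[n]:.2f}, "
              f"efficiency {eff[n]:.2f}, amdahl bound {bound:.2f}")
    print(f"serial fraction {serial_fraction:.3f}, limit {amdahl_limit:.2f}")
```

With these numbers the measured speedup is about 1.6 at 4 threads and about 1.75 at 16 threads, while the bound from a serial collider alone would be about 3.0. The remaining gap is consistent with the ForceResetter time growing with the thread count in the tables above.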


matthias

On 10.04.2014 12:58, Bruno Chareyre wrote:
On 10/04/14 02:01, Klaus Thoeni wrote:
just to clarify, Test 2 is done by increasing the number of iterations (1x, 3x
and 12x the number of iterations specified in checkPerf.py). This means the
number of interactions should increase as well and, hence, particle velocities
should decrease because of more interactions.
That is what I was thinking. And more interactions means less (relative)
time spent in collider.
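
A small sketch of that argument, assuming a deliberately simple cost model in which the collider cost is fixed and the rest of the step scales linearly with the number of interactions. All numbers below are hypothetical and only illustrate the trend; they are not taken from the benchmark files.

```python
def collider_share(t_collider, t_per_interaction, n_interactions):
    """Fraction of total time spent in the collider under the toy model."""
    t_other = t_per_interaction * n_interactions
    return t_collider / (t_collider + t_other)


# Hypothetical values: fixed collider cost, growing interaction count.
t_collider = 10.0          # seconds, assumed constant
t_per_interaction = 1e-5   # seconds per interaction, assumed
shares = [collider_share(t_collider, t_per_interaction, n)
          for n in (1_000_000, 3_000_000, 12_000_000)]

if __name__ == "__main__":
    for share in shares:
        print(f"collider share: {share:.3f}")
```

Under this model the relative collider time can only go down as the interaction count goes up, which is the effect described above.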

I added a table with the collider scaling factor for 1 million particles and
iter x 12.
Thanks! So there is still an optimum near 12-14. It may be possible to
improve (choosing appropriate chunksizes internally), but it needs
serious testing.

Note your T(j8)=T(j1)/5.8 is actually T(j8)=T(j1)/4.8. Where did you get the
number from? You must look into the uploaded files in order to get these numbers.
I used the x1 line since I was not expecting any influence of the number
of steps on the collider's performance:
187/20=5.8
Now I see it is different with other lines. Weird.

Bruno


_______________________________________________
Mailing list: https://launchpad.net/~yade-dev
Post to     : yade-dev@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~yade-dev
More help   : https://help.launchpad.net/ListHelp


--
----------------------------
Dipl.-Inf. Matthias Frank
wissenschaftlicher Mitarbeiter

Technische Universität Dresden
Fakultät Maschinenwesen
Institut für Verarbeitungsmaschinen und mobile Arbeitsmaschinen
Professur für Verarbeitungsmaschinen und Verarbeitungstechnik

01062 Dresden

Tel.: +49 351 463 36124
E-Mail: matthias.frank@xxxxxxxxxxxxx
www.vat.tu-dresden.de


