yade-dev team mailing list archive
Message #10747
Re: parallel collider - testing needed
hi bruno,
i have been using your first version of the parallel collider for quite a while
during model development and also calibration. i saw no differences
between yade-1.07 and your version.
i did some benchmarks with 4 to 16 sandy bridge cores on our bull
cluster. getting more than 16 cores for openmp applications is quite
difficult.
all runs were done on an exclusively used 16-core node:
=============== 1 threads =============================
number of bodies 200813
Elapsed 47.6222550869 sec
Performance 4.19971712039 iter/sec
Extrapolation on 1e5 iters 6.6142020954 hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                       Count          Time   Rel. time
----------------------------------------------------------
ForceResetter                200      594120us       1.25%
InsertionSortCollider          7    15686671us      32.95%
InteractionLoop              200    21787610us      45.76%
NewtonIntegrator             200     9541243us      20.04%
TOTAL                               47609645us     100.00%
Common time 1383.60180092 s
5037 spheres, velocity= 103.875852973 +- 6.56561134015 %
25103 spheres, velocity= 31.681069095 +- 3.69992939292 %
50250 spheres, velocity= 15.6112167455 +- 0.651579666153 %
100467 spheres, velocity= 7.65955209926 +- 0.740064173207 %
Calculation velocity is unstable, try to close all programs and start performance tests again
200813 spheres, velocity= 4.52368811131 +- 12.3907756519 %
SCORE: 6055
Number of threads 1
=============== 4 threads =============================
number of bodies 200813
Elapsed 29.6409780979 sec
Performance 6.7474156669 iter/sec
Extrapolation on 1e5 iters 4.1168025136 hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                       Count          Time   Rel. time
----------------------------------------------------------
ForceResetter                200     2919976us       9.85%
InsertionSortCollider          7    15675024us      52.89%
InteractionLoop              200     5309648us      17.92%
NewtonIntegrator             200     5730646us      19.34%
TOTAL                               29635295us     100.00%
Common time 641.693111897 s
Calculation velocity is unstable, try to close all programs and start performance tests again
5037 spheres, velocity= 232.725838879 +- 14.3014472878 %
Calculation velocity is unstable, try to close all programs and start performance tests again
25103 spheres, velocity= 72.3475644141 +- 12.8106054968 %
50250 spheres, velocity= 50.2926096116 +- 3.01250915287 %
100467 spheres, velocity= 18.9664279425 +- 1.40241049531 %
200813 spheres, velocity= 6.95879166249 +- 2.72955035307 %
SCORE: 13080
Number of threads 4
=============== 8 threads =============================
number of bodies 200813
Elapsed 28.8497908115 sec
Performance 6.9324592787 iter/sec
Extrapolation on 1e5 iters 4.00691539049 hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                       Count          Time   Rel. time
----------------------------------------------------------
ForceResetter                200     4760739us      16.51%
InsertionSortCollider          7    15682352us      54.38%
InteractionLoop              200     3398981us      11.79%
NewtonIntegrator             200     4997676us      17.33%
TOTAL                               28839750us     100.00%
Common time 629.34264183 s
Calculation velocity is unstable, try to close all programs and start performance tests again
5037 spheres, velocity= 242.232297207 +- 18.7054194438 %
25103 spheres, velocity= 78.2112705997 +- 4.19360243937 %
50250 spheres, velocity= 46.6877664726 +- 2.81481812835 %
100467 spheres, velocity= 19.9932164704 +- 3.06039659404 %
200813 spheres, velocity= 6.92396036557 +- 0.361116951928 %
SCORE: 13272
Number of threads 8
=============== 12 threads =============================
number of bodies 200813
Elapsed 29.2484679222 sec
Performance 6.83796500151 iter/sec
Extrapolation on 1e5 iters 4.06228721142 hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                       Count          Time   Rel. time
----------------------------------------------------------
ForceResetter                200     7943958us      27.17%
InsertionSortCollider          7    15713441us      53.75%
InteractionLoop              200     2522508us       8.63%
NewtonIntegrator             200     3055652us      10.45%
TOTAL                               29235560us     100.00%
Common time 667.634572983 s
5037 spheres, velocity= 189.874951285 +- 9.74398679139 %
25103 spheres, velocity= 79.4292831485 +- 6.59393629842 %
50250 spheres, velocity= 48.2684323576 +- 4.29336410346 %
100467 spheres, velocity= 19.2778991779 +- 6.87288661534 %
200813 spheres, velocity= 7.05669848487 +- 2.29609774368 %
SCORE: 12914
Number of threads 12
=============== 16 threads =============================
number of bodies 200813
Elapsed 27.1387059689 sec
Performance 7.36954813651 iter/sec
Extrapolation on 1e5 iters 3.7692647179 hours
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Name                       Count          Time   Rel. time
----------------------------------------------------------
ForceResetter                200     5765158us      21.26%
InsertionSortCollider          7    15685115us      57.84%
InteractionLoop              200     2002979us       7.39%
NewtonIntegrator             200     3665586us      13.52%
TOTAL                               27118839us     100.00%
Common time 781.653450966 s
5037 spheres, velocity= 155.295128456 +- 5.31523351848 %
25103 spheres, velocity= 58.9500296071 +- 7.67003146996 %
50250 spheres, velocity= 38.5475112683 +- 2.84583454585 %
100467 spheres, velocity= 17.2375970816 +- 6.15206324777 %
200813 spheres, velocity= 6.87034005987 +- 7.15657372906 %
SCORE: 11009
Number of threads 16
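
To make the tables above easier to compare, here is a small Python sketch that computes the overall and per-engine speedup relative to the 1-thread run. The numbers are copied from the output above. The names `elapsed_s`, `engine_us`, `speedup` and `amdahl_limit` are made up for this illustration and are not part of yade. The Amdahl-style bound assumes, as a simplification, that the collider time is the only part that does not shrink with more threads. Under that assumption the collider alone (about 15.7 s in every run) would cap the speedup at roughly 3, while the measured overall speedup at 16 threads is about 1.75. Note also that ForceResetter gets slower, not faster, as threads are added.

```python
# Elapsed wall time in seconds per OpenMP thread count (from the runs above).
elapsed_s = {1: 47.6222550869, 4: 29.6409780979, 8: 28.8497908115,
             12: 29.2484679222, 16: 27.1387059689}

# Per-engine time in microseconds per thread count (from the timing tables).
engine_us = {
    "ForceResetter": {1: 594120, 4: 2919976, 8: 4760739,
                      12: 7943958, 16: 5765158},
    "InsertionSortCollider": {1: 15686671, 4: 15675024, 8: 15682352,
                              12: 15713441, 16: 15685115},
    "InteractionLoop": {1: 21787610, 4: 5309648, 8: 3398981,
                        12: 2522508, 16: 2002979},
    "NewtonIntegrator": {1: 9541243, 4: 5730646, 8: 4997676,
                         12: 3055652, 16: 3665586},
}

# TOTAL row of the 1-thread timing table, in microseconds.
total_us_1thread = 47609645


def speedup(times, threads):
    """Speedup relative to the 1-thread run: T(1) / T(threads)."""
    return times[1] / times[threads]


def amdahl_limit(serial_time, total_time):
    """Upper bound on speedup if serial_time stays constant (assumption)."""
    return total_time / serial_time


for n in sorted(elapsed_s):
    overall = speedup(elapsed_s, n)
    # Parallel efficiency: speedup divided by the number of threads.
    print(f"{n:2d} threads: speedup {overall:.2f}, efficiency {overall / n:.2f}")

for name, times in engine_us.items():
    print(f"{name}: speedup at 16 threads {speedup(times, 16):.2f}")

limit = amdahl_limit(engine_us["InsertionSortCollider"][1], total_us_1thread)
print(f"bound from constant collider time: {limit:.2f}")
```

A per-engine speedup below 1 means the engine took longer than in the 1-thread run.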
matthias
On 10.04.2014 12:58, Bruno Chareyre wrote:
> On 10/04/14 02:01, Klaus Thoeni wrote:
>> just to clarify, Test 2 is done by increasing the number of iterations (1x, 3x
>> and 12x the number of iterations specified in checkPerf.py). This means the
>> number of interactions should increase as well and, hence, particle velocities
>> should decrease because of more interactions.
> That is what I was thinking. And more interactions means less (relative)
> time spent in collider.
>> I added a table with the collider scaling factor for 1 million particles and
>> iter x 12.
> Thanks! So there is still an optimum near 12-14. It may be possible to
> improve (choosing appropriate chunk sizes internally), but it needs
> serious testing.
>> Note your T(j8)=T(j1)/5.8 is actually T(j8)=T(j1)/4.8. Where did you get the
>> number from? You must look into the uploaded files in order to get these numbers.
> I used the x1 line since I was not expecting any influence of the number
> of steps on the collider's performance:
> 187/20=5.8
> Now I see it is different with other lines. Weird.
> Bruno
--
----------------------------
Dipl.-Inf. Matthias Frank
wissenschaftlicher Mitarbeiter
Technische Universität Dresden
Fakultät Maschinenwesen
Institut für Verarbeitungsmaschinen und mobile Arbeitsmaschinen
Professur für Verarbeitungsmaschinen und Verarbeitungstechnik
01062 Dresden
Tel.: +49 351 463 36124
E-Mail: matthias.frank@xxxxxxxxxxxxx
www.vat.tu-dresden.de