yade-dev team mailing list archive
Message #02009
Re: Performance benchmark
> For the one that you sent:
>
> 1. You don't use InteractionDispatchers, so the parallel
> performance will be MUCH worse (3 loops instead of 1).
> 2. SQLiteRecorder is just a leftover, I assume.
> 3. The first step should not be measured, since the collider is being
> initialized (not proportional to N).
> 4. Use InsertionSortCollider::velocityBins; it eliminates a lot of time
> otherwise spent in the collider.
>
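A minimal sketch of point 3 above (leaving the initial steps out of the
measurement). The names time_per_step, step, warmup_steps and
measured_steps are hypothetical and not part of Yade; step stands for
whatever advances the simulation by one iteration.

```python
import time


def time_per_step(step, warmup_steps=1, measured_steps=100):
    """Return average wall-clock seconds per step, excluding warm-up steps.

    `step` is a hypothetical zero-argument callable that advances the
    simulation by one iteration; it is a placeholder, not a Yade function.
    """
    # Warm-up: one-off costs such as collider initialization land here
    # and are deliberately left out of the timing.
    for _ in range(warmup_steps):
        step()

    start = time.perf_counter()
    for _ in range(measured_steps):
        step()
    return (time.perf_counter() - start) / measured_steps


if __name__ == "__main__":
    # Dummy workload standing in for a simulation step.
    print(time_per_step(lambda: sum(range(1000)), warmup_steps=10))
```

Dividing the result by the number of particles gives a tPerStep/N figure
that can be compared between runs.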
I optimized my own test according to your recommendations, Vaclav
(InteractionDispatchers, the collider's nBins, and skipping the first
step), and also ran the cyl2-openmp test. Here are the results.
1. The optimization really does improve performance, but only for
THREADS=2. Compare pb1 and pb2. As I understand it, the improvement
comes from InteractionDispatchers, while the collider parameters have no
effect. Why?
2. Determining the real performance requires many iterations. Compare
pb2 (30000 iterations, measured after 10000 unmeasured initial
iterations) with cyl2-openmp (1000 iterations). For the same
configuration and number of particles, the performance in cyl2-openmp is
more than twice that in pb2. In addition, cyl2 scales linearly, whereas
pb2 does not. So, to estimate the worst-case performance, the mixing of
particles has to be taken into account (it occurs in pb* but not in cyl2,
because of the small numIters). Only then can the performance of the
collider's sorting algorithm and of creating/deleting interactions be
estimated realistically. But running many iterations is time-consuming,
so we need a synthetic test that requires only a small number of
iterations yet provides intensive mixing of particles. Maybe add this to
the blueprint?
3. Vaclav, in your cyl2 test on the i7, tPerStep/N increases with the
number of particles (it seems tPerStep(N) ~ N*log(N)), whereas in the
Opteron test it decreases instead (it seems tPerStep(N) ~ log(N)). Why?
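One way to check the two guesses in point 3 is to fit each candidate law
to the measured (N, tPerStep) pairs and compare the residuals. The sketch
below is only an illustration: the function names are hypothetical and
the sample timings are synthetic, not taken from the pb* or cyl2 runs.

```python
import math

# Candidate scaling laws for tPerStep(N), each up to a constant factor.
MODELS = {
    "N*log(N)": lambda n: n * math.log(n),
    "N": lambda n: float(n),
    "log(N)": lambda n: math.log(n),
}


def fit_scale(ns, ts, f):
    """Fit t ~ c*f(N) by unweighted least squares.

    Returns (c, relative RMS residual of the fit).
    """
    fs = [f(n) for n in ns]
    c = sum(fi * ti for fi, ti in zip(fs, ts)) / sum(fi * fi for fi in fs)
    rel = [(ti - c * fi) / ti for fi, ti in zip(fs, ts)]
    return c, math.sqrt(sum(r * r for r in rel) / len(rel))


def best_model(ns, ts):
    """Return (name of the best-fitting model, residual of every model)."""
    scores = {name: fit_scale(ns, ts, f)[1] for name, f in MODELS.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic timings generated from c*N*log(N); replace with real data.
    ns = [1000, 5000, 20000, 100000]
    ts = [2e-6 * n * math.log(n) for n in ns]
    name, scores = best_model(ns, ts)
    print(name, scores)
```

With only a few values of N the laws can be hard to tell apart, so the
residuals are worth reading together rather than trusting the winner
alone.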
Best regards, Sergei D.