On Mon, Sep 21, 2009 at 09:44:11PM +0200, Johan Hake wrote:
On Monday 21 September 2009 21:37:03 Anders Logg wrote:
Johan and I have set up a benchmark for parallel speedup in
bench/fem/speedup
Here are some preliminary results:
Processes | Speedup (assemble) | Speedup (assemble + solve)
----------------------------------------------------------
1         | 1                  | 1
2         | 1.4351             | 4.0785
4         | 2.3763             | 6.9076
8         | 3.7458             | 9.4648
16        | 6.3143             | 19.369
32        | 7.6207             | 33.699
These numbers look a bit strange, especially the superlinear speedup
for assemble + solve. There might be a bug somewhere in the benchmark
code.
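One quick way to see what is odd about these numbers is to look at parallel efficiency, E = S / p. If E is above 1 the speedup is superlinear. The sketch below only reuses the speedups from the table. The helper name `efficiency` is hypothetical and is not part of the benchmark code.

```python
# Speedups copied from the table: processes -> (assemble, assemble + solve)
speedups = {
    1: (1.0, 1.0),
    2: (1.4351, 4.0785),
    4: (2.3763, 6.9076),
    8: (3.7458, 9.4648),
    16: (6.3143, 19.369),
    32: (7.6207, 33.699),
}

def efficiency(speedup: float, procs: int) -> float:
    """Parallel efficiency E = S / p; E > 1 indicates superlinear speedup."""
    return speedup / procs

# Process counts where assemble + solve scales better than linearly
superlinear = [p for p, (_, s_total) in speedups.items()
               if efficiency(s_total, p) > 1.0]

for p, (s_asm, s_total) in speedups.items():
    print(f"{p:3d} | assemble E = {efficiency(s_asm, p):5.2f}"
          f" | assemble + solve E = {efficiency(s_total, p):5.2f}")
```

Every row with more than one process has an assemble + solve efficiency above 1. Assembly efficiency drops to roughly 0.24 at 32 processes.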
Anyway, we have some preliminary results that at least show some kind
of speedup.
It would be interesting to hear comments from Matt and others on what
kind of numbers we should expect.
The benchmark is for assembling and solving Poisson on a 64 x 64 x 64
mesh using PETSc/MUMPS. Partitioning time is not included in the
numbers.
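For a sense of how much work each process gets, here is a rough size estimate. It assumes piecewise linear (P1) Lagrange elements, which gives one unknown per mesh vertex. The thread does not state the element degree, so that part is an assumption. The helper `p1_dofs` is hypothetical.

```python
def p1_dofs(n: int) -> int:
    """Unknowns for P1 Lagrange on an n x n x n structured cube mesh.

    Assumes one degree of freedom per vertex, i.e. (n + 1)**3 vertices.
    """
    return (n + 1) ** 3

n = 64
total = p1_dofs(n)  # 65**3 unknowns under the P1 assumption

for p in (1, 2, 4, 8, 16, 32):
    # Ideal even split; a real partition will be slightly unbalanced
    print(f"{p:3d} processes: ~{total // p} unknowns per process")
```

Under this assumption the global system has 274,625 unknowns. That leaves only about 8,600 unknowns per process at 32 processes.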
What solver is used when the number of processors is 1? If it is
different from MUMPS, the speedup jump from 1 to 2 processors will
include the performance difference between the two solvers.
It's the default PETSc LU solver, which should be UMFPACK.
So one explanation could be that MUMPS is twice as fast as UMFPACK
(looking at the speedup for two processes). That would mean we should
divide the numbers by 2, giving a speedup of 17 instead of 34, which
would be more reasonable.
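The correction proposed above can be written out as arithmetic. It assumes, as in the thread, that switching from UMFPACK to MUMPS accounts for a factor of about 2. The constant `SOLVER_FACTOR` and the helper `corrected` are hypothetical names used only for illustration.

```python
# Measured assemble + solve speedups from the table (processes -> speedup)
total_speedup = {2: 4.0785, 4: 6.9076, 8: 9.4648, 16: 19.369, 32: 33.699}

# Assumption: MUMPS is roughly 2x faster than UMFPACK, so part of every
# measured speedup comes from the solver switch rather than from parallelism.
SOLVER_FACTOR = 2.0

def corrected(speedup: float, factor: float = SOLVER_FACTOR) -> float:
    """Divide out the assumed solver-switch gain from a measured speedup."""
    return speedup / factor

for p, s in total_speedup.items():
    print(f"{p:3d} | measured {s:7.3f} | corrected {corrected(s):7.3f}")
```

With this factor, 32 processes give about 16.8, which is the "17" above. Two processes give about 2.04, which is close to linear. The correction is only as good as the assumed factor of 2.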
The total speedup of 17 includes both assemble and solve. Since
assemble is obviously not scaling as it should, MUMPS may still be
scaling pretty well.