dolfin team mailing list archive

Re: Results: Parallel speedup

To: Johan Hake <hake@xxxxxxxxx>
From: "Garth N. Wells" <gnw20@xxxxxxxxx>
Date: Thu, 24 Sep 2009 18:06:20 +0100
Cc: dolfin-dev@xxxxxxxxxx
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <4ABB91BC.9020402@cam.ac.uk>
User-agent: Thunderbird 2.0.0.23 (X11/20090817)



Garth N. Wells wrote:


Johan Hake wrote:

On Monday 21 September 2009 22:46:29 Anders Logg wrote:

On Mon, Sep 21, 2009 at 09:44:11PM +0200, Johan Hake wrote:

On Monday 21 September 2009 21:37:03 Anders Logg wrote:

Johan and I have set up a benchmark for parallel speedup in

  bench/fem/speedup

Here are some preliminary results:

  Processes  |  Assemble speedup  Assemble + solve speedup
  --------------------------------------------------------
  1          |                 1                         1
  2          |            1.4351                    4.0785
  4          |            2.3763                    6.9076
  8          |            3.7458                    9.4648
  16         |            6.3143                    19.369
  32         |            7.6207                    33.699

These numbers look a bit strange, especially the superlinear speedup
for assemble + solve. There might be a bug somewhere in the benchmark
code.
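One way to make "superlinear" concrete is to look at parallel efficiency, i.e. speedup divided by the number of processes; anything above 1 is superlinear. The sketch below is not part of bench/fem/speedup. It only re-uses the numbers from the table above, and the names `speedup` and `efficiency` are made up for illustration.

```python
# Speedups copied from the table above:
# processes -> (assemble, assemble + solve)
speedup = {
    1: (1.0, 1.0),
    2: (1.4351, 4.0785),
    4: (2.3763, 6.9076),
    8: (3.7458, 9.4648),
    16: (6.3143, 19.369),
    32: (7.6207, 33.699),
}


def efficiency(table):
    """Parallel efficiency = speedup / number of processes."""
    return {p: (a / p, s / p) for p, (a, s) in table.items()}


eff = efficiency(speedup)

# Process counts where assemble + solve is superlinear (efficiency > 1).
superlinear = [p for p, (_, s) in eff.items() if s > 1.0]

for p, (a, s) in eff.items():
    print(f"{p:>2} processes: assemble {a:.2f}, assemble + solve {s:.2f}")
```

With these numbers, assemble + solve is above 1 for every run with more than one process, while assemble alone drops to roughly 0.24 at 32 processes.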

Anyway, we have some preliminary results that at least show some kind
of speedup.

It would be interesting to hear some comments on what kind of numbers
we should expect to get from Matt and others.

The benchmark is for assembling and solving Poisson on a 64 x 64 x 64
mesh using PETSc/MUMPS. Partitioning time is not included in the
numbers.

What solver is used when the number of processors is 1? If this is
different from MUMPS, we will have the performance difference between the
two solvers included in the speedup bump when going from 1 -> 2
processors.

It's the default PETSc LU solver which should be UMFPACK.

So one explanation could be that MUMPS is twice as fast as UMFPACK
(looking at the speedup for two processes), which means we should
divide the numbers by 2, giving a speedup of 17 instead of 34 which
would be more reasonable.
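As a rough sketch of that correction: divide the assemble + solve speedups for more than one process by an assumed solver factor. The factor of 2 is only the guess made above (MUMPS about twice as fast as UMFPACK on one process), not a measured value, and the variable names are hypothetical.

```python
# Assemble + solve speedups from the table, keyed by number of processes.
assemble_solve = {1: 1.0, 2: 4.0785, 4: 6.9076, 8: 9.4648, 16: 19.369, 32: 33.699}

# Assumption: MUMPS is about 2x faster than UMFPACK on a single process.
solver_factor = 2.0

# The 1-process run is the UMFPACK baseline, so only runs with p > 1 are rescaled.
corrected = {
    p: (s / solver_factor if p > 1 else s) for p, s in assemble_solve.items()
}

for p, s in corrected.items():
    print(f"{p:>2} processes: corrected speedup {s:.2f}")
```

For 32 processes this gives 33.699 / 2, i.e. about 17.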

The total speedup of 17 includes both assemble + solve. Since assemble
is obviously not scaling as it should, MUMPS may still be scaling
pretty well.

We might add a second figure for the speedup measurement, which measures the relative speedup for each doubling of the processors. Then we would get rid of the MUMPS vs UMFPACK "bug" in the measurements.
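A minimal sketch of what such a second figure could compute, using the speedups already posted in this thread. The function name `doubling_ratios` is made up here, and this is not code from the benchmark itself.

```python
# Speedups from the posted table, in order of increasing process count.
processes = [1, 2, 4, 8, 16, 32]
assemble = [1.0, 1.4351, 2.3763, 3.7458, 6.3143, 7.6207]
assemble_solve = [1.0, 4.0785, 6.9076, 9.4648, 19.369, 33.699]


def doubling_ratios(speedups):
    """Relative speedup for each doubling: speedup(2p) / speedup(p)."""
    return [later / earlier for earlier, later in zip(speedups, speedups[1:])]


assemble_steps = doubling_ratios(assemble)
solve_steps = doubling_ratios(assemble_solve)

for p, a, s in zip(processes[1:], assemble_steps, solve_steps):
    print(f"{p // 2:>2} -> {p:<2}  assemble x{a:.2f}  assemble + solve x{s:.2f}")
```

Only the 1 -> 2 step mixes the two solvers. Assuming all runs with more than one process use MUMPS, the later steps compare MUMPS with MUMPS, and an ideal doubling would give a ratio of 2.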


Here are some benchmarks for LU solvers

    www.mis.mpg.de/preprints/2008/preprint2008_65.pdf

From a quick look, for Poisson in 3D (Figure 6), MUMPS could well be more than a factor of two faster than UMFPACK.

For a 48x48x48 Poisson problem, MUMPS is 3.25 times faster than UMFPACK on a single process. For smaller problems, UMFPACK can be faster.


Garth

Garth

Johan

So some preliminary conclusions are:

1. Something is not right with assembly.

2. MUMPS scales well and runs relatively faster than UMFPACK.

--
Anders

_______________________________________________
DOLFIN-dev mailing list
DOLFIN-dev@xxxxxxxxxx
http://www.fenics.org/mailman/listinfo/dolfin-dev

