dolfin team mailing list archive

Re: Results: Parallel speedup

To: dolfin-dev@xxxxxxxxxx
From: Johan Hake <hake@xxxxxxxxx>
Date: Mon, 21 Sep 2009 22:54:08 +0200
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <20090921204629.GH13010@olorin>
User-agent: KMail/1.12.1 (Linux/2.6.28-15-generic; KDE/4.3.1; i686; ; )

On Monday 21 September 2009 22:46:29 Anders Logg wrote:
> On Mon, Sep 21, 2009 at 09:44:11PM +0200, Johan Hake wrote:
> > On Monday 21 September 2009 21:37:03 Anders Logg wrote:
> > > Johan and I have set up a benchmark for parallel speedup in
> > >
> > >   bench/fem/speedup
> > >
> > > Here are some preliminary results:
> > >
> > >   Procs    |  Assemble  Assemble + solve  (speedup)
> > >   --------------------------------------
> > >   1        |         1                 1
> > >   2        |    1.4351            4.0785
> > >   4        |    2.3763            6.9076
> > >   8        |    3.7458            9.4648
> > >   16       |    6.3143            19.369
> > >   32       |    7.6207            33.699
> > >
> > > These numbers look a bit strange, especially the superlinear speedup
> > > for assemble + solve. There might be a bug somewhere in the benchmark
> > > code.
> > >
> > > Anyway, we have some preliminary results that at least show some kind
> > > of speedup.
> > >
> > > It would be interesting to hear some comments on what kind of numbers
> > > we should expect to get from Matt and others.
> > >
> > > The benchmark is for assembling and solving Poisson on a 64 x 64 x 64
> > > mesh using PETSc/MUMPS. Partitioning time is not included in the
> > > numbers.
> >
> > What solver is used when the number of processors is 1? If this is
> > different from MUMPS, we will have the performance difference between the
> > two solvers included in the speedup bump when going from 1 -> 2
> > processors.
> 
> It's the default PETSc LU solver which should be UMFPACK.
> 
> So one explanation could be that MUMPS is twice as fast as UMFPACK
> (looking at the speedup for two processes), which means we should
> divide the numbers by 2, giving a speedup of 17 instead of 34 which
> would be more reasonable.
>
> The total speedup of 17 includes both assemble + solve. Since assemble
> is obviously not scaling as it should, MUMPS may still be scaling
> pretty well.

We might add a second figure to the speedup measurement: the relative
speedup for each doubling of the number of processors. That would remove
the MUMPS vs UMFPACK "bug" from every step except 1 -> 2.
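
A minimal sketch of that second figure, using only the numbers from the table quoted above. The function name doubling_speedup is made up for illustration and is not part of bench/fem/speedup. Each ratio compares a run with the run on half as many processes, so the ideal value is 2.

```python
# Speedups relative to the 1-process run, copied from the table in this thread.
procs = [1, 2, 4, 8, 16, 32]
assemble = [1.0, 1.4351, 2.3763, 3.7458, 6.3143, 7.6207]
assemble_solve = [1.0, 4.0785, 6.9076, 9.4648, 19.369, 33.699]


def doubling_speedup(speedups):
    """Return speedup(2p) / speedup(p) for each consecutive pair of entries.

    Hypothetical helper for illustration; perfect scaling gives 2.0 per step.
    """
    return [b / a for a, b in zip(speedups, speedups[1:])]


for name, column in [("assemble", assemble), ("assemble + solve", assemble_solve)]:
    for p, ratio in zip(procs[1:], doubling_speedup(column)):
        # Only the 1 -> 2 step mixes UMFPACK and MUMPS timings.
        print(f"{name:16s} {p // 2:2d} -> {p:2d}: {ratio:.3f}")
```

Only the 1 -> 2 entry of the assemble + solve column compares two different solvers; all later steps compare MUMPS with MUMPS.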

Johan

> So some preliminary conclusions are:
> 
> 1. Something is not right with assembly.
> 
> 2. MUMPS scales well and appears to run about twice as fast as UMFPACK.
>
> --
> Anders
>
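
The "divide by 2" correction discussed above can be sketched as follows. The factor of 2 is the guess made in this thread (MUMPS on one process being about twice as fast as UMFPACK), not a measured value, and applying it to the whole assemble + solve figure is as rough here as it is in the thread. The names ASSUMED_SOLVER_FACTOR and rebaseline are hypothetical.

```python
# Assemble + solve speedups relative to the 1-process UMFPACK run,
# copied from the table in this thread.
procs = [2, 4, 8, 16, 32]
assemble_solve = [4.0785, 6.9076, 9.4648, 19.369, 33.699]

# Assumption from the thread, not a measurement: a 1-process MUMPS run
# would be about twice as fast as the 1-process UMFPACK run.
ASSUMED_SOLVER_FACTOR = 2.0


def rebaseline(speedups, factor):
    """Rescale speedups to a baseline that is `factor` times faster."""
    return [s / factor for s in speedups]


adjusted = rebaseline(assemble_solve, ASSUMED_SOLVER_FACTOR)
for p, s in zip(procs, adjusted):
    # Parallel efficiency is speedup divided by the number of processes.
    print(f"{p:2d} processes: adjusted speedup {s:.2f}, efficiency {s / p:.2f}")
```

Timing a 1-process MUMPS run directly would replace the assumed factor with a measured one.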
