dolfin team mailing list archive

Re: Results: Parallel speedup

 

On Thu, Sep 24, 2009 at 06:06:20PM +0100, Garth N. Wells wrote:
> 
> 
> Garth N. Wells wrote:
> > 
> > Johan Hake wrote:
> >> On Monday 21 September 2009 22:46:29 Anders Logg wrote:
> >>> On Mon, Sep 21, 2009 at 09:44:11PM +0200, Johan Hake wrote:
> >>>> On Monday 21 September 2009 21:37:03 Anders Logg wrote:
> >>>>> Johan and I have set up a benchmark for parallel speedup in
> >>>>>
> >>>>>   bench/fem/speedup
> >>>>>
> >>>>> Here are some preliminary results (speedup relative to 1 process):
> >>>>>
> >>>>>   Processes |  Assemble  Assemble + solve
> >>>>>   ---------------------------------------
> >>>>>   1         |         1                 1
> >>>>>   2         |    1.4351            4.0785
> >>>>>   4         |    2.3763            6.9076
> >>>>>   8         |    3.7458            9.4648
> >>>>>   16        |    6.3143            19.369
> >>>>>   32        |    7.6207            33.699
> >>>>>
> >>>>> These numbers look a bit strange, especially the superlinear speedup
> >>>>> for assemble + solve. There might be a bug somewhere in the benchmark
> >>>>> code.
> >>>>>
> >>>>> Anyway, we have some preliminary results that at least show some kind
> >>>>> of speedup.
> >>>>>
> >>>>> It would be interesting to hear comments from Matt and others on what
> >>>>> kind of numbers we should expect to get.
> >>>>>
> >>>>> The benchmark is for assembling and solving Poisson on a 64 x 64 x 64
> >>>>> mesh using PETSc/MUMPS. Partitioning time is not included in the
> >>>>> numbers.
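One quick way to see the superlinear speedup mentioned above is to divide each speedup S_p by the process count p, giving the parallel efficiency E_p = S_p / p: ideal scaling gives E_p = 1, and E_p > 1 is superlinear. The following is a minimal Python sketch of that calculation using the numbers from the table; it is not part of bench/fem/speedup, and the helper name `efficiency` is hypothetical.

```python
# Speedups copied from the benchmark table (Poisson, 64 x 64 x 64 mesh).
PROCS = [1, 2, 4, 8, 16, 32]
ASSEMBLE = [1.0, 1.4351, 2.3763, 3.7458, 6.3143, 7.6207]
ASSEMBLE_SOLVE = [1.0, 4.0785, 6.9076, 9.4648, 19.369, 33.699]


def efficiency(procs, speedups):
    """Parallel efficiency E_p = S_p / p; 1.0 is ideal, > 1.0 is superlinear."""
    return [s / p for p, s in zip(procs, speedups)]


if __name__ == "__main__":
    e_asm = efficiency(PROCS, ASSEMBLE)
    e_tot = efficiency(PROCS, ASSEMBLE_SOLVE)
    for p, ea, et in zip(PROCS, e_asm, e_tot):
        note = "  (superlinear)" if et > 1.0 else ""
        print(f"p = {p:2d}: assemble E = {ea:.2f}, assemble + solve E = {et:.2f}{note}")
```

For these numbers, every assemble + solve entry from 2 processes up has an efficiency above 1, while the assembly efficiency falls steadily to about 0.24 at 32 processes.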
> >>>> What solver is used when the number of processors is 1? If this is
> >>>> different from MUMPS, we will have the performance difference between the
> >>>> two solvers included in the speedup bump when going from 1 -> 2
> >>>> processors.
> >>> It's the default PETSc LU solver which should be UMFPACK.
> >>>
> >>> So one explanation could be that MUMPS is twice as fast as UMFPACK
> >>> (looking at the speedup for two processes), which means we should
> >>> divide the numbers by 2, giving a speedup of 17 instead of 34 which
> >>> would be more reasonable.
> >>>
> >>> The total speedup of 17 includes both assemble + solve. Since assembly
> >>> is obviously not scaling as it should, MUMPS may still be scaling
> >>> pretty well.
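The correction proposed here amounts to dividing each parallel speedup by an assumed serial MUMPS/UMFPACK speed ratio, because the 1-process baseline used UMFPACK while the parallel runs used MUMPS. Below is a minimal Python sketch of that rescaling, assuming a ratio of 2 as estimated above; the function name `rebase_on_serial_mumps` is hypothetical and not part of the benchmark code.

```python
# Assemble + solve speedups from the benchmark table. The 1-process run used
# serial UMFPACK; the parallel runs used MUMPS.
PROCS = [1, 2, 4, 8, 16, 32]
ASSEMBLE_SOLVE = [1.0, 4.0785, 6.9076, 9.4648, 19.369, 33.699]


def rebase_on_serial_mumps(speedups, solver_ratio):
    """Estimate speedups relative to a hypothetical serial MUMPS run.

    Divides the whole assemble + solve speedup by `solver_ratio`. Assembly
    time does not depend on the LU solver, so if `solver_ratio` is a
    solve-only ratio this tends to over-correct.
    """
    if solver_ratio <= 0:
        raise ValueError("solver_ratio must be positive")
    # The new baseline (serial MUMPS) has speedup 1 by definition.
    return [1.0] + [s / solver_ratio for s in speedups[1:]]


if __name__ == "__main__":
    for p, s in zip(PROCS, rebase_on_serial_mumps(ASSEMBLE_SOLVE, 2.0)):
        print(f"p = {p:2d}: estimated speedup {s:.2f}")
```

With a ratio of 2, the 32-process figure drops from 33.699 to about 16.85, which is the "speedup of 17" quoted above.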
> >> We might add a second figure for the speedup measurement that shows the
> >> relative speedup for each doubling of the number of processors. Then we
> >> would get rid of the MUMPS vs UMFPACK "bug" in the measurements.
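The per-doubling measure suggested here is S_2p / S_p = T_p / T_2p, which does not involve the 1-process run, so only the 1 -> 2 step still mixes UMFPACK and MUMPS. Below is a minimal Python sketch using the assemble + solve column of the benchmark table; the function name `doubling_speedups` is hypothetical. Ideal scaling gives a ratio of 2 for each doubling.

```python
# Assemble + solve speedups from the benchmark table.
PROCS = [1, 2, 4, 8, 16, 32]
ASSEMBLE_SOLVE = [1.0, 4.0785, 6.9076, 9.4648, 19.369, 33.699]


def doubling_speedups(procs, speedups):
    """Return (p, 2p, S_2p / S_p) for each consecutive doubling.

    Since S_p = T_1 / T_p, the ratio S_2p / S_p equals T_p / T_2p and is
    independent of the 1-process baseline.
    """
    pairs = []
    for i in range(1, len(procs)):
        p, q = procs[i - 1], procs[i]
        if q != 2 * p:
            raise ValueError(f"{p} -> {q} processes is not a doubling")
        pairs.append((p, q, speedups[i] / speedups[i - 1]))
    return pairs


if __name__ == "__main__":
    for p, q, ratio in doubling_speedups(PROCS, ASSEMBLE_SOLVE):
        print(f"{p:2d} -> {q:2d} processes: {ratio:.2f}")
```

For these numbers the 8 -> 16 step comes out at about 2.05, slightly above 2, which suggests that some superlinear behaviour remains even once the serial solver is taken out of the comparison.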
> >>
> > 
> > Here are some benchmarks for LU solvers
> > 
> >     www.mis.mpg.de/preprints/2008/preprint2008_65.pdf
> > 
> >  From a quick look, for Poisson in 3D (Figure 6), MUMPS could well be
> > more than a factor of two faster than UMFPACK.
> > 
> 
> For a 48x48x48 Poisson problem, MUMPS is 3.25 times faster than UMFPACK 
> on a single process. For smaller problems, UMFPACK can be faster.
> 
> Garth

Interesting. Perhaps MUMPS should be the default solver in serial if
it is available?

-- 
Anders


> > Garth
> > 
> >> Johan
> >>
> >>> So some preliminary conclusions are:
> >>>
> >>> 1. Something is not right with assembly.
> >>>
> >>> 2. MUMPS scales well and runs relatively faster than UMFPACK.
> >>>
> >>>
> >> _______________________________________________
> >> DOLFIN-dev mailing list
> >> DOLFIN-dev@xxxxxxxxxx
> >> http://www.fenics.org/mailman/listinfo/dolfin-dev
