Message #15645
Re: Results: Parallel speedup
On Thu, Sep 24, 2009 at 06:06:20PM +0100, Garth N. Wells wrote:
>
>
> Garth N. Wells wrote:
> >
> > Johan Hake wrote:
> >> On Monday 21 September 2009 22:46:29 Anders Logg wrote:
> >>> On Mon, Sep 21, 2009 at 09:44:11PM +0200, Johan Hake wrote:
> >>>> On Monday 21 September 2009 21:37:03 Anders Logg wrote:
> >>>>> Johan and I have set up a benchmark for parallel speedup in
> >>>>>
> >>>>> bench/fem/speedup
> >>>>>
> >>>>> Here are some preliminary results:
> >>>>>
> >>>>> Processes | Speedup (assemble)   Speedup (assemble + solve)
> >>>>> ------------------------------------------------------------
> >>>>>         1 | 1                    1
> >>>>>         2 | 1.4351               4.0785
> >>>>>         4 | 2.3763               6.9076
> >>>>>         8 | 3.7458               9.4648
> >>>>>        16 | 6.3143               19.369
> >>>>>        32 | 7.6207               33.699
> >>>>>
> >>>>> These numbers look a bit strange, especially the superlinear speedup
> >>>>> for assemble + solve. There might be a bug somewhere in the benchmark
> >>>>> code.
> >>>>>
> >>>>> Anyway, we have some preliminary results that at least show some kind
> >>>>> of speedup.
> >>>>>
> >>>>> It would be interesting to hear some comments on what kind of numbers
> >>>>> we should expect to get from Matt and others.
> >>>>>
> >>>>> The benchmark is for assembling and solving Poisson on a 64 x 64 x 64
> >>>>> mesh using PETSc/MUMPS. Partitioning time is not included in the
> >>>>> numbers.
> >>>> What solver is used when the number of processors is 1? If this is
> >>>> different from MUMPS, we will have the performance difference between the
> >>>> two solvers included in the speedup bump when going from 1 -> 2
> >>>> processors.
> >>> It's the default PETSc LU solver which should be UMFPACK.
> >>>
> >>> So one explanation could be that MUMPS is twice as fast as UMFPACK
> >>> (looking at the speedup for two processes), which means we should
> >>> divide the numbers by 2, giving a speedup of 17 instead of 34 which
> >>> would be more reasonable.
> >>>
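[Editorial sketch: the correction proposed above can be written out as a few lines of Python. The factor of 2 is the assumption stated in the message, not a measured value, and dividing the whole assemble + solve speedup by it treats the entire run as solver time, which is the same approximation the message makes. The names total_speedup, SOLVER_FACTOR and corrected_speedup are hypothetical.]

```python
# Speedups for assemble + solve, keyed by process count (from the table above).
total_speedup = {1: 1.0, 2: 4.0785, 4: 6.9076, 8: 9.4648, 16: 19.369, 32: 33.699}

# Assumption from the message: MUMPS is about 2x faster than UMFPACK in serial.
SOLVER_FACTOR = 2.0


def corrected_speedup(speedups, solver_factor):
    """Divide out the solver switch for every parallel run (p >= 2)."""
    return {p: s / solver_factor for p, s in speedups.items() if p >= 2}


corrected = corrected_speedup(total_speedup, SOLVER_FACTOR)

# Parallel efficiency: corrected speedup divided by the number of processes.
efficiency = {p: s / p for p, s in corrected.items()}

for p in sorted(corrected):
    print(f"{p:2d} processes: corrected speedup {corrected[p]:6.2f}, "
          f"efficiency {efficiency[p]:.2f}")
```

[With a different solver factor, for example one measured on the actual mesh, only SOLVER_FACTOR needs to change.]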
> >>> The total speedup of 17 includes both assemble + solve. Since assemble
> >>> is obviously not scaling as it should, MUMPS may still be scaling
> >>> pretty well.
> >> We might add a second figure for the speedup measurement, which measures the
> >> relative speedup for each doubling of the processors. Then we would get rid of
> >> the MUMPS vs UMFPACK "bug" in the measurements.
> >>
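[Editorial sketch: the per-doubling measure suggested above could be computed from the posted numbers roughly as follows. Each ratio compares a run with the run on half as many processes, so the serial UMFPACK baseline only affects the first entry (1 -> 2). The function name doubling_ratios is hypothetical; the data are the figures from the table earlier in the thread.]

```python
# Process counts and speedups as posted in the table earlier in the thread.
procs = [1, 2, 4, 8, 16, 32]
assemble = [1.0, 1.4351, 2.3763, 3.7458, 6.3143, 7.6207]
assemble_solve = [1.0, 4.0785, 6.9076, 9.4648, 19.369, 33.699]


def doubling_ratios(speedups):
    """Return the speedup gained at each doubling of the process count.

    A value of 2.0 would mean perfect scaling for that doubling.
    """
    return [after / before for before, after in zip(speedups, speedups[1:])]


assemble_ratios = doubling_ratios(assemble)
solve_ratios = doubling_ratios(assemble_solve)

for i, (ra, rs) in enumerate(zip(assemble_ratios, solve_ratios)):
    print(f"{procs[i]:2d} -> {procs[i + 1]:2d}: "
          f"assemble x{ra:.2f}, assemble + solve x{rs:.2f}")
```

[Only the 1 -> 2 entry for assemble + solve mixes the two solvers; the remaining entries compare MUMPS runs with MUMPS runs.]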
> >
> > Here are some benchmarks for LU solvers
> >
> > www.mis.mpg.de/preprints/2008/preprint2008_65.pdf
> >
> > From a quick look, for Poisson in 3D (Figure 6), MUMPS could well be
> > more than a factor of two faster than UMFPACK.
> >
>
> For a 48x48x48 Poisson problem, MUMPS is 3.25 times faster than UMFPACK
> on a single process. For smaller problems, UMFPACK can be faster.
>
> Garth
Interesting. Perhaps MUMPS should be the default solver in serial if
it is available?
--
Anders
> > Garth
> >
> >> Johan
> >>
> >>> So some preliminary conclusions are:
> >>>
> >>> 1. Something is not right with assembly.
> >>>
> >>> 2. MUMPS scales well and runs relatively faster than UMFPACK.
> >>>
> >>>
> >> _______________________________________________
> >> DOLFIN-dev mailing list
> >> DOLFIN-dev@xxxxxxxxxx
> >> http://www.fenics.org/mailman/listinfo/dolfin-dev