dolfin team mailing list archive
Message #15753
Re: Results: Parallel speedup
On Sat, Sep 26, 2009 at 08:12:47AM +0100, Garth N. Wells wrote:
>
>
> Anders Logg wrote:
> > On Thu, Sep 24, 2009 at 06:06:20PM +0100, Garth N. Wells wrote:
> >>
> >> Garth N. Wells wrote:
> >>> Johan Hake wrote:
> >>>> On Monday 21 September 2009 22:46:29 Anders Logg wrote:
> >>>>> On Mon, Sep 21, 2009 at 09:44:11PM +0200, Johan Hake wrote:
> >>>>>> On Monday 21 September 2009 21:37:03 Anders Logg wrote:
> >>>>>>> Johan and I have set up a benchmark for parallel speedup in
> >>>>>>>
> >>>>>>> bench/fem/speedup
> >>>>>>>
> >>>>>>> Here are some preliminary results:
> >>>>>>>
> >>>>>>> Speedup:
> >>>>>>>
> >>>>>>> Processes | Assemble   Assemble + solve
> >>>>>>> -----------------------------------------
> >>>>>>>         1 |  1          1
> >>>>>>>         2 |  1.4351     4.0785
> >>>>>>>         4 |  2.3763     6.9076
> >>>>>>>         8 |  3.7458     9.4648
> >>>>>>>        16 |  6.3143    19.369
> >>>>>>>        32 |  7.6207    33.699
> >>>>>>>
> >>>>>>> These numbers look a bit strange, especially the superlinear speedup
> >>>>>>> for assemble + solve. There might be a bug somewhere in the benchmark
> >>>>>>> code.
> >>>>>>>
> >>>>>>> Anyway, we have some preliminary results that at least show some kind
> >>>>>>> of speedup.
> >>>>>>>
> >>>>>>> It would be interesting to hear some comments on what kind of numbers
> >>>>>>> we should expect to get from Matt and others.
> >>>>>>>
> >>>>>>> The benchmark is for assembling and solving Poisson on a 64 x 64 x 64
> >>>>>>> mesh using PETSc/MUMPS. Partitioning time is not included in the
> >>>>>>> numbers.
> >>>>>> What solver is used when the number of processors is 1? If this is
> >>>>>> different from MUMPS, we will have the performance difference between the
> >>>>>> two solvers included in the speedup bump when going from 1 -> 2
> >>>>>> processors.
> >>>>> It's the default PETSc LU solver which should be UMFPACK.
> >>>>>
> >>>>> So one explanation could be that MUMPS is twice as fast as UMFPACK
> >>>>> (looking at the speedup for two processes), which means we should
> >>>>> divide the numbers by 2, giving a speedup of 17 instead of 34 which
> >>>>> would be more reasonable.
> >>>>>
> >>>>> The total speedup of 17 includes both assemble + solve. Since assemble
> >>>>> is obviously not scaling as it should, MUMPS may still be scaling
> >>>>> pretty well.
> >>>> We might add a second figure for the speedup measurement, which measures the
> >>>> relative speedup for each doubling of the processors. Then we would get rid of
> >>>> the MUMPS vs UMFPACK "bug" in the measurements.
> >>>>
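The per-doubling figure suggested above can be computed directly from the quoted speedup table. The following is only a minimal sketch: it is not the code in bench/fem/speedup, and the names relative_speedup and rebase are made up for illustration. Rebasing on the 2-process run compares MUMPS against MUMPS, under the assumption that the 1-process run is the only one that used UMFPACK.

```python
# Speedups copied from the quoted table, relative to the 1-process run.
procs = [1, 2, 4, 8, 16, 32]
assemble = [1.0, 1.4351, 2.3763, 3.7458, 6.3143, 7.6207]
assemble_solve = [1.0, 4.0785, 6.9076, 9.4648, 19.369, 33.699]


def relative_speedup(speedups):
    """Speedup gained at each doubling of processes: S(2p) / S(p)."""
    return [b / a for a, b in zip(speedups, speedups[1:])]


def rebase(speedups, base_index=1):
    """Re-express speedups relative to the run at base_index.

    With base_index=1 the 2-process run becomes the baseline, which
    removes the 1-process (UMFPACK) timing from the comparison.
    """
    base = speedups[base_index]
    return [s / base for s in speedups[base_index:]]


if __name__ == "__main__":
    steps = zip(procs, procs[1:],
                relative_speedup(assemble),
                relative_speedup(assemble_solve))
    for p_from, p_to, r_a, r_s in steps:
        print(f"{p_from:2d} -> {p_to:2d}: "
              f"assemble x{r_a:.2f}, assemble + solve x{r_s:.2f}")
```

An ideal doubling gives a factor of 2 per step, so any step well above 2 points at something other than parallel scaling, such as a change of solver.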
> >>> Here are some benchmarks for LU solvers
> >>>
> >>> www.mis.mpg.de/preprints/2008/preprint2008_65.pdf
> >>>
> >>> From a quick look, for Poisson in 3D (Figure 6), MUMPS could well be
> >>> more than a factor of two faster than UMFPACK.
> >>>
> >> For a 48x48x48 Poisson problem, MUMPS is 3.25 times faster than UMFPACK
> >> on a single process. For smaller problems, UMFPACK can be faster.
> >>
> >> Garth
> >
> > Interesting. Perhaps MUMPS should be the default solver in serial if
> > it is available?
> >
>
> Maybe, although it is generally slower for smaller systems. We should at
> least make it straightforward to choose between the two.

I agree, there should be a parameter to control this.

In the meantime, I've changed the default to MUMPS even in serial if
it is available. If someone has MUMPS installed, it makes sense for
that to be the default option. If on the other hand you mostly want to
solve small problems, you probably don't have MUMPS and will then get
UMFPACK.

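
The selection rule described here (prefer MUMPS when installed, fall back to UMFPACK, and let a parameter override the choice) could look roughly like the sketch below. This is an illustration only, not the actual DOLFIN change: the function choose_lu_solver, its arguments, and the backend name strings are all hypothetical.

```python
# Preference order for the default: MUMPS first, then UMFPACK.
DEFAULT_ORDER = ("mumps", "umfpack")


def choose_lu_solver(available, requested=None):
    """Return the name of the LU backend to use.

    available: set of backend names that are installed (hypothetical strings)
    requested: optional explicit choice, standing in for the parameter
               discussed above; None means "use the default"
    """
    if requested is not None:
        if requested not in available:
            raise ValueError(f"LU solver '{requested}' is not available")
        return requested
    for name in DEFAULT_ORDER:
        if name in available:
            return name
    raise RuntimeError("no LU solver available")


if __name__ == "__main__":
    print(choose_lu_solver({"mumps", "umfpack"}))             # default
    print(choose_lu_solver({"umfpack"}))                      # fallback
    print(choose_lu_solver({"mumps", "umfpack"}, "umfpack"))  # override
```

Keeping the override separate from the default order means someone solving mostly small systems can still pick UMFPACK with MUMPS installed.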
--
Anders