dolfin team mailing list archive
Message #15573
Re: Results: Parallel speedup
On Tue, Sep 22, 2009 at 08:11:27AM +0200, Niclas Jansson wrote:
> Matthew Knepley <knepley@xxxxxxxxx> writes:
>
> > On Mon, Sep 21, 2009 at 2:37 PM, Anders Logg <logg@xxxxxxxxx> wrote:
> >
> > Johan and I have set up a benchmark for parallel speedup in
> >
> > bench/fem/speedup
> >
> > Here are some preliminary results:
> >
> > Speedup (relative to 1 process):
> >
> > Processes | Assemble   Assemble + solve
> > -----------------------------------------
> >         1 |  1           1
> >         2 |  1.4351      4.0785
> >         4 |  2.3763      6.9076
> >         8 |  3.7458      9.4648
> >        16 |  6.3143     19.369
> >        32 |  7.6207     33.699
> >
> > These numbers are very very strange for a number of reasons:
> >
> > 1) Assemble should scale almost perfectly. Something is wrong here.
> >
> > 2) Solve should scale like a matvec, which should not be this good,
> > especially on a cluster with a slow network. I would expect 85% or so.
> >
> > 3) If any of these runs are on dual-core nodes, then it really does not make
> > sense, since it should be bandwidth limited.
> >
> > Matt
> >
>
> So true, these numbers are very strange. I usually get 6-7 times speedup
> for the icns solver in Unicorn on a crappy Intel bus-based 2 x quad core.
>
> From a quick look at the code, is the mesh only 64 x 64? That could (does)
> explain the poor assembly performance on 32 processes (^-^)
It's 64 x 64 x 64 (3D). What would be a reasonable size?
> Also, I think the timing is done in the wrong way. Without barriers, it
> would never measure the true parallel runtime.
>
> MPI_Barrier
> MPI_Wtime
> number crunching
> MPI_Barrier
> MPI_Wtime
>
> (Well, assemble is more or less an implicit barrier due to apply(), but I
> don't think the solvers have any kind of implicit barrier.)
I thought there were implicit barriers in both assemble (apply) and
the solver, but adding barriers would not hurt.
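
For reference, a minimal sketch of the barrier-synchronized timing pattern
described above, assuming a plain C++ program using the MPI C API; the
"number crunching" placeholder stands in for the assemble/solve calls and is
not taken from the benchmark code:

  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char* argv[])
  {
    MPI_Init(&argc, &argv);

    MPI_Barrier(MPI_COMM_WORLD);   // all ranks start the clock together
    const double t0 = MPI_Wtime();

    // ... number crunching (assemble, solve) goes here ...

    MPI_Barrier(MPI_COMM_WORLD);   // wait for the slowest rank
    const double t1 = MPI_Wtime();

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
      std::printf("Wall time: %g s\n", t1 - t0);

    MPI_Finalize();
    return 0;
  }

Timed this way, the reported runtime is that of the slowest rank, which is
the number the speedup figures should be based on.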
--
Anders