dolfin team mailing list archive

Re: Results: Parallel speedup

To: dolfin-dev@xxxxxxxxxx
From: Niclas Jansson <njansson@xxxxxx>
Date: Tue, 22 Sep 2009 08:11:27 +0200
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <a9f269830909211240q2a29b962ne93a37387cceab83@mail.gmail.com> (Matthew Knepley's message of "Mon, 21 Sep 2009 14:40:45 -0500")
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)

Matthew Knepley <knepley@xxxxxxxxx> writes:

> On Mon, Sep 21, 2009 at 2:37 PM, Anders Logg <logg@xxxxxxxxx> wrote:
>
>     Johan and I have set up a benchmark for parallel speedup in
>    
>      bench/fem/speedup
>    
>     Here are some preliminary results:
>    
>      Speedup  |  Assemble  Assemble + solve
>      --------------------------------------
>      1        |         1                 1
>      2        |    1.4351            4.0785
>      4        |    2.3763            6.9076
>      8        |    3.7458            9.4648
>      16       |    6.3143            19.369
>      32       |    7.6207            33.699
>
> These numbers are very very strange for a number of reasons:
>
> 1) Assemble should scale almost perfectly. Something is wrong here.
>
> 2) Solve should scale like a matvec, which should not be this good,
>     especially on a cluster with a slow network. I would expect 85% or so.
>
> 3) If any of these are dual core, then it really does not make sense since
>     it should be bandwidth limited.
>
>   Matt
>  

So true, these numbers are very strange. I usually get a 6-7x speedup
for the icns solver in Unicorn on a crappy Intel bus-based 2 x quad-core
machine.

From a quick look at the code: is the mesh only 64 x 64? That could (does)
explain the poor assembly performance on 32 processes (^-^)
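
To put numbers on this, here is a small sketch that turns the speedups from the quoted table into parallel efficiency (speedup divided by process count) and estimates the work per process. The cell counts are an assumption on my part: 2 triangles per square for a 64 x 64 mesh and 6 tetrahedra per cube for a 64 x 64 x 64 mesh. The function and variable names are made up for illustration.

```python
# Speedup figures copied from the quoted table (process count -> speedup).
assemble = {1: 1.0, 2: 1.4351, 4: 2.3763, 8: 3.7458, 16: 6.3143, 32: 7.6207}
assemble_solve = {1: 1.0, 2: 4.0785, 4: 6.9076, 8: 9.4648, 16: 19.369, 32: 33.699}


def efficiency(speedup_by_procs):
    """Parallel efficiency = speedup / number of processes (1.0 is ideal)."""
    return {p: s / p for p, s in speedup_by_procs.items()}


eff_assemble = efficiency(assemble)
eff_total = efficiency(assemble_solve)

print("procs  assemble  assemble+solve")
for p in sorted(assemble):
    print(f"{p:5d}  {eff_assemble[p]:8.2f}  {eff_total[p]:14.2f}")

# Assumption: 2 triangles per square in 2D, 6 tetrahedra per cube in 3D.
cells_2d = 2 * 64 * 64
cells_3d = 6 * 64 ** 3

# Average number of cells each of the 32 processes would own.
cells_per_proc_2d = cells_2d // 32
cells_per_proc_3d = cells_3d // 32
print("cells per process on 32 procs:", cells_per_proc_2d, "(2D),", cells_per_proc_3d, "(3D)")
```

An efficiency above 1.0 means superlinear speedup, which is what the assemble + solve column shows already at 2 processes. If the mesh really is two-dimensional, each of the 32 processes would own only a few hundred cells under the assumption above, so communication and fixed overhead would likely dominate the assembly time.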

Also, I think the timing is done the wrong way. Without barriers, each
process only times its own work, so the benchmark never measures the true
parallel runtime. The usual pattern is:

MPI_Barrier
MPI_Wtime
number crunching
MPI_Barrier
MPI_Wtime

(Well, assemble is more or less an implicit barrier because of apply(), but
I don't think the solvers have any implicit barriers.)
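
To illustrate why the barriers matter, here is a minimal sketch that uses Python threads as stand-ins for MPI processes. It is not MPI code: threading.Barrier plays the role of MPI_Barrier, time.perf_counter plays the role of MPI_Wtime, and time.sleep stands in for the number crunching. The workloads and the function name timed_run are invented for the example.

```python
import threading
import time


def timed_run(workloads, use_barrier):
    """Return the elapsed time each worker measures around its own workload."""
    n = len(workloads)
    barrier = threading.Barrier(n)
    elapsed = [0.0] * n

    def worker(rank):
        if use_barrier:
            barrier.wait()               # stand-in for MPI_Barrier
        start = time.perf_counter()      # stand-in for MPI_Wtime
        time.sleep(workloads[rank])      # stand-in for number crunching
        if use_barrier:
            barrier.wait()               # wait until the slowest worker is done
        elapsed[rank] = time.perf_counter() - start

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return elapsed


# Deliberately unbalanced work, in seconds (hypothetical values).
workloads = [0.05, 0.10, 0.20, 0.40]

unsynced = timed_run(workloads, use_barrier=False)
synced = timed_run(workloads, use_barrier=True)

print("without barriers:", [round(t, 2) for t in unsynced])
print("with barriers:   ", [round(t, 2) for t in synced])
```

Without the barriers, each worker reports roughly its own workload, so the number you get depends on which worker you ask. With the barriers, every worker reports roughly the time of the slowest one, which is the parallel runtime.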

Niclas

>
>     These numbers look a bit strange, especially the superlinear speedup
>     for assemble + solve. There might be a bug somewhere in the benchmark
>     code.
>    
>     Anyway, we have some preliminary results that at least show some kind
>     of speedup.
>    
>     It would be interesting to hear some comments on what kind of numbers
>     we should expect to get from Matt and others.
>    
>     The benchmark is for assembling and solving Poisson on a 64 x 64 x 64
>     mesh using PETSc/MUMPS. Partitioning time is not included in the
>     numbers.
>    
>     --
>     Anders
>    
>     _______________________________________________
>     DOLFIN-dev mailing list
>     DOLFIN-dev@xxxxxxxxxx
>     http://www.fenics.org/mailman/listinfo/dolfin-dev
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any
> results to which their experiments lead.
> -- Norbert Wiener

Follow ups

Re: Results: Parallel speedup
From: Anders Logg, 2009-09-22

References

Results: Parallel speedup
From: Anders Logg, 2009-09-21
Re: Results: Parallel speedup
From: Matthew Knepley, 2009-09-21