dolfin team mailing list archive
Message #15573
Re: Results: Parallel speedup
On Tue, Sep 22, 2009 at 08:11:27AM +0200, Niclas Jansson wrote:
> Matthew Knepley <knepley@xxxxxxxxx> writes:
>
> > On Mon, Sep 21, 2009 at 2:37 PM, Anders Logg <logg@xxxxxxxxx> wrote:
> >
> > Johan and I have set up a benchmark for parallel speedup in
> >
> > bench/fem/speedup
> >
> > Here are some preliminary results:
> >
> > Speedup (relative to 1 process):
> >
> > Processes | Assemble   Assemble + solve
> > -----------------------------------------
> >         1 |  1           1
> >         2 |  1.4351      4.0785
> >         4 |  2.3763      6.9076
> >         8 |  3.7458      9.4648
> >        16 |  6.3143     19.369
> >        32 |  7.6207     33.699
> >
> > These numbers are very very strange for a number of reasons:
> >
> > 1) Assemble should scale almost perfectly. Something is wrong here.
> >
> > 2) Solve should scale like a matvec, which should not be this good,
> > especially on a cluster with a slow network. I would expect 85% or so.
> >
> > 3) If any of these runs are on dual-core nodes, then it really does not make
> > sense, since it should be bandwidth limited.
> >
> > Matt
> >
>
> So true, these numbers are very strange. I usually get 6-7 times speedup
> for the icns solver in Unicorn on a crappy Intel bus-based 2 x quad core.
>
> From a quick look at the code, is the mesh only 64 x 64? That could (does)
> explain the poor assembly performance on 32 processes (^-^)
It's 64 x 64 x 64 (3D). What would be a reasonable size?
> Also, I think the timing is done in the wrong way. Without barriers, it
> would never measure the true parallel runtime.
>
> MPI_Barrier
> MPI_Wtime
> number crunching
> MPI_Barrier
> MPI_Wtime
>
> (Well, assemble is more or less an implicit barrier due to apply(), but I
> don't think the solvers have any kind of implicit barrier.)
I thought there were implicit barriers in both assemble (apply) and
the solver, but adding barriers would not hurt.
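
For reference, a minimal sketch of the barrier-synchronized timing pattern
described above, assuming a plain C++ program using the MPI C API; the
"number crunching" placeholder stands in for the assemble/solve calls and is
not taken from the benchmark code:

  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char* argv[])
  {
    MPI_Init(&argc, &argv);

    MPI_Barrier(MPI_COMM_WORLD);   // all ranks start the clock together
    const double t0 = MPI_Wtime();

    // ... number crunching (assemble, solve) goes here ...

    MPI_Barrier(MPI_COMM_WORLD);   // wait for the slowest rank
    const double t1 = MPI_Wtime();

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
      std::printf("Wall time: %g s\n", t1 - t0);

    MPI_Finalize();
    return 0;
  }

Timed this way, the reported runtime is that of the slowest rank, which is
the number the speedup figures should be based on.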
--
Anders