
dolfin team mailing list archive

Re: Fwd: Assembly benchmark

 

On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <logg@xxxxxxxxx> wrote:
> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
>> ---------- Forwarded message ----------
>> From: Matthew Knepley <knepley@xxxxxxxxx>
>> Date: Wed, Aug 6, 2008 at 4:24 AM
>> Subject: Re: [DOLFIN-dev] Assembly benchmark
>> To: "Garth N. Wells" <gnw20@xxxxxxxxx>
>>
>>
>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>> >> ok, here's the page, let's see some numbers:
>> >>
>> >>   http://www.fenics.org/wiki/Benchmark
>> >>
>> >
>> > I just added my results.
>> >
>> > The most obvious difference in our systems is 32/64 bit which could
>> > likely account for the differences. MTL4 seems considerably faster on
>> > the 32 bit system.
>>
>> I need to understand the categories into which the time is divided:
>>
>>  1) They do not add up to the total (or even come close)
>
> There are 8 tables:
>
>  0 Assemble total
>  1 Init dof map
>  2 Build sparsity
>  3 Init tensor
>  4 Delete sparsity
>  5 Assemble cells
>  6 Overhead
>
>  7 Reassemble total
>
> The first is the total and includes 1-6, so tables 1-6 should
> add up to table 0. In fact, table 6 ("Overhead") is computed as the
> difference between table 0 and the sum of tables 1-5.
>
> Then table 7 reports the total for reassembling into a matrix which
> has already been initialized with the correct sparsity pattern (and
> used before).
>
> Maybe there's a better way to order/present the tables to make this
> clear?
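
For concreteness, a minimal standalone sketch of how the tables fit
together: tables 1-5 are the timed stages, table 0 is the overall total,
and table 6 is whatever is left over. The stage work here is a placeholder,
not the DOLFIN benchmark code.

#include <chrono>
#include <cstdio>
#include <thread>

using Clock = std::chrono::steady_clock;

// Placeholder for one timed stage of the benchmark.
static double run_stage()
{
  const auto t0 = Clock::now();
  std::this_thread::sleep_for(std::chrono::milliseconds(10));  // stand-in work
  return std::chrono::duration<double>(Clock::now() - t0).count();
}

int main()
{
  const char* stages[] = {"Init dof map", "Build sparsity", "Init tensor",
                          "Delete sparsity", "Assemble cells"};

  const auto t0 = Clock::now();
  double sum = 0.0;
  for (const char* s : stages)
  {
    const double t = run_stage();                // tables 1-5
    std::printf("%-15s : %.4f s\n", s, t);
    sum += t;
  }
  const double total =
      std::chrono::duration<double>(Clock::now() - t0).count();  // table 0

  std::printf("%-15s : %.4f s\n", "Assemble total", total);
  std::printf("%-15s : %.4f s\n", "Overhead", total - sum);      // table 6
  return 0;
}
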
>
>>  2) I am not sure what is going on within each unit
>
>  1 Init dof map
>
> This one does some initialization for computing the dof map. The only
> thing that may happen here (for FFC forms) is that we may generate
> the edges and faces if those are needed. You can see the difference
> for P1, P2 and P3.

I don't understand why this is different between the backends.
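
For concreteness, a toy sketch of the entity-generation step described
above; Mesh and DofMap are hypothetical stand-ins, not the DOLFIN classes.

#include <cstdio>

// Hypothetical stand-ins; the real mesh and dof map live in DOLFIN.
struct Mesh
{
  bool has_edges = false;
  bool has_faces = false;
  void init_edges() { if (!has_edges) { std::puts("  generating edges"); has_edges = true; } }
  void init_faces() { if (!has_faces) { std::puts("  generating faces"); has_faces = true; } }
};

struct DofMap
{
  int degree;  // polynomial degree of the Lagrange element
  void init(Mesh& mesh) const
  {
    // P1 needs only vertices; P2 also needs edges; P3 (on tets) also needs faces.
    if (degree >= 2)
      mesh.init_edges();
    if (degree >= 3)
      mesh.init_faces();
  }
};

int main()
{
  Mesh mesh;
  const int degrees[] = {1, 2, 3};
  for (int p : degrees)
  {
    std::printf("P%d:\n", p);
    DofMap{p}.init(mesh);  // entities are generated at most once
  }
  return 0;
}
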

>  2 Build sparsity
>
> This one computes the sparsity pattern by iterating over all cells,
> computing the local-to-global mapping on each cell and counting the
> number of nonzeros.

Same question.
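
The stage described above amounts to roughly the following toy sketch
(the cells and the dof numbering are made up for illustration):

#include <cstdio>
#include <set>
#include <vector>

int main()
{
  // Made-up toy problem: four "cells", each touching three global dofs.
  const std::vector<std::vector<int>> cell_dofs = {
      {0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}};
  const int n = 6;  // global dimension

  // For each row, collect the columns that the assembly will touch.
  std::vector<std::set<int>> pattern(n);
  for (const auto& dofs : cell_dofs)   // iterate over all cells
    for (int i : dofs)                 // rows from the local-to-global map
      for (int j : dofs)               // columns from the local-to-global map
        pattern[i].insert(j);

  // The number of nonzeros per row is what the tensor initialization needs.
  for (int i = 0; i < n; ++i)
    std::printf("row %d: %zu nonzeros\n", i, pattern[i].size());
  return 0;
}
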

>  3 Init tensor
>
> This one initializes the matrix from the sparsity pattern by looking
> at the number of nonzeros per row (calling MatSeqAIJSetPreallocation
> in PETSc).

Okay.
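
In isolation, that step might look like the following (requires PETSc and
uses its current calling sequence; error checking is stripped and the
per-row counts are made up):

#include <petscmat.h>

int main(int argc, char** argv)
{
  PetscInitialize(&argc, &argv, NULL, NULL);

  const PetscInt n = 6;
  // Nonzeros per row, as counted while building the sparsity pattern.
  const PetscInt nnz[6] = {3, 4, 5, 5, 4, 3};

  Mat A;
  MatCreate(PETSC_COMM_SELF, &A);
  MatSetSizes(A, n, n, n, n);
  MatSetType(A, MATSEQAIJ);
  MatSeqAIJSetPreallocation(A, 0, nnz);  // exact preallocation: no mallocs during insertion

  MatDestroy(&A);
  PetscFinalize();
  return 0;
}
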

>  4 Delete sparsity
>
> This one deletes the sparsity pattern. This shouldn't take any time
> but we found in some tests it actually does (due to some STL
> peculiarities).

This is nonzero for some PETSc runs, which makes no sense.
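
The STL effect mentioned above is easy to reproduce outside DOLFIN:
destroying a large node-based container (a vector of sets is a natural way
to hold a sparsity pattern) takes measurable time. A standalone
illustration, not DOLFIN code:

#include <chrono>
#include <cstdio>
#include <memory>
#include <set>
#include <vector>

int main()
{
  // Build a sparsity-pattern-like structure: ~200k rows, ~30 entries each.
  const int rows = 200000;
  auto pattern = std::make_unique<std::vector<std::set<int>>>(rows);
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < 30; ++j)
      (*pattern)[i].insert(i + j);

  // Time only the deletion.
  const auto t0 = std::chrono::steady_clock::now();
  pattern.reset();
  const double dt = std::chrono::duration<double>(
      std::chrono::steady_clock::now() - t0).count();
  std::printf("deleting the pattern took %.3f s\n", dt);
  return 0;
}
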

>  5 Assemble cells
>
> This one does the actual assembly loop over cells and inserts the
> element matrices (MatSetValues in PETSc).

It would be nice to time the element calculation separately from the insertion.
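
A rough sketch of that split for the PETSc backend: the dof map and the
element kernel below are placeholders; only the MatSetValues/MatAssembly
calls are the real API.

#include <petscmat.h>
#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

static double seconds(Clock::time_point a, Clock::time_point b)
{
  return std::chrono::duration<double>(b - a).count();
}

// Placeholder for the generated element kernel (tabulate_tensor in FFC terms).
static void tabulate_tensor(PetscScalar* Ae, PetscInt cell)
{
  for (int i = 0; i < 9; ++i)
    Ae[i] = 1.0 / (cell + i + 1);
}

int main(int argc, char** argv)
{
  PetscInitialize(&argc, &argv, NULL, NULL);

  const PetscInt n = 1000;
  Mat A;
  MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 9, NULL, &A);  // crude preallocation

  double t_compute = 0.0, t_insert = 0.0;
  PetscScalar Ae[9];
  for (PetscInt cell = 0; cell < n - 2; ++cell)
  {
    const PetscInt dofs[3] = {cell, cell + 1, cell + 2};  // fake local-to-global map

    const auto t0 = Clock::now();
    tabulate_tensor(Ae, cell);                            // element computation
    const auto t1 = Clock::now();
    MatSetValues(A, 3, dofs, 3, dofs, Ae, ADD_VALUES);    // insertion
    const auto t2 = Clock::now();

    t_compute += seconds(t0, t1);
    t_insert += seconds(t1, t2);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  std::printf("compute: %.4f s   insert: %.4f s\n", t_compute, t_insert);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}
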

>  6. Overhead
>
> Everything else not specifically accounted for.
>
>>  3) This is still much more expensive than my PETSc example (which can
>>      easily be run; it's ex2 in KSP).
>
> Do we use the same mesh? In 2D it's a 256x256 unit square and in 3D
> it's a 32x32x32 unit cube.

Okay, I will switch to this.

   Matt
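
For comparison, the sizes those meshes imply, assuming DOLFIN's usual two
triangles per square and six tetrahedra per cube; the vertex counts give
the P1 matrix dimension, which is roughly what one would match when
picking the grid size for ex2:

#include <cstdio>

int main()
{
  // 2D benchmark mesh: 256x256 unit square, assuming 2 triangles per square.
  const long nx = 256;
  const long cells_2d = 2 * nx * nx;
  const long vertices_2d = (nx + 1) * (nx + 1);

  // 3D benchmark mesh: 32x32x32 unit cube, assuming 6 tetrahedra per cube.
  const long m = 32;
  const long cells_3d = 6 * m * m * m;
  const long vertices_3d = (m + 1) * (m + 1) * (m + 1);

  std::printf("2D: %ld cells, %ld vertices (P1 matrix is %ld x %ld)\n",
              cells_2d, vertices_2d, vertices_2d, vertices_2d);
  std::printf("3D: %ld cells, %ld vertices (P1 matrix is %ld x %ld)\n",
              cells_3d, vertices_3d, vertices_3d, vertices_3d);
  return 0;
}
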

>> Thus it is hard for me to be convinced that something underneath is not
>> preventing fast operation. Furthermore, this is not checked against a
>> performance model, say by plotting against the number of cells.
>
> I agree that would be good to have as well, but there's also a point
> in keeping the benchmark small (so it's fast to run and compare).
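
One cheap option that keeps the benchmark small: run the same assembly at
a few mesh sizes and report time per cell, which should flatten out if the
cost really is linear in the number of cells. The kernel below is only a
stand-in for the assembler.

#include <chrono>
#include <cstdio>
#include <vector>

// Stand-in for the assembly: O(ncells) work per call.
static double kernel(int ncells)
{
  std::vector<double> x(ncells, 1.0);
  double s = 0.0;
  for (int c = 0; c < ncells; ++c)
    s += x[c] * (c % 7);
  return s;
}

int main()
{
  const int sizes[] = {1 << 14, 1 << 16, 1 << 18, 1 << 20};
  for (int ncells : sizes)
  {
    const auto t0 = std::chrono::steady_clock::now();
    volatile double r = kernel(ncells);  // volatile: keep the work from being optimized away
    (void)r;
    const double dt = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
    std::printf("%8d cells: %.3e s total, %.3e s/cell\n", ncells, dt, dt / ncells);
  }
  return 0;
}
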
>
> --
> Anders
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

