
dolfin team mailing list archive

Re: Fwd: Assembly benchmark

Anders Logg wrote:
On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <logg@xxxxxxxxx> wrote:
On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
---------- Forwarded message ----------
From: Matthew Knepley <knepley@xxxxxxxxx>
Date: Wed, Aug 6, 2008 at 4:24 AM
Subject: Re: [DOLFIN-dev] Assembly benchmark
To: "Garth N. Wells" <gnw20@xxxxxxxxx>


On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
ok, here's the page, let's see some numbers:

  http://www.fenics.org/wiki/Benchmark

I just added my results.

The most obvious difference between our systems is 32-bit vs. 64-bit,
which could account for the differences. MTL4 seems considerably
faster on the 32-bit system.
I need to understand the categories into which the time is divided:

 1) They do not add up to the total (or even close)
There are 8 tables:

 0 Assemble total
 1 Init dof map
 2 Build sparsity
 3 Init tensor
 4 Delete sparsity
 5 Assemble cells
 6 Overhead

 7 Reassemble total

The first is the total and includes 1-6, so tables 1-6 should
add up to table 0. In fact, table 6 ("Overhead") is computed as the
difference between table 0 and the sum of tables 1-5.
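A minimal sketch of that bookkeeping (illustrative names only; this is
not code from DOLFIN):

    // Table 6 ("Overhead") is whatever remains of table 0 once the
    // five measured phases (tables 1-5) are subtracted.
    double overhead(double total, double init_dofmap, double build_sparsity,
                    double init_tensor, double delete_sparsity,
                    double assemble_cells)
    {
      return total - (init_dofmap + build_sparsity + init_tensor
                      + delete_sparsity + assemble_cells);
    }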

Then table 7 reports the total for reassembling into a matrix which
has already been initialized with the correct sparsity pattern (and
used before).

Maybe there's a better way to order/present the tables to make this
clear?

 2) I am not sure what is going on within each unit
 1 Init dof map

This one does some initialization for computing the dof map. The only
thing that may happen here (for FFC forms) is that we may generate
the edges and faces if those are needed. You can see the difference
for P1, P2 and P3.
Don't understand why this is different for any of the backends.

It's the same, or should be. The benchmark just runs each test case
once so there may be small "random" fluctuations in the numbers.

The numbers of Table 1 are essentially the same for all backends.
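For context, a rough sketch of the on-demand entity generation
described above, assuming the Mesh::init() API of that era (the mesh
file name is hypothetical):

    #include <dolfin.h>
    using namespace dolfin;

    int main()
    {
      Mesh mesh("mesh.xml");  // hypothetical input file

      // P1 dof maps only need vertices. P2 also needs edges, and P3 in
      // 3D needs faces; DOLFIN generates these entities on demand:
      mesh.init(1);  // compute edges (topological dimension 1)
      mesh.init(2);  // compute faces (topological dimension 2)

      return 0;
    }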

 2 Build sparsity

This one computes the sparsity pattern by iterating over all cells,
computing the local-to-global mapping on each cell and counting the
number of nonzeros.
Same question.

This should be the same for all backends except for Epetra. The DOLFIN
LA interface allows for overloading the handling of the sparsity
pattern. For Epetra, we use an Epetra_FECrsGraph to hold the sparsity
pattern. It seems to perform worse than the DOLFIN built-in sparsity
pattern (used for all other backends), which is just a simple

  std::vector< std::set<uint> >

MTL4 isn't using a sparsity pattern. A guess is just being made as to
the number of non-zeros per row.

It's now a

    std::vector< std::vector<uint> >

which is faster than using std::set. Only uBLAS needs the terms to be
ordered (std::set is ordered), so I added SparsityPattern::sort() to
do this.

Garth
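For reference, a minimal sketch of building such a pattern by
iterating over cells, using the newer vector-of-vectors container plus
a sort/deduplicate pass (the cell-to-dof input is a hypothetical
stand-in for the dof map):

    #include <algorithm>
    #include <vector>

    typedef unsigned int uint;

    // Hypothetical input: for each cell, the global dofs on that cell.
    typedef std::vector<std::vector<uint> > CellDofs;

    std::vector<std::vector<uint> >
    build_pattern(const CellDofs& cell_dofs, uint num_dofs)
    {
      std::vector<std::vector<uint> > pattern(num_dofs);

      // Record every (i, j) dof pair that appears on some cell.
      for (uint c = 0; c < cell_dofs.size(); ++c)
      {
        const std::vector<uint>& dofs = cell_dofs[c];
        for (uint i = 0; i < dofs.size(); ++i)
          for (uint j = 0; j < dofs.size(); ++j)
            pattern[dofs[i]].push_back(dofs[j]);
      }

      // Sort and deduplicate each row afterwards; bulk push_back plus
      // one sort is typically cheaper than per-entry std::set inserts,
      // and the sorted order is what uBLAS needs.
      for (uint row = 0; row < pattern.size(); ++row)
      {
        std::sort(pattern[row].begin(), pattern[row].end());
        pattern[row].erase(
            std::unique(pattern[row].begin(), pattern[row].end()),
            pattern[row].end());
      }
      return pattern;
    }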

 3 Init tensor

This one initializes the matrix from the sparsity pattern by looking
at the number of nonzeros per row (calling MatSeqAIJSetPreallocation
in PETSc).
Okay.
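A minimal sketch of that preallocation using the standard PETSc C API,
assuming nnz holds the per-row counts taken from the sparsity pattern:

    #include <petscmat.h>

    // Create an n x n sequential AIJ matrix preallocated with exactly
    // nnz[i] nonzeros in row i.
    Mat create_matrix(PetscInt n, const PetscInt* nnz)
    {
      Mat A;
      MatCreate(PETSC_COMM_SELF, &A);
      MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
      MatSetType(A, MATSEQAIJ);
      MatSeqAIJSetPreallocation(A, 0, nnz);  // nnz overrides the scalar hint
      return A;
    }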

 4 Delete sparsity

This one deletes the sparsity pattern. This shouldn't take any time
but we found in some tests it actually does (due to some STL
peculiarities).
This is nonzero for some PETSc runs, which makes no sense.

The same data structure (the STL vector of sets) is used for all
backends (including PETSc but not Epetra) so this will show up for
PETSc.
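One plausible mechanism (an assumption on our part; the thread only
says "STL peculiarities"): destroying a std::set frees one heap node
per entry, so deletion scales with the number of nonzeros.

    #include <set>
    #include <vector>

    typedef unsigned int uint;

    // Destroying the pattern deallocates one tree node per nonzero,
    // which is why "Delete sparsity" can take measurable time. The
    // swap idiom forces all of that memory to be released here.
    void delete_pattern(std::vector<std::set<uint> >& pattern)
    {
      std::vector<std::set<uint> >().swap(pattern);
    }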

 5 Assemble cells

This one does the actual assembly loop over cells and inserts
(MatSetValues in PETSc).
It would be nice to time calculation vs. insertion separately.

I'll see if I can add that timing. I'm a little worried it will hurt
performance: all other timings are global, and this one would have to
be done inside the loop.
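
For reference, a sketch of such an assembly loop in PETSc terms; the
element-tensor routine and the cell-to-dof arrays are hypothetical
stand-ins:

    #include <petscmat.h>
    #include <cstddef>
    #include <vector>

    // Hypothetical: tabulate the element matrix Ae for one cell.
    void tabulate_element_matrix(std::size_t cell,
                                 std::vector<PetscScalar>& Ae);

    void assemble_cells(Mat A,
                        const std::vector<std::vector<PetscInt> >& cell_dofs)
    {
      std::vector<PetscScalar> Ae;
      for (std::size_t c = 0; c < cell_dofs.size(); ++c)
      {
        // 1. Compute the element matrix ("calculation").
        tabulate_element_matrix(c, Ae);

        // 2. Add it into the global matrix ("insertion").
        const std::vector<PetscInt>& dofs = cell_dofs[c];
        const PetscInt n = (PetscInt) dofs.size();
        MatSetValues(A, n, &dofs[0], n, &dofs[0], &Ae[0], ADD_VALUES);
      }
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
    }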

 6 Overhead

Everything else not specifically accounted for.

 3) This is still much more expensive than my PETSc example (which can
     be easily run; it's ex2 in KSP).
Do we use the same mesh? In 2D it's a 256x256 unit square and in 3D
it's a 32x32x32 unit cube.
Okay, I will switch to this.

Nice.
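
For reference, those meshes in the DOLFIN C++ API of that time would
look roughly like this (assuming the UnitSquare/UnitCube classes and
camel-case accessors of that era):

    #include <dolfin.h>
    #include <iostream>
    using namespace dolfin;

    int main()
    {
      UnitSquare square(256, 256);  // 2D benchmark mesh
      UnitCube cube(32, 32, 32);    // 3D benchmark mesh
      std::cout << "2D cells: " << square.numCells() << std::endl;
      std::cout << "3D cells: " << cube.numCells() << std::endl;
      return 0;
    }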



