
dolfin team mailing list archive

Re: Fwd: Assembly benchmark

On Wed, Aug 06, 2008 at 01:44:24PM +0100, Garth N. Wells wrote:
> 
> 
> Anders Logg wrote:
> > On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
> >> On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <logg@xxxxxxxxx> wrote:
> >>> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
> >>>> ---------- Forwarded message ----------
> >>>> From: Matthew Knepley <knepley@xxxxxxxxx>
> >>>> Date: Wed, Aug 6, 2008 at 4:24 AM
> >>>> Subject: Re: [DOLFIN-dev] Assembly benchmark
> >>>> To: "Garth N. Wells" <gnw20@xxxxxxxxx>
> >>>>
> >>>>
> >>>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
> >>>>>> ok, here's the page, let's see some numbers:
> >>>>>>
> >>>>>>   http://www.fenics.org/wiki/Benchmark
> >>>>>>
> >>>>> I just added my results.
> >>>>>
> >>>>> The most obvious difference between our systems is 32-bit vs 64-bit,
> >>>>> which could well account for the differences. MTL4 seems considerably
> >>>>> faster on the 32-bit system.
> >>>> I need to understand the categories into which the time is divided:
> >>>>
> >>>>  1) They do not add to the total (or even close)
> >>> There are 8 tables:
> >>>
> >>>  0 Assemble total
> >>>  1 Init dof map
> >>>  2 Build sparsity
> >>>  3 Init tensor
> >>>  4 Delete sparsity
> >>>  5 Assemble cells
> >>>  6 Overhead
> >>>
> >>>  7 Reassemble total
> >>>
> >>> The first is the total and covers steps 1-6, so tables 1-6 should
> >>> add up to table 0. In fact, table 6 ("Overhead") is computed as the
> >>> difference between table 0 and the sum of tables 1-5.
> >>>
> >>> Then table 7 reports the total for reassembling into a matrix which
> >>> has already been initialized with the correct sparsity pattern (and
> >>> used before).
> >>>
> >>> Maybe there's a better way to order/present the tables to make this
> >>> clear?
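
For concreteness, a sketch of that bookkeeping (the function and
variable names here are illustrative, not taken from the benchmark
code):

    // Table 6 ("Overhead") is derived rather than measured: it is
    // whatever part of table 0 is not accounted for by tables 1-5.
    double overhead(double total, double init_dof_map,
                    double build_sparsity, double init_tensor,
                    double delete_sparsity, double assemble_cells)
    {
      return total - (init_dof_map + build_sparsity + init_tensor
                      + delete_sparsity + assemble_cells);
    }
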
> >>>
> >>>>  2) I am not sure what is going on within each unit
> >>>  1 Init dof map
> >>>
> >>> This one does some initialization for computing the dof map. The only
> >>> thing that may happen here (for FFC forms) is that we may generate
> >>> the edges and faces if those are needed. You can see the difference
> >>> for P1, P2 and P3.
> >> Don't understand why this is different for any of the backends.
> > 
> > It's the same, or should be. The benchmark runs each test case just
> > once, so there may be small "random" fluctuations in the numbers.
> > 
> > The numbers in Table 1 are essentially the same for all backends.
> > 
> >>>  2 Build sparsity
> >>>
> >>> This one computes the sparsity pattern by iterating over all cells,
> >>> computing the local-to-global mapping on each cell and counting the
> >>> number of nonzeros.
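
In code, that step looks roughly like the following (an illustrative
sketch with made-up names, not DOLFIN's actual implementation):

    #include <set>
    #include <vector>

    // For each cell, every pair of global dof indices on that cell
    // marks a potential nonzero. dofs[c] holds the local-to-global
    // map of cell c, as computed by the dof map.
    std::vector<std::set<unsigned int> >
    build_sparsity(const std::vector<std::vector<unsigned int> >& dofs,
                   unsigned int num_rows)
    {
      std::vector<std::set<unsigned int> > pattern(num_rows);
      for (unsigned int c = 0; c < dofs.size(); ++c)
        for (unsigned int i = 0; i < dofs[c].size(); ++i)
          for (unsigned int j = 0; j < dofs[c].size(); ++j)
            pattern[dofs[c][i]].insert(dofs[c][j]);
      return pattern;
    }

The number of nonzeros per row is then just the size of each set.
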
> >> Same question.
> > 
> > This should be the same for all backends except Epetra. The DOLFIN
> > LA interface allows the handling of the sparsity pattern to be
> > overloaded. For Epetra, we use an Epetra_FECrsGraph to hold the
> > sparsity pattern. It seems to perform worse than the DOLFIN built-in
> > sparsity pattern (used for all other backends), which is just a simple
> > 
> 
> MTL4 isn't using a sparsity pattern. A guess is simply made as to the
> number of nonzeros per row.
> 
> >   std::vector< std::set<uint> >
> > 
> 
> It's now a
> 
>      std::vector< std::vector<uint> >
> 
> which is faster than using std::set. Only uBLAS needs the terms to be 
> ordered (std::set is ordered), so I added SparsityPattern::sort() to do 
> this.
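
Something like this, presumably (a sketch of the idea, not the actual
SparsityPattern code):

    #include <algorithm>
    #include <vector>

    typedef unsigned int uint;

    // Insert with a linear duplicate scan; rows stay unsorted.
    void insert_entry(std::vector<std::vector<uint> >& pattern,
                      uint row, uint col)
    {
      std::vector<uint>& entries = pattern[row];
      if (std::find(entries.begin(), entries.end(), col) == entries.end())
        entries.push_back(col);
    }

    // Sort each row once at the end, only for backends (like uBLAS)
    // that require ordered column indices.
    void sort_rows(std::vector<std::vector<uint> >& pattern)
    {
      for (unsigned int i = 0; i < pattern.size(); ++i)
        std::sort(pattern[i].begin(), pattern[i].end());
    }
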
> 
> Garth

I didn't know. I'm surprised that doing a linear search is faster than
using std::set; I thought std::set was optimized for exactly this.
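
For what it's worth, the effect is easy to reproduce with a toy timing
loop (a sketch, not the DOLFIN benchmark): rows in a finite element
sparsity pattern are short, so a linear scan over a contiguous
std::vector tends to beat std::set, which pays for a heap allocation
and pointer chasing on every insert.

    #include <algorithm>
    #include <cstdio>
    #include <ctime>
    #include <set>
    #include <vector>

    int main()
    {
      const unsigned int rows = 100000;
      const unsigned int entries = 27;  // a typical row length
      const unsigned int repeats = 3;   // duplicates, as in assembly

      // std::set: O(log n) insert, but one node allocation per entry
      std::clock_t t0 = std::clock();
      {
        std::vector<std::set<unsigned int> > pattern(rows);
        for (unsigned int r = 0; r < rows; ++r)
          for (unsigned int k = 0; k < repeats; ++k)
            for (unsigned int c = 0; c < entries; ++c)
              pattern[r].insert(c);
      }
      std::clock_t t1 = std::clock();

      // std::vector: O(n) duplicate scan, but contiguous in memory
      {
        std::vector<std::vector<unsigned int> > pattern(rows);
        for (unsigned int r = 0; r < rows; ++r)
          for (unsigned int k = 0; k < repeats; ++k)
            for (unsigned int c = 0; c < entries; ++c)
            {
              std::vector<unsigned int>& row = pattern[r];
              if (std::find(row.begin(), row.end(), c) == row.end())
                row.push_back(c);
            }
      }
      std::clock_t t2 = std::clock();

      std::printf("set:    %.2f s\n", double(t1 - t0) / CLOCKS_PER_SEC);
      std::printf("vector: %.2f s\n", double(t2 - t1) / CLOCKS_PER_SEC);
      return 0;
    }
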

-- 
Anders

