Anders Logg wrote:
> On Wed, Aug 06, 2008 at 01:44:24PM +0100, Garth N. Wells wrote:
> > Anders Logg wrote:
> > > MTL4 isn't using a sparsity pattern. A guess is just being made as
> > > to the number of non-zeroes per row.
> > >
> > > On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
> > > > On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <logg@xxxxxxxxx> wrote:
> > > > > On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
> > > > > > ---------- Forwarded message ----------
> > > > > > From: Matthew Knepley <knepley@xxxxxxxxx>
> > > > > > Date: Wed, Aug 6, 2008 at 4:24 AM
> > > > > > Subject: Re: [DOLFIN-dev] Assembly benchmark
> > > > > > To: "Garth N. Wells" <gnw20@xxxxxxxxx>
> > > > > >
> > > > > > On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
> > > > > > > ok, here's the page, let's see some numbers:
> > > > > > >
> > > > > > > http://www.fenics.org/wiki/Benchmark
> > > > > >
> > > > > > I just added my results. The most obvious difference in our
> > > > > > systems is 32/64 bit, which could likely account for the
> > > > > > differences. MTL4 seems considerably faster on the 32-bit
> > > > > > system.
> > > > > >
> > > > > > I need to understand the categories into which the time is
> > > > > > divided:
> > > > > >
> > > > > > 1) They do not add up to the total (or even close).
> > > > >
> > > > > There are 8 tables:
> > > > >
> > > > >   0 Assemble total
> > > > >   1 Init dof map
> > > > >   2 Build sparsity
> > > > >   3 Init tensor
> > > > >   4 Delete sparsity
> > > > >   5 Assemble cells
> > > > >   6 Overhead
> > > > >   7 Reassemble total
> > > > >
> > > > > The first is the total and includes 1-6, so tables 1-6 should
> > > > > add up to table 0. In fact, table 6 ("Overhead") is computed as
> > > > > the difference between table 0 and tables 1-5. Table 7 then
> > > > > reports the total for reassembling into a matrix which has
> > > > > already been initialized with the correct sparsity pattern (and
> > > > > used before). Maybe there's a better way to order/present the
> > > > > tables to make this clear?
> > > > >
> > > > > > 2) I am not sure what is going on within each unit.
> > > > >
> > > > > 1 Init dof map
> > > > >
> > > > > This one does some initialization for computing the dof map.
> > > > > The only thing that may happen here (for FFC forms) is that we
> > > > > may generate the edges and faces if those are needed. You can
> > > > > see the difference for P1, P2 and P3.
> > > >
> > > > Don't understand why this is different for any of the backends.
> > >
> > > It's the same, or should be. The benchmark just runs each test case
> > > once, so there may be small "random" fluctuations in the numbers.
> > > The numbers in Table 1 are essentially the same for all backends.
> > >
> > > > > 2 Build sparsity
> > > > >
> > > > > This one computes the sparsity pattern by iterating over all
> > > > > cells, computing the local-to-global mapping on each cell and
> > > > > counting the number of nonzeros.
> > > >
> > > > Same question.
> > >
> > > This should be the same for all backends except for Epetra. The
> > > DOLFIN LA interface allows for overloading the handling of the
> > > sparsity pattern. For Epetra, we use an Epetra_FECrsGraph to hold
> > > the sparsity pattern. It seems to perform worse than the DOLFIN
> > > built-in sparsity pattern (used for all other backends), which is
> > > just a simple
> > >
> > >   std::vector< std::set<uint> >
> >
> > It's now a
> >
> >   std::vector< std::vector<uint> >
> >
> > which is faster than using std::set. Only uBLAS needs the terms to
> > be ordered (std::set is ordered), so I added SparsityPattern::sort()
> > to do this.
> >
> > Garth
>
> I didn't know. I'm surprised that doing a linear search is faster than
> using std::set. I thought std::set was optimized for this.
std::set was dead slow, which I attributed to it being ordered and therefore requiring shuffling after insertions. I tried std::tr1::unordered_set, but it wasn't much better.
Garth