
dolfin team mailing list archive

Re: Assembly benchmark


Matthew Knepley wrote:
On Mon, Jul 21, 2008 at 3:50 PM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:

Anders Logg wrote:
On Mon, Jul 21, 2008 at 01:48:23PM +0100, Garth N. Wells wrote:
Anders Logg wrote:
I have updated the assembly benchmark to also include MTL4, see


Here are the current results:

Assembly benchmark  |  Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
uBLAS               |        9.0789    0.45645     3.8042     8.0736      14.937         9.2507        3.8455
PETSc               |        7.7758    0.42798     3.5483     7.3898      13.945         8.1632         3.258
Epetra              |        8.9516    0.45448     3.7976     8.0679      15.404         9.2341        3.8332
MTL4                |        8.9729    0.45554     3.7966     8.0759       14.94         9.2568        3.8658
Assembly            |         7.474    0.43673     3.7341     8.3793      14.633         7.6695        3.3878

I specified a maximum of 30 nonzeroes per row in MTL4Matrix, and the
results change quite a bit:

Assembly benchmark  |  Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
uBLAS               |        7.1881    0.32748     2.7633     5.8311      10.968         7.0735        2.8184
PETSc               |        5.7868    0.30673     2.5489     5.2344      9.8896          6.069        2.3661
MTL4                |        2.8641    0.18339     1.6628     2.6811      2.8519         3.4843       0.85029
Assembly            |        5.5564    0.30896     2.6858     5.9675      10.622         5.7144        2.4519

MTL4 is a lot faster in all cases.

Now I don't believe the numbers. If you preallocate, we do not do any
extra processing outside of sorting the column indices (which every
format must do for efficient operations). Thus, how would you save any
time? If these are all in seconds, I will run a 2D Poisson here and tell
you what I get. It would help to specify sizes with this benchmark :)

Take a look at bench/fem/assembly/ for the details.




How was the MTL4 matrix initialised? I don't know if it does anything
with the sparsity pattern yet. I've been initialising MTL4 matrices by
hand so far with a guess as to the maximum number of nonzeroes per row.
Without setting this, the performance is nearly identical to uBLAS. When
it is set, I observe at least a factor of two speed-up.

The same way as all the other backends: from a precomputed sparsity
pattern. It looks like this is currently ignored in the MTL4Matrix
implementation:

void MTL4Matrix::init(const GenericSparsityPattern& sparsity_pattern)
{
  init(sparsity_pattern.size(0), sparsity_pattern.size(1)); // only the dimensions are used
}


DOLFIN-dev mailing list


