dolfin team mailing list archive

Re: Assembly timings

 


Improving the assembly is interesting, but it would probably be more
worthwhile to get the results for Stokes. (It would also put FFC and
DOLFIN to the test and perhaps we can nail down some bugs in the
process.)

I agree that these are the results that are more likely of interest to the broader community, and we should push on them.

Another point is that to improve the speed of the assembly (which I
think is already pretty good), it would be easier to improve on the
interaction with the mesh (which we need to redo with Sieve anyway).


I claim that there is also something to be gained by batching computations together.
Currently, for each element you
- get the affine map
- get the dof
- build the matrix
- insert it.

If you instead loop over N elements and store their affine maps in a C array, then loop over the N elements and store their dofs in a C array, then build the N element matrices in one go (this is level 3 BLAS instead of level 2), and finally insert the N matrices, you're likely to improve performance.
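
The four batched phases above can be sketched as follows. This is only an illustration under stated assumptions, not DOLFIN or FFC code: it assumes P1 Poisson on affine triangles, writes each element matrix as a fixed reference tensor contracted with a per-element geometry tensor, and uses a plain Python triple loop (matmul) as a stand-in for a level 3 BLAS call such as dgemm. All function names are hypothetical.

```python
# Reference gradients of the three P1 basis functions on the reference triangle.
REF_GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]

# Reference tensor as a 9 x 4 matrix: row index (i, j), column index (a, b).
A0 = [[gi[a] * gj[b] for a in range(2) for b in range(2)]
      for gi in REF_GRADS for gj in REF_GRADS]


def geometry_tensor(tri):
    """Flattened 2x2 tensor G_K = |K| * J^{-1} J^{-T} for one affine triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    j00, j01, j10, j11 = x1 - x0, x2 - x0, y1 - y0, y2 - y0
    det = j00 * j11 - j01 * j10
    inv = [[j11 / det, -j01 / det], [-j10 / det, j00 / det]]
    area = abs(det) / 2.0
    return [area * sum(inv[a][c] * inv[b][c] for c in range(2))
            for a in range(2) for b in range(2)]


def matmul(A, B):
    """Plain triple loop; stands in for a level 3 BLAS matrix-matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def assemble_batched(points, cells):
    """Assemble the global matrix in four phases over the whole batch."""
    n = len(points)
    # Phase 1: geometry (affine map data) for all N cells.
    G = [geometry_tensor([points[v] for v in cell]) for cell in cells]
    # Phase 2: dofs for all N cells (for P1 these are the vertex numbers).
    dofs = [list(cell) for cell in cells]
    # Phase 3: all element matrices at once, (9 x 4) times (4 x N).
    G_cols = [[G[k][c] for k in range(len(cells))] for c in range(4)]
    E = matmul(A0, G_cols)  # column k is the flattened 3x3 matrix of cell k
    # Phase 4: insert the N element matrices.
    A = [[0.0] * n for _ in range(n)]
    for k, d in enumerate(dofs):
        for i in range(3):
            for j in range(3):
                A[d[i]][d[j]] += E[3 * i + j][k]
    return A


def assemble_elementwise(points, cells):
    """Reference version: geometry, dofs, build, insert, one cell at a time."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for cell in cells:
        g = geometry_tensor([points[v] for v in cell])
        # Matrix-vector product (level 2): one element matrix per call.
        e = [sum(A0[r][c] * g[c] for c in range(4)) for r in range(9)]
        for i in range(3):
            for j in range(3):
                A[cell[i]][cell[j]] += e[3 * i + j]
    return A
```

Both functions should produce the same global matrix; the only difference is that phase 3 of the batched version is a single matrix-matrix product over all N cells rather than N separate matrix-vector products. In pure Python this gives no speedup, of course; the point is only the shape of the computation that an optimized BLAS could exploit.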

Of course, I recall that the geometry and insertion into PETSc are the bottlenecks in the process? However, when you go to Navier-Stokes (or any trilinear operator), building the element matrix is much more expensive while the insertion time stays the same (relative to Stokes, not Poisson, since there are more dofs). You'll see the effects of level 3 BLAS much more in this regime. But this is not hard to try out and put into DOLFIN.
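
A rough way to see the build-versus-insert argument is a simple operation count. The sketch below is a back-of-the-envelope model, not a measurement: it assumes each element matrix entry is a dot product between a row of a reference tensor and a per-element geometry tensor, and that a trilinear form adds one extra index running over the local dofs of the coefficient. The function names and the sizes are hypothetical and chosen only for illustration.

```python
def contraction_flops(n_dofs, n_geometry_entries):
    """Multiply-adds to build one n_dofs x n_dofs element matrix, assuming
    one dot product of length n_geometry_entries per matrix entry."""
    return n_dofs * n_dofs * n_geometry_entries


def inserted_entries(n_dofs):
    """Number of values handed to the global matrix for one element."""
    return n_dofs * n_dofs


def build_to_insert_ratio(n_dofs, n_geometry_entries):
    """Build work per inserted value under the model above."""
    return contraction_flops(n_dofs, n_geometry_entries) / inserted_entries(n_dofs)


# Hypothetical sizes, for illustration only.
n_dofs = 12                            # local dofs of the trial/test space
bilinear_entries = 2 * 2               # e.g. a gradient-gradient term in 2D
trilinear_entries = n_dofs * 2 * 2     # assumed: extra index over coefficient dofs

bilinear_ratio = build_to_insert_ratio(n_dofs, bilinear_entries)
trilinear_ratio = build_to_insert_ratio(n_dofs, trilinear_entries)
```

Under these assumptions the number of inserted values is the same for both forms, while the build work per inserted value grows by a factor equal to the number of local coefficient dofs. That is the regime in which the cost of building element matrices, and hence any gain from level 3 BLAS, should become more visible.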


Rob Kirby

"Mathematical software should be mathematical."





