dolfin team mailing list archive
Message #01035
Re: Assembly timings
Improving the assembly is interesting, but it would probably be more
worthwhile to get the results for Stokes. (It would also put FFC and
DOLFIN to the test and perhaps we can nail down some bugs in the
process.)
I agree that these are the results that are more likely of interest
to the broader community, and we should push on them.
Another point is that to improve the speed of the assembly (which I
think is already pretty good), it would be easier to improve on the
interaction with the mesh (which we need to redo with Sieve anyway).
I claim that there is something also to be gained in batching
together computations.
Currently, for each element you
- get the affine map
- get the dof
- build the matrix
- insert it.
If you loop over N elements and store the affine maps in a C array,
then loop over N elements and store the dof in a C array, then build
the element matrices for all N elements at once (this is level 3 BLAS
instead of level 2), then insert the N matrices, you're likely to
improve performance.
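
To make the comparison concrete, here is a rough sketch of both loops for the P1 Laplacian on triangles. It is plain Python, not DOLFIN or FFC code, and every name in it (A0, geometry_tensor, matmul, insert, assemble_per_cell, assemble_batched) is hypothetical. It assumes the element matrix can be written as a fixed reference tensor contracted with a per-cell geometry tensor, so that one cell is a matrix-vector product and a batch of N cells is a single matrix-matrix product. A dict stands in for the PETSc matrix.

```python
# Sketch only: hypothetical names, not the DOLFIN/FFC API.

# Reference gradients of the three P1 basis functions on the unit triangle.
D = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]

# Reference tensor as a 9 x 4 matrix: row = (i, j), column = (a, b).
# The factor 0.5 is the area of the reference triangle.
A0 = [[0.5 * D[i][a] * D[j][b] for a in range(2) for b in range(2)]
      for i in range(3) for j in range(3)]


def geometry_tensor(p0, p1, p2):
    """Flattened 2 x 2 tensor |det J| * Jinv * Jinv^T from the affine map."""
    j00, j01 = p1[0] - p0[0], p2[0] - p0[0]
    j10, j11 = p1[1] - p0[1], p2[1] - p0[1]
    det = j00 * j11 - j01 * j10
    jinv = [[j11 / det, -j01 / det], [-j10 / det, j00 / det]]
    return [abs(det) * sum(jinv[a][c] * jinv[b][c] for c in range(2))
            for a in range(2) for b in range(2)]


def matmul(A, B):
    """Plain matrix-matrix product; a level 3 BLAS call would go here."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def insert(A, dofs, ak):
    """Add one flattened 3 x 3 element matrix into the global dict."""
    for i in range(3):
        for j in range(3):
            key = (dofs[i], dofs[j])
            A[key] = A.get(key, 0.0) + ak[3 * i + j]


def assemble_per_cell(points, cells):
    """Current scheme: map, dof, build, insert, one cell at a time."""
    A = {}
    for cell in cells:
        g = geometry_tensor(*(points[v] for v in cell))  # affine map
        dofs = tuple(cell)                               # dof (P1: vertices)
        # Build: one matrix-vector product (level 2) per cell.
        ak = [sum(r * x for r, x in zip(row, g)) for row in A0]
        insert(A, dofs, ak)
    return A


def assemble_batched(points, cells):
    """Staged scheme: each step runs over all N cells before the next."""
    # Stage 1: geometry tensors of all cells, stored as a 4 x N matrix.
    geo = [geometry_tensor(*(points[v] for v in cell)) for cell in cells]
    G = [list(col) for col in zip(*geo)]
    # Stage 2: dof of all cells.
    dofs = [tuple(cell) for cell in cells]
    # Stage 3: all element matrices in one (9 x 4) * (4 x N) product.
    AK = matmul(A0, G)
    # Stage 4: insert the N element matrices.
    A = {}
    for k, d in enumerate(dofs):
        insert(A, d, [row[k] for row in AK])
    return A
```

Both routines should produce the same global matrix; the only difference is the order of the work. Whether stage 3 pays off in practice depends on how much of the total time is spent building element matrices rather than on geometry and insertion.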
Of course, I recall that the geometry and insertion into PETSc are
the bottlenecks in the process? However, when you go to Navier-
Stokes (or any trilinear operator), building the element matrix is
much more expensive while insertion time stays the same (relative to
Stokes, not Poisson, since there are more dof). You'll see the
effects of level 3 BLAS much more in this regime. But this is not
hard to try out and put into DOLFIN.
Rob Kirby
"Mathematical software should be mathematical."