dolfin team mailing list archive

Re: profiling an assembly

 

On Sun 2008-05-18 21:54, Johan Hoffman wrote:
> > On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote:
> >
> > 1. Solve time may dominate assemble anyway so that's where we should
> > optimize.
> 
> Yes, there may be such cases, in particular for simple forms (Laplace
> equation etc.). For more complex forms with more terms and coefficients,
> assembly typically dominates, from what I have seen. This is the case for
> the flow problems of Murtazo for example.

This probably depends on whether you are using a projection method.  If you
are solving the saddle point problem, you can forget about assembly time.  But
optimizing the solve is all about constructing a good preconditioner.  If the
operator is elliptic then AMG should work well and you don't have to think, but
if it is indefinite all bets are off.  I think we can build saddle point
preconditioners now by writing some funny-looking mixed form files, but that
could be made easier.

> > 2. Assembling the action instead of the operator removes the A.add()
> > bottleneck.
> 
> True. But it may be worthwhile to put some effort into optimizing also the
> matrix assembly.

In any case, you have to form something to precondition with.
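To make point 2 concrete, here is a minimal sketch of assembling the action
y = A*x cell by cell without ever forming A.  It assumes a 1D Laplace problem
with linear elements on a uniform mesh; the names element_matrix,
apply_action and assemble_dense are made up for illustration and are not
DOLFIN functions.  The dense matrix is built only to check the result.

```python
def element_matrix(h):
    """Element stiffness matrix for 1D Laplace, linear elements, cell size h."""
    k = 1.0 / h
    return [[k, -k], [-k, k]]


def apply_action(x, h):
    """Compute y = A*x by looping over cells; the global matrix is never stored."""
    n_cells = len(x) - 1
    y = [0.0] * len(x)
    Ke = element_matrix(h)
    for c in range(n_cells):
        dofs = (c, c + 1)              # global dofs of cell c
        xe = [x[d] for d in dofs]      # restrict x to the cell
        for i, di in enumerate(dofs):
            # Add the local product Ke*xe into the global vector.
            y[di] += sum(Ke[i][j] * xe[j] for j in range(2))
    return y


def assemble_dense(n_nodes, h):
    """Reference only: assemble the full matrix entry by entry."""
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    Ke = element_matrix(h)
    for c in range(n_nodes - 1):
        dofs = (c, c + 1)
        for i, di in enumerate(dofs):
            for j, dj in enumerate(dofs):
                A[di][dj] += Ke[i][j]  # the per-entry insertion that the action avoids
    return A


x = [0.0, 1.0, 4.0, 9.0, 16.0]
y = apply_action(x, h=0.5)
A = assemble_dense(len(x), h=0.5)
y_ref = [sum(a * b for a, b in zip(row, x)) for row in A]
```

The action needs no sparse insertion at all, but a Krylov solver still needs
a preconditioner, which is why some matrix usually has to be assembled anyway.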

> > As mentioned before, we are experimenting with iterating locally over
> > cells sharing common dofs and combining batches of element tensors
> > before inserting into the global sparse matrix row by row. Let's see
> > how it goes.
> 
> Yes, this is interesting. Would be very interesting to hear about the
> progress.
> 
> It is also interesting to understand what would optimize the insertion for
> different linear algebra backends; Jed in particular seems to have a good
> knowledge of PETSc. We could then build backend optimization into the
> local dof-orderings etc.

I just press M-. when I'm curious :-)

I can't imagine it pays to optimize for a particular backend (it's not PETSc
anyway, rather whichever format is used by the preconditioner).  The CSR data
structure is pretty common, but it will always be fastest to insert an entire
row at once.  If using an intermediate hashed structure makes this convenient,
then it would help.  The paper I posted assembles the entire matrix in hashed
format and then converts it to CSR.  I'll guess that a hashed cache for the
assembly (flushed every few MiB, for instance) would work at least as well as
assembling the entire thing in hashed format.
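A rough sketch of the hashed cache idea, in case it helps the discussion.
Element tensors are accumulated in a dict of dicts keyed by row and column,
and the cache is flushed by handing complete rows to the matrix.  Everything
here is an assumption for illustration: RowMatrix is a toy stand-in for a
backend that accepts whole rows, HashedCache and add_row are invented names,
and the flush threshold counts cached entries rather than MiB.

```python
class RowMatrix:
    """Toy stand-in for a sparse backend that accepts one whole row at a time."""

    def __init__(self, n_rows):
        self.rows = [dict() for _ in range(n_rows)]
        self.row_inserts = 0           # number of add_row calls, for comparison

    def add_row(self, i, cols, vals):
        self.row_inserts += 1
        row = self.rows[i]
        for j, v in zip(cols, vals):
            row[j] = row.get(j, 0.0) + v

    def to_csr(self):
        """Convert to CSR arrays (indptr, indices, data) with sorted columns."""
        indptr, indices, data = [0], [], []
        for row in self.rows:
            for j in sorted(row):
                indices.append(j)
                data.append(row[j])
            indptr.append(len(indices))
        return indptr, indices, data


class HashedCache:
    """Accumulate element tensors in hashed form, flush complete rows."""

    def __init__(self, matrix, max_entries=1024):
        self.matrix = matrix
        self.max_entries = max_entries  # assumed threshold, in cached entries
        self.cache = {}                 # row -> {col -> value}
        self.n_entries = 0

    def add(self, dofs, Ke):
        for a, i in enumerate(dofs):
            row = self.cache.setdefault(i, {})
            for b, j in enumerate(dofs):
                if j not in row:
                    self.n_entries += 1
                    row[j] = 0.0
                row[j] += Ke[a][b]      # combine contributions from cells sharing dofs
        if self.n_entries >= self.max_entries:
            self.flush()

    def flush(self):
        for i, row in self.cache.items():
            cols = sorted(row)
            self.matrix.add_row(i, cols, [row[j] for j in cols])
        self.cache.clear()
        self.n_entries = 0


def assemble_laplace_1d(n_cells, h, max_entries=1024):
    """Assemble the 1D Laplace stiffness matrix through the hashed cache."""
    A = RowMatrix(n_cells + 1)
    cache = HashedCache(A, max_entries)
    k = 1.0 / h
    Ke = [[k, -k], [-k, k]]
    for c in range(n_cells):
        cache.add((c, c + 1), Ke)
    cache.flush()                       # push whatever is still cached
    return A


A = assemble_laplace_1d(4, 0.5)
indptr, indices, data = A.to_csr()
```

With a large threshold each row reaches the matrix exactly once; with a small
one, rows shared between flushes are inserted more than once, which is the
trade-off between cache size and the number of row insertions.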

Jed
