Re: profiling an assembly

On Sun, May 18, 2008 at 10:55:10PM +0200, Johan Hoffman wrote:
> > On Sun 2008-05-18 21:54, Johan Hoffman wrote:
> >> > On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote:
> >> >
> >> > 1. Solve time may dominate assembly anyway, so that's where we should
> >> > optimize.
> >>
> >> Yes, there may be such cases, in particular for simple forms (Laplace
> >> equation etc.). For more complex forms with more terms and coefficients,
> >> assembly typically dominates, from what I have seen. This is the case
> >> for the flow problems of Murtazo, for example.
> >
> > This probably depends on whether you are using a projection method.  If
> > you are solving the saddle point problem, you can forget about assembly
> > time.
> 
> Well, this is not what we see. I agree that this is what you would like,
> but this is not the case now. That is why we are now focusing on the
> assembly bottleneck.
> 
> > But optimizing the solve is all about constructing a good preconditioner.
> > If the operator is elliptic then AMG should work well and you don't have
> > to think, but if it is indefinite all bets are off.  I think we can build
> > saddle point preconditioners now by writing some funny-looking mixed form
> > files, but that could be made easier.
> 
> We use a splitting approach with GMRES for the momentum equation and AMG
> for the continuity equations. This appears to work fairly well. As I said,
> the assembly of the momentum equation is dominating.
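
(For illustration only: in rough Python terms, with scipy and pyamg and
placeholder systems A_mom, b_mom, A_cont, b_cont, that kind of splitting step
would look something like the sketch below; this is not the actual solver
code discussed here.)

    import scipy.sparse.linalg as spla
    import pyamg

    def splitting_step(A_mom, b_mom, A_cont, b_cont):
        # placeholder sparse systems; purely an illustration of the idea
        u, info = spla.gmres(A_mom, b_mom)              # momentum: GMRES
        ml = pyamg.smoothed_aggregation_solver(A_cont)  # continuity: AMG hierarchy
        p = ml.solve(b_cont, tol=1e-8)
        return u, p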
> 
> >
> >> > 2. Assembling the action instead of the operator removes the A.add()
> >> > bottleneck.
> >>
> >> True. But it may be worthwhile to put some effort into also optimizing
> >> the matrix assembly.
> >
> > In any case, you have to form something to precondition with.
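
For illustration, assembling the action rather than the operator amounts to
roughly the following Python sketch (cells, dofmap and element_action are
hypothetical helpers, not the real DOLFIN interface); the point is that only
vector scatter-adds remain, with no A.add() into a sparse matrix:

    import numpy as np

    def assemble_action(cells, dofmap, element_action, u, n_dofs):
        # `cells`, `dofmap` and `element_action` are hypothetical stand-ins;
        # element_action(cell, u_local) returns A_K * u_local for one cell.
        y = np.zeros(n_dofs)
        for cell in cells:
            dofs = dofmap[cell]                       # global dofs of this cell
            y[dofs] += element_action(cell, u[dofs])  # local matvec, scatter-add
        return y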
> >
> >> > As mentioned before, we are experimenting with iterating locally over
> >> > cells sharing common dofs and combining batches of element tensors
> >> > before inserting into the global sparse matrix row by row. Let's see
> >> > how it goes.
> >>
> >> Yes, this is interesting. Would be very interesting to hear about the
> >> progress.
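
Roughly, the batching described above might look something like this Python
sketch; element_tensor, dofmap and A.add_row are hypothetical stand-ins for
the local tensor computation and a row-wise backend insertion call, not the
actual implementation:

    import numpy as np
    from collections import defaultdict

    def assemble_batch(cells, dofmap, element_tensor, A):
        # `cells` is one small batch of neighbouring cells sharing dofs.
        rows = defaultdict(dict)            # global row -> {global col: value}
        for cell in cells:
            dofs = dofmap[cell]
            A_K = element_tensor(cell)      # element tensor for this cell
            for a, i in enumerate(dofs):
                row = rows[i]
                for b, j in enumerate(dofs):
                    row[j] = row.get(j, 0.0) + A_K[a, b]
        for i, row in rows.items():         # one insertion call per global row
            cols = np.fromiter(row.keys(), dtype=np.intc)
            vals = np.fromiter(row.values(), dtype=float)
            A.add_row(i, cols, vals)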
> >>
> >> It is also interesting to understand what would optimize the insertion
> >> for different linear algebra backends; in particular, Jed seems to have
> >> good knowledge of PETSc. We could then build backend optimization into
> >> the local dof-orderings etc.
> >
> > I just press M-. when I'm curious :-)
> >
> > I can't imagine it pays to optimize for a particular backend (it's not
> > PETSc anyway, rather whichever format is used by the preconditioner).
> > The CSR data structure is pretty common, but it will always be fastest
> > to insert an entire row at once.  If using an intermediate hashed
> > structure makes this convenient, then it would help.  The paper I posted
> > assembles the entire matrix in hashed format and then converts it to
> > CSR.  I'll guess that a hashed cache for the assembly (flushed every few
> > MiB, for instance) would work at least as well as assembling the entire
> > thing in hashed format.
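
For instance, such a flushed hashed cache might look roughly like the Python
sketch below; A.add_row stands in for whatever row-wise insertion the backend
actually provides:

    import numpy as np

    class HashedAssemblyCache:
        # Sketch only: accumulate (i, j) -> value in a hash and flush
        # row-wise once the cache grows past a size limit.
        def __init__(self, A, max_entries=1 << 20):
            self.A = A
            self.max_entries = max_entries
            self.cache = {}                  # (i, j) -> accumulated value

        def add(self, dofs, A_K):
            for a, i in enumerate(dofs):
                for b, j in enumerate(dofs):
                    self.cache[(i, j)] = self.cache.get((i, j), 0.0) + A_K[a, b]
            if len(self.cache) > self.max_entries:
                self.flush()

        def flush(self):
            by_row = {}                      # regroup cached entries by row
            for (i, j), v in self.cache.items():
                by_row.setdefault(i, {})[j] = v
            for i, row in by_row.items():    # one insertion call per row
                cols = np.fromiter(row.keys(), dtype=np.intc)
                vals = np.fromiter(row.values(), dtype=float)
                self.A.add_row(i, cols, vals)
            self.cache.clear()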
> 
> Yes, it seems that some form of hashed structure is a good way to
> optimize. What Murtazo is referring to would be similar to hashing the
> whole matrix as in the paper you posted,

The way I interpret it, they are very different. The hash would store a
mapping from (i, j) to values, while Murtazo suggests storing a mapping
from (element, i, j) to values.
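
In rough Python terms, the distinction would be something like:

    matrix_hash = {}     # (i, j) -> accumulated value
    element_hash = {}    # (cell, i, j) -> one element contribution

    def add_matrix_entry(i, j, v):
        # contributions from neighbouring cells are summed immediately
        matrix_hash[(i, j)] = matrix_hash.get((i, j), 0.0) + v

    def add_element_entry(cell, i, j, v):
        # contributions are kept per element; summing over cells is deferred
        element_hash[(cell, i, j)] = v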

-- 
Anders


> and at Simula, work on the row-hashed structures appears to be under way.
> 
> It will be interesting to see the results.
> 
> /Johan
> 
> >
> > Jed

