
dolfin team mailing list archive

Re: profiling an assembly

 

> On Sun 2008-05-18 21:54, Johan Hoffman wrote:
>> > On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote:
>> >
>> > 1. Solve time may dominate assemble anyway so that's where we should
>> > optimize.
>>
>> Yes, there may be such cases, in particular for simple forms (the Laplace
>> equation etc.). For more complex forms with more terms and coefficients,
>> assembly typically dominates, from what I have seen. This is the case
>> for the flow problems of Murtazo, for example.
>
> This probably depends on whether you are using a projection method.  If
> you are solving the saddle point problem, you can forget about assembly
> time.

Well, this is not what we see. I agree that this is what you would like,
but this is not the case now. That is why we are now focusing on the
assembly bottleneck.
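For concreteness, here is a toy stand-in (plain scipy, not DOLFIN or the actual flow code) for the kind of assemble-vs-solve comparison being discussed: per-entry insertion of P1 element tensors for a 1D Laplacian, the analogue of calling A.add() per element, timed against the sparse solve.

```python
import time
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n_elems = 20000
h = 1.0 / n_elems
Ae = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])  # P1 local stiffness

# Assembly: per-entry insertion, the analogue of A.add() per element tensor
t0 = time.perf_counter()
A = sp.lil_matrix((n_elems + 1, n_elems + 1))
for e in range(n_elems):
    for i in range(2):
        for j in range(2):
            A[e + i, e + j] += Ae[i, j]
A = A.tocsr()
t_assemble = time.perf_counter() - t0

# Solve -u'' = 1 with homogeneous Dirichlet BCs on the interior dofs
A_int = A[1:-1, 1:-1].tocsc()
b = np.full(n_elems - 1, h)
t0 = time.perf_counter()
u = spla.spsolve(A_int, b)
t_solve = time.perf_counter() - t0
print("assemble: %.3fs  solve: %.3fs" % (t_assemble, t_solve))
```

Which term wins of course depends on the form and the solver; the point is only that the assembly loop is where the insertion cost shows up.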

> But optimizing the solve is all about constructing a good preconditioner.
> If the operator is elliptic then AMG should work well and you don't have
> to think, but if it is indefinite all bets are off.  I think we can build
> saddle point preconditioners now by writing some funny-looking mixed form
> files, but that could be made easier.

We use a splitting approach with GMRES for the momentum equation and AMG
for the continuity equations. This appears to work fairly well. As I said,
the assembly of the momentum equation is dominating.
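Roughly, the splitting looks like the following sketch (scipy stand-ins, not our actual solver setup; scipy has no AMG, so a Jacobi-preconditioned CG is used here purely as a placeholder for the AMG continuity solve):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50
ones = np.ones(n - 1)

# Nonsymmetric stand-in for the momentum operator (convection-diffusion-like)
A_mom = sp.diags([-1.05 * ones, 2.0 * np.ones(n), -0.95 * ones],
                 [-1, 0, 1], format="csr")
b_mom = np.ones(n)
u, info = spla.gmres(A_mom, b_mom)      # GMRES on the momentum equation
assert info == 0

# Symmetric Laplacian stand-in for the continuity (pressure) equation;
# the Jacobi preconditioner below is only a placeholder for AMG
A_p = sp.diags([-ones, 2.0 * np.ones(n), -ones], [-1, 0, 1], format="csr")
M = spla.LinearOperator((n, n), matvec=lambda x: x / A_p.diagonal())
p, info = spla.cg(A_p, np.ones(n), M=M)
assert info == 0
```

The structure, not the toy operators, is the point: a Krylov solve per block, with the preconditioner chosen per equation.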

>
>> > 2. Assembling the action instead of the operator removes the A.add()
>> > bottleneck.
>>
>> True.  But it may be worthwhile to put some effort into optimizing the
>> matrix assembly as well.
>
> In any case, you have to form something to precondition with.
>
>> > As mentioned before, we are experimenting with iterating locally over
>> > cells sharing common dofs and combining batches of element tensors
>> > before inserting into the global sparse matrix row by row. Let's see
>> > how it goes.
>>
>> Yes, this is interesting. Would be very interesting to hear about the
>> progress.
>>
>> It is also interesting to understand what would optimize the insertion
>> for different linear algebra backends; in particular, Jed seems to have
>> good knowledge of PETSc.  We could then build backend optimization into
>> the local dof-orderings etc.
>
> I just press M-. when I'm curious :-)
>
> I can't imagine it pays to optimize for a particular backend (it's not
> PETSc anyway, rather whichever format is used by the preconditioner).
> The CSR data structure is pretty common, but it will always be fastest
> to insert an entire row at once.  If using an intermediate hashed
> structure makes this convenient, then it would help.  The paper I posted
> assembles the entire matrix in hashed format and then converts it to
> CSR.  I'll guess that a hashed cache for the assembly (flushed every few
> MiB, for instance) would work at least as well as assembling the entire
> thing in hashed format.

Yes, it seems that some form of hashed structure is a good candidate for
optimization. What Murtazo is referring to would be similar to hashing the
whole matrix as in the paper you posted, and at Simula work on row-hashed
structures appears to be under way.
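A minimal sketch of how I read the hashed-cache variant (pure Python/scipy; the dict keyed by (row, col) stands in for the hash table, and the flush limit for the "every few MiB" flush — none of this is the actual implementation):

```python
from collections import defaultdict
import numpy as np
import scipy.sparse as sp

def _cache_to_csr(cache, n):
    """Convert the (row, col) -> value hash table into a CSR matrix."""
    if not cache:
        return sp.csr_matrix((n, n))
    rows, cols = zip(*cache.keys())
    return sp.csr_matrix((list(cache.values()), (rows, cols)), shape=(n, n))

def assemble_hashed(n_dofs, element_tensors, flush_limit=1 << 20):
    """element_tensors yields (global dof indices, dense local tensor)."""
    cache = defaultdict(float)
    A = sp.csr_matrix((n_dofs, n_dofs))
    for dofs, Ae in element_tensors:
        # Accumulate element contributions in the hash table instead of
        # calling A.add() once per element tensor
        for i, gi in enumerate(dofs):
            for j, gj in enumerate(dofs):
                cache[gi, gj] += Ae[i, j]
        if len(cache) >= flush_limit:   # flush the cache every few MiB
            A = A + _cache_to_csr(cache, n_dofs)
            cache.clear()
    return A + _cache_to_csr(cache, n_dofs)

# Smoke test: 1D P1 Laplacian from its element tensors
Ae = np.array([[1.0, -1.0], [-1.0, 1.0]])
A = assemble_hashed(5, (((e, e + 1), Ae) for e in range(4)))
```

Whether the hash lookups beat repeated sparse insertion presumably depends on the backend and the dof ordering, which is exactly what the experiments should show.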

It will be interesting to see the results.

/Johan

>
> Jed
> _______________________________________________
> DOLFIN-dev mailing list
> DOLFIN-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/dolfin-dev
>



