dolfin team mailing list archive

Thread
Date

Re: profiling an assembly

To: dolfin-dev@xxxxxxxxxx
From: "Martin Sandve Alnæs" <martinal@xxxxxxxxx>
Date: Mon, 19 May 2008 09:31:16 +0200
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <20080519071920.GR21713@brakk.ethz.ch>
Sender: martin.alnes@xxxxxxxxx

2008/5/19 Jed Brown <jed@xxxxxxxx>:
> On Mon 2008-05-19 08:41, Johan Hoffman wrote:
>> > 2008/5/19 Johan Hoffman <jhoffman@xxxxxxxxxx>:
>> >>> On Sun 2008-05-18 22:55, Johan Hoffman wrote:
>> >>>> > On Sun 2008-05-18 21:54, Johan Hoffman wrote:
>> >>>> >> > On Sat, May 17, 2008 at 04:40:48PM +0200, Johan Hoffman wrote:
>> >>>> >> >
>> >>>> >> > 1. Solve time may dominate assemble anyway so that's where we
>> >>>> should
>> >>>> >> > optimize.
>> >>>> >>
>> >>>> >> Yes, there may be such cases, in particular for simple forms
>> >>>> (Laplace
>> >>>> >> equation etc.). For more complex forms with more terms and
>> >>>> coefficients,
>> >>>> >> assembly typically dominates, from what I have seen. This is the
>> >>>> case
>> >>>> >> for
>> >>>> >> the flow problems of Murtazo for example.
>> >>>> >
>> >>>> > This probably depends if you use are using a projection method.  If
>> >>>> you
>> >>>> > are
>> >>>> > solving the saddle point problem, you can forget about assembly
>> >>>> time.
>> >>>>
>> >>>> Well, this is not what we see. I agree that this is what you would
>> >>>> like,
>> >>>> but this is not the case now. That is why we are now focusing on the
>> >>>> assembly bottleneck.
>> >>>
>> >>> This just occurred to me.  If you have a Newtonian fluid, then the
>> >>> momentum
>> >>> equations are block diagonal, but this is not reflected in the matrix
>> >>> structure.
>> >>> Sure enough, run the stokes demo with -mat_view_draw -draw_pause -1 and
>> >>> note
>> >>> that the off-diagonal blocks of the momentum equations are cyan which
>> >>> means they
>> >>> are set, but have value zero.  This almost doubles the number of
>> >>> insertions into
>> >>> the global matrix.
>> >>
>> >> Good. You are right, this piece of information is not used.
>> >>
>> >> I guess the most general thing is to have ffc delete zero matrix entries
>> >> in computing the sparsity pattern. I do not think this is done today?
>> >
>> > We could add it with an appropriate
>> > ufc::dof_map::tabulate_sparsity_pattern(...),
>> > since the form compiler can figure out which entries are always zero.
>> > Currently, this information is hidden from dolfin, and therefore it must
>> > simply use the local blocks.
>>
>> Yes, this would be very useful to add.
>>
>> > But A.add(...) calls a single block-addition function in PETSc
>> > or Trilinos, does anyone know how these will perform if the values
>> > contain zeros that are outside the initialized sparsity pattern?
>>
>> Worst case scenario is that a reallocation is triggered, but maybe it is
>> dealt with in a less drastic way?
>
> It is important to preallocate for all the entries you will be inserting.  If
> the option MAT_IGNORE_ZERO_ENTRIES is set then zero entries will not be
> inserted.  I don't think you want to rely on this as a general mechanism for
> eliminating non-existant coupling since it would also eliminate entries that
> just happened to be missing for the current Jacobian.
>
> Note that MatSetValues() needs to be called with a ``full'' block so if we
> eliminate the coupling zeros, we must make separate calls for the rows of each
> vector component.  However, this should not make much performance difference

Ok, this was what I wanted to know. And if the ongoing work with assembling
semi-local matrices over patches of the mesh and inserting one row at a time
goes well, then we'll use single row insertion anyway. Epetra::FECrsMatrix
(Trilinos) has a function to add a single row which is documented to ignore
values that aren't part of the sparsity pattern (Epetra::FECrsGraph).

Regarding adding sparsity pattern information to an ufc::form, it seems to me
it should be part of the *integral classes, one
tabulate_sparsity_pattern to match
each tabulate_tensor, plus a num_nonzeros() function.

> since the only overhead from splitting the rows is the function call.  Of
> course, if we use a hashed cache then this is definitely a non-issue.
>
>> >>  Of course, if you really care about speed, you could
>> >>> implement the momentum equations matrix-free with a custom
>> >>> preconditioner
>> >>> that
>> >>> only uses one block (since they are the same).
>> >>>
>> >>> Also, it seems like a strange design decision for general coupled
>> >>> systems
>> >>> for
>> >>> the blocks to be separate like this.  (Actually, ordering the unknowns
>> >>> this way
>> >>> makes some factorizations ``work'' even when the matrix is indefinite.)
>> >>
>> >>  Ok. We should look in to this.
>> >
>> > The reasoning behind this decision was that it makes access to subvectors
>> > for components of general hierarchial mixed elements easy. Vector elements
>> > and mixed elements are treated as basically the same thing.
>> >
>> > Thus, renumbering should preferably happen in dolfin, which will have to
>> > be
>> > done for many occasions anyway.
>>
>> Yes, it seems natural to have the option to renumber freely in dolfin, in
>> particular to switch between "vertex-oriented" and "vector-component
>> oriented" numbering.
>
> I think this would be good.  Note that the non-square coupling matrices between
> mixed elements never need to be formed (for most saddle point preconditioners)
> so it would be natural to write them in separate form files.
>
>
>> Iterating once is similar to a projection method, but we iterate until
>> convergence in the outer non linear residual. Of course, after some time
>> the number of iterations is typically 1, so the method collapses into a
>> projection type method.
>
> Okay, cool.  Thanks for clarifying.  How fast is the convergence the first time
> for the nonlinear problem?  My (limited) experience and impression from the
> literature is that you only see quadratic convergence from the Newton method if
> you solve the linear saddle point problem instead of replacing it with the
> projection preconditioner.  Of course, if it only takes one iteration after some
> time, then $\Delta t \eta$ must be small in which case we expect the projection
> method to be good (this is not the case for the slow non-Newtonian fluids I work
> with).
>
> Jed
>
> _______________________________________________
> DOLFIN-dev mailing list
> DOLFIN-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/dolfin-dev
>
>



-- 
Martin

Follow ups

Re: profiling an assembly
From: Anders Logg, 2008-05-22
Re: profiling an assembly
From: Jed Brown, 2008-05-19

References

Re: profiling an assembly
From: Dag Lindbo, 2008-05-17
Re: profiling an assembly
From: Anders Logg, 2008-05-18
Re: profiling an assembly
From: Johan Hoffman, 2008-05-18
Re: profiling an assembly
From: Jed Brown, 2008-05-18
Re: profiling an assembly
From: Johan Hoffman, 2008-05-18
Re: profiling an assembly
From: Jed Brown, 2008-05-18
Re: profiling an assembly
From: Johan Hoffman, 2008-05-19
Re: profiling an assembly
From: Martin Sandve Alnæs, 2008-05-19
Re: profiling an assembly
From: Johan Hoffman, 2008-05-19
Re: profiling an assembly
From: Jed Brown, 2008-05-19