
Re: SystemAssembler

 

On 4 October 2011 19:23, Anders Logg <logg@xxxxxxxxx> wrote:
> On Tue, Oct 04, 2011 at 06:53:33PM +0100, Garth N. Wells wrote:
>> On 4 October 2011 18:34, Anders Logg <logg@xxxxxxxxx> wrote:
>> > On Tue, Oct 04, 2011 at 06:21:33PM +0100, Garth N. Wells wrote:
>> >> On 4 October 2011 18:05, Kent-Andre Mardal <kent-and@xxxxxxxxx> wrote:
>> >> >
>> >> >
>> >> > On 4 October 2011 18:13, Anders Logg <logg@xxxxxxxxx> wrote:
>> >> >>
>> >> >> On Tue, Oct 04, 2011 at 04:17:01PM +0100, Garth N. Wells wrote:
>> >> >> > On 4 October 2011 12:24, Anders Logg <logg@xxxxxxxxx> wrote:
>> >> >> > > SystemAssembler does not support subdomains. It is even silently
>> >> >> > > ignoring all other integrals than number 0.
>> >> >> > >
>> >> >> > > This is one of the remaining bugs for 1.0-beta2. I can try to fix it
>> >> >> > > but would like some input on what shape the SystemAssembler is
>> >> >> > > currently in. I haven't touched it that much before since it looks
>> >> >> > > like a bit of code duplication to me. In particular, is it necessary
>> >> >> > > to keep both functions cell_wise_assembly and facet_wise_assembly?
>> >> >> > >
>> >> >> >
>> >> >> > It would require some performance testing to decide. I expect that,
>> >> >> > for performance reasons, both are required.
>> >> >>
>> >> >> I'm getting very strange results. Here are results for assembling
>> >> >> Poisson matrix + vector on a 32 x 32 x 32 unit cube:
>> >> >>
>> >> >>  Regular assembler: 0.658 s
>> >> >>  System assembler:   9.08 s (cell-wise)
>> >> >>  System assembler:    202 s (facet-wise)
>> >> >>

This is attributable to the lines

  const uint D = mesh.topology().dim();
  mesh.init(D - 1);
  mesh.init(D - 1, D);

I'm not sure that they're necessary. The slowdown is profound because the
mesh init functions are known to be very slow. I'll check whether the init
calls can simply be removed.

Without the inits, the symmetric assembler appears to be very slightly faster.
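
If it turns out they are only needed for facet integrals, a guard along
these lines might do (just a sketch; the has_facet_integrals flag stands in
for however the form reports the presence of facet integrals):

  #include <dolfin.h>

  using namespace dolfin;

  // Build facets and facet-cell connectivity only when the form
  // actually has facet integrals (flag name is illustrative)
  void init_facet_connectivity(Mesh& mesh, bool has_facet_integrals)
  {
    if (!has_facet_integrals)
      return;

    const uint D = mesh.topology().dim();
    mesh.init(D - 1);     // compute facets
    mesh.init(D - 1, D);  // compute facet-cell connectivity
  }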


>> >> >> Is this expected?
>> >> >>
>> >> >> What are the arguments against ditching SystemAssembler (for less code
>> >> >> duplication) and adding functionality for symmetric application of BCs
>> >> >> on the linear algebra level?
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > Earlier system_assemble of A and b was faster than assemble of A and b.
>> >>
>> >> It was faster because it was less general. After being generalised,
>> >> the performance of SystemAssembler was to all intents and purposes the
>> >> same as Assembler.
>> >>
>> >> > Something strange must have happened.
>> >> > SystemAssemble enforce symmetric BC elementwise which
>> >> > is much faster than doing it on linear algebra level.
>> >>
>> >> I'm not sure that that is true. In my experience the difference is marginal.
>> >
>> > It is worth testing. At any rate, it would be good if we could do it
>> > also on the linear algebra level (cf for example feature request from
>> > Doug Arnold earlier today).
>> >
>> > SystemAssembler is currently lacking support for subdomains and OpenMP
>> > assembly (which I think can be merged into the regular
>> > assembler). It's easier to maintain one single assembler.
>> >
>>
>> It's more subtle than this. If we have one assembler, it will be
>> slower. The OpenMP assembler is slower than the regular assembler when
>> using one thread because it iterates differently over cells.
>
> How is it different? I can't spot the difference on a quick
> glance. Shouldn't it just specialize to one color and then inside that
> the regular cell iteration?
>

The OpenMP assembler does not use the Cell iterator; it iterates over a
vector of cell indices for each color, which makes contiguous memory
access harder to achieve.
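
Roughly, the two patterns look like this (a sketch only; the colored_cells
structure is illustrative, not the actual member names):

  #include <vector>
  #include <dolfin.h>

  using namespace dolfin;

  // Regular assembler: one contiguous sweep over all cells
  void sweep_cells(const Mesh& mesh)
  {
    for (CellIterator cell(mesh); !cell.end(); ++cell)
    {
      // tabulate and insert the element tensor for *cell
    }
  }

  // OpenMP assembler (sketch): loop over colors, then over the cell
  // indices of each color; cells of one color touch disjoint dofs so
  // they can be assembled in parallel, but the indices are scattered
  void sweep_colored_cells(const Mesh& mesh,
                           const std::vector<std::vector<uint> >& colored_cells)
  {
    for (uint c = 0; c < colored_cells.size(); ++c)
    {
      const std::vector<uint>& cells = colored_cells[c];
      #pragma omp parallel for
      for (int i = 0; i < (int) cells.size(); ++i)
      {
        Cell cell(mesh, cells[i]);  // non-contiguous access pattern
        // tabulate and insert the element tensor for cell
      }
    }
  }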

>> > To compare the two approaches, we need to (1) implement symmetric
>> > application of bcs and (2) get SystemAssembler to run at normal speed
>> > again. Marie's result is with a version from yesterday so it's not my
>> > bug fixes from today that cause the slowdown.
>> >
>>
>> If it's a question of choosing one over the other and removing one
>> there can only be one winner - symmetric assembly. It would be a poor
>> library that doesn't permit symmetric FE matrices, thereby eliminating
>> the use of Cholesky, CG, MINRES, etc, as linear solvers.
>
> The choice is between symmetric application of BCs either during
> assembly or after. In both cases we would end up with a symmetric
> matrix (for a symmetric form).
>

I think that it should be during assembly because manipulating sparse
matrices, especially in parallel, is very delicate, and an efficient
approach is likely to be highly dependent on the sparse storage scheme.
Manipulating small dense matrices inside the assembler is easy and cheap,
inherently local, and independent of the sparse storage scheme.
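
To make that concrete, symmetric application of a Dirichlet value g at
local dof i on the element level amounts to something like this (a sketch,
not the SystemAssembler code itself):

  #include <vector>

  // Enforce u_i = g on an n x n element matrix Ae (row-major) and
  // element vector be while keeping Ae symmetric. Cheap, local and
  // independent of the sparse storage scheme.
  void apply_bc_symmetric(std::vector<double>& Ae, std::vector<double>& be,
                          unsigned int n, unsigned int i, double g)
  {
    // Move the known column to the right-hand side: b_j -= A_ji * g
    for (unsigned int j = 0; j < n; ++j)
      be[j] -= Ae[j*n + i]*g;

    // Zero row i and column i, then put 1 on the diagonal
    for (unsigned int j = 0; j < n; ++j)
    {
      Ae[i*n + j] = 0.0;
      Ae[j*n + i] = 0.0;
    }
    Ae[i*n + i] = 1.0;

    // Right-hand side entry so that the solve gives u_i = g
    be[i] = g;
  }

The corresponding operation on the assembled sparse matrix needs access to
the column entries, which is exactly what is awkward in parallel.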

Garth

> --
> Anders
>

