dolfin team mailing list archive

Re: Generic meta-programming, faster than form compiler?

To: Robert Kirby <robert.c.kirby@xxxxxxxxx>
From: A Navaei <axnavaei@xxxxxxxxxxxxxx>
Date: Mon, 23 Mar 2009 12:58:06 +0000
Cc: dolfin-dev <dolfin-dev@xxxxxxxxxx>
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <b376f5650903230536g2b250897y8bb1d0e17b18b39d@mail.gmail.com>

2009/3/23 Robert Kirby <robert.c.kirby@xxxxxxxxx>:
> Hi all, some thoughts:
> 1.) In the current paradigm (build + apply), building the matrix is
> typically not dominant in the overall run-time.

Doesn't this contradict what Kent said about matrix insertion being
the bottleneck? Are there any references for these claims?

> Optimizing the application
> in a Krylov solver is good.  Optimizing matrix-free evaluation is also good.
>  Optimizing construction is nice if you have to do it frequently, so it
> doesn't hurt.  Just don't explode compile time.
> 2.) template metaprogramming can be very powerful (it's Turing complete, if
> only accidentally), but it can also obscure the code and make it difficult
> to maintain and modify.  While ffc may be complicated inside and the code it
> generates ugly, the inputs that a user writes are quite nice.  A look inside
> Thyra within Trilinos will indicate that too much metaprogramming at the
> user interface level can become cumbersome.

As I mentioned in my reply to Garth, the form DSL and FFC code
generation are independent concepts: DSL parsing does not necessarily
have to be followed by code generation. The DSL can continue its job
of hiding complexity while the FEM computation itself is performed
using template meta-programming.
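
As a rough illustration of the two techniques named in this thread,
static polymorphism and compile-time loop unrolling, here is a minimal
C++ sketch. All names in it (ElementKernel, P1Stiffness1D, Dot,
apply_element) are hypothetical and are not part of DOLFIN, FFC or
MTL4. It assumes a linear element on an interval of length h, whose
stiffness matrix is (1/h) * [[1, -1], [-1, 1]], and it says nothing
about how such code would perform against FFC output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Static polymorphism (CRTP): tabulate() forwards to the derived kernel.
// The target is known at compile time, so no virtual call is involved.
// Hypothetical interface, not taken from any FEniCS or MTL4 header.
template <typename Derived, std::size_t N>
struct ElementKernel {
    static constexpr std::size_t size = N;

    std::array<double, N * N> tabulate(double h) const {
        assert(h > 0.0);
        return static_cast<const Derived&>(*this).tabulate_impl(h);
    }
};

// Assumed example kernel: P1 stiffness matrix on an interval of length h,
// stored row-major.
struct P1Stiffness1D : ElementKernel<P1Stiffness1D, 2> {
    std::array<double, 4> tabulate_impl(double h) const {
        const double k = 1.0 / h;
        std::array<double, 4> A = {{k, -k, -k, k}};
        return A;
    }
};

// Compile-time unrolled dot product: the recursion over I is expanded by
// the compiler, so no run-time loop counter remains for the N terms.
template <std::size_t I, std::size_t N>
struct Dot {
    static double eval(const double* a, const double* x) {
        return a[I] * x[I] + Dot<I + 1, N>::eval(a, x);
    }
};

// Recursion terminates when I reaches N.
template <std::size_t N>
struct Dot<N, N> {
    static double eval(const double*, const double*) { return 0.0; }
};

// Matrix-free application y = A_e * x for a single element.
template <typename Kernel>
std::array<double, Kernel::size>
apply_element(const Kernel& kernel, double h,
              const std::array<double, Kernel::size>& x) {
    constexpr std::size_t N = Kernel::size;
    const std::array<double, N * N> A = kernel.tabulate(h);
    std::array<double, N> y{};
    for (std::size_t i = 0; i < N; ++i)
        y[i] = Dot<0, N>::eval(A.data() + i * N, x.data());
    return y;
}
```

Calling apply_element(P1Stiffness1D(), h, x) with a std::array<double, 2>
x applies the element matrix without ever storing a global matrix. A
DSL front end could in principle emit calls to such templates instead
of generated loops, but that is an assumption of this sketch.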


-Ali

> 3.) LifeV does some template metaprogramming in the FEM context with some
> success.
> 4.) Before trying to optimize this or that or import new technologies, it
> makes sense to do some serious profiling on existing codes.  Not just
> Poisson.  Something nontrivial, both in terms of size and complexity, like a
> turbulence model or inverse problem with hard-to-solve systems, lots of
> assembly, data I/O.  If you spend 90% of your time in PETSc kernels, don't
> bother with further optimizations of ffc/dolfin.  I don't know what the
> numbers are.  But it might make a very interesting talk for someone to give
> at FEniCS'09.
> Rob
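
Point 4 can be made quantitative with Amdahl's law: if a fraction f of
the run time is sped up by a factor s, the overall speedup is
1 / ((1 - f) + f / s). The sketch below uses a hypothetical helper
name, and the 10% figure is only the complement of the 90% in the
example above, not a measurement. With f = 0.1, doubling the speed of
that part gives roughly 1.05x overall, and even an infinitely fast
version is bounded by 1 / 0.9, about 1.11x.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction f of the run time
// (0 <= f <= 1) is accelerated by a factor s > 0.
// Hypothetical helper for illustration, not part of any FEniCS package.
double overall_speedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

// Upper bound on the overall speedup as s grows without limit.
// Requires f < 1 so that some unaccelerated work remains.
double speedup_bound(double f) {
    assert(f >= 0.0 && f < 1.0);
    return 1.0 / (1.0 - f);
}
```

Feeding profiled fractions from a real application into
overall_speedup shows quickly whether a given optimization is worth
the effort.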
> On Mon, Mar 23, 2009 at 7:12 AM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>>
>>
>> A Navaei wrote:
>> > 2009/3/23 Kent Andre <kent-and@xxxxxxxxx>:
>> >> The code that FFC produces is about as fast as light. It has been
>> >> documented in a number of papers.
>> >
>> > Is there any data available comparing the FFC performance to the
>> > hardware peak?
>> >
>>
>> FFC does not operate in isolation, so it is not possible to make a
>> comparison to max flops of a CPU. Furthermore, in a typical simulation
>> with code generated by FFC, other parts of the solution process dominate
>> (such as insertion, as mentioned by Kent, and the linear solve), so
>> whether or not FFC generated code is optimal in terms of peak flops of a
>> machine is not relevant to runtime performance.
>>
>> >> I don't think you should try to beat FFC with generic meta-programming.
>> >> Or you could do it, but don't have too high expectations...
>> >>
>> >> Insertion into the matrix is currently the bottleneck. But FFC does
>> >> not have anything to do with this.
>> >
>> > While FFC doesn't have anything to do with this, dolfin does. In the
>> > case of the MTL4 backend wrapper, it is implemented badly, ignoring
>> > the potential of meta-programming.
>>
>> This is not a constructive comment. Patches are welcome.
>>
>> > For instance, sparse matrix insertion
>> > is done by forming a sparsity pattern outside of MTL4 and then
>> > passing the pointers to the MTL4 API, while loop unrolling could
>> > have been used here.
>> >
>>
>> If you look at the code, the FFC backend does not use the sparsity
>> pattern. The MTL4 inserter does have some options which we have not yet
>> taken advantage of, so again patches are welcome.
>>
>> Garth
>>
>> >
>> > -Ali
>> >
>> >> Kent
>> >>
>> >>
>> >> On ma., 2009-03-23 at 10:11 +0000, A Navaei wrote:
>> >>> The success of MTL4, which is based on generic meta-programming,
>> >>> raises the question of revisiting the efficiency of code-generation
>> >>> approaches, including FFC. Given that FEM can particularly benefit
>> >>> from major meta-programming characteristics, namely static
>> >>> polymorphism and loop unrolling, MTL4 demonstrates that the
>> >>> code-generation part can be much more efficiently replaced by inlining
>> >>> performed at compile-time.
>> >>>
>> >>> Without having a concrete meta-programming implementation, it may be
>> >>> impossible to predict how much performance one would gain compared to
>> >>> FFC. However, MTL4 has been reported to be many times faster than
>> >>> code-generation means such as ATLAS.
>> >>>
>> >>> Based on this, are there any specific benefits in FFC code-generation
>> >>> which may not be covered by meta-programming?
>> >>>
>> >>>
>> >>> -Ali
>> >>> _______________________________________________
>> >>> DOLFIN-dev mailing list
>> >>> DOLFIN-dev@xxxxxxxxxx
>> >>> http://www.fenics.org/mailman/listinfo/dolfin-dev
>> >>

Follow ups

Re: Generic meta-programming, faster than form compiler?
From: Martin Sandve Alnæs, 2009-03-23

References

Generic meta-programming, faster than form compiler?
From: A Navaei, 2009-03-23
Re: Generic meta-programming, faster than form compiler?
From: Kent Andre, 2009-03-23
Re: Generic meta-programming, faster than form compiler?
From: A Navaei, 2009-03-23
Re: Generic meta-programming, faster than form compiler?
From: Garth N. Wells, 2009-03-23
Re: Generic meta-programming, faster than form compiler?
From: Robert Kirby, 2009-03-23