ffc team mailing list archive
Message #00079
Re: Preliminary benchmark results for FFC
I agree, and even if my simple implementation (three nested loops) of
the quadrature approach is not optimal, it provides a fixed ruler to
use for measurements, both now for plain FFC and later for FFC+FErari.
/Anders
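
For readers of the archive, here is a minimal sketch of what a
three-loop quadrature baseline of this kind might look like, for the
mass matrix with linear Lagrange elements on the reference triangle.
It is an illustration only, not the code used in the benchmark: the
names POINTS, WEIGHTS, basis and mass_matrix are hypothetical, and the
edge-midpoint rule is just one rule that happens to be exact for this
integrand.

```python
# Hypothetical illustration, not the benchmark code from this thread:
# P1 mass matrix on the reference triangle, assembled by quadrature
# with three nested loops (quadrature point, test function, trial function).

# 3-point edge-midpoint rule; exact for quadratics on the reference triangle.
POINTS = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
WEIGHTS = [1.0 / 6.0] * 3


def basis(x, y):
    """Values of the three linear Lagrange basis functions at (x, y)."""
    return [1.0 - x - y, x, y]


def mass_matrix(det_j=1.0):
    """Element mass matrix; det_j is |det J| of the affine element map."""
    n = 3
    A = [[0.0] * n for _ in range(n)]
    for (x, y), w in zip(POINTS, WEIGHTS):  # loop over quadrature points
        phi = basis(x, y)
        for i in range(n):                  # loop over test functions
            for j in range(n):              # loop over trial functions
                A[i][j] += w * phi[i] * phi[j] * det_j
    return A
```

Calling mass_matrix() repeatedly (for example 10,000 times, as in the
timings quoted below) gives the kind of fixed reference measurement
discussed here. Higher-order elements change only the point set and
the basis tabulation, not the loop structure.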
On Tue, Mar 22, 2005 at 03:54:56PM -0600, Robert Kirby wrote:
> Anders, in my estimation, you can only do so much to the quadrature
> approach. Short of really bizarre compiler tricks, there are basically
> two things you can do:
> - get a better quadrature rule
> - get a better loop nest.
>
> The savings of the first one can be significant if you are doing Gauss
> quadrature mapped to a tet in 3D versus using the "optimal" points.
>
> For the second, the savings are about a factor of two at best.
>
> This holds if you do one element at a time. You can do a space/time
> tradeoff and do lots of elements in batch by vectorizing and/or using
> level 3 BLAS. This can get your speed up, and it applies to both FFC-type
> contractions and quadrature. Incidentally, this is how Kevin can afford
> to do interpretation on his tree at run-time.
>
> Part of our system should use some kind of heuristics to
> test what the right balance between space and time is and generate code
> that will do elements in batch. This is an optimization, but an important
> one to play with.
>
> Rob
>
> Robert Kirby
> Assistant Professor
> Department of Computer Science
> The University of Chicago
> http://people.cs.uchicago.edu/~kirby
>
> On Tue, 22 Mar 2005, Anders Logg wrote:
>
> > I'm working on some benchmarks comparing FFC with the standard
> > quadrature approach and the results look pretty good. The typical
> > speedup is a factor of 10-100.
> >
> > I've run tests for Lagrange elements with q = 1,2,3 for a simple mass
> > matrix, Poisson, the nonlinear term of Navier-Stokes and the
> > strain-strain term of linear elasticity. Higher order is on its way,
> > but it takes a long time for FIAT to evaluate the basis functions... :-).
> >
> > See the attached file for some preliminary results. The times
> > reported are for computing the element matrix (local stiffness matrix)
> > 10,000 times.
> >
> > Note that this is without any FErari optimizations. (On the other
> > hand, the quadrature-based code can probably also be optimized.)
> >
> > /Anders
> >
>