
dolfin team mailing list archive

Re: BLAS mode verified

 

On Fri, Oct 07, 2005 at 01:23:20PM +0200, Johan Jansson wrote:

> > 1. For the convection-diffusion form, most of the work is computing
> > the geometry tensor and that is performed in the same way in both
> > cases. Maybe we could generate some loop-based code for computing the
> > geometry tensor also.
> > 
> > 2. Assigning to array entries (G[i] = ... in BLAS mode) seems to take
> > longer to compile than assigning to variables (double Gi = ... in
> > default mode).
> > 
> > 3. It actually matters that the default mode of FFC removes any
> > products with zeros in the tensor product. BLAS does not know about
> > zeros.
> 
> This seems to be a severe limitation. For PDE systems with Lagrange
> elements (the typical case I guess), there will be lots of zeros. From
> previous discussions about FFC/Ferari, the conclusion was that
> skipping the zero elements was the dominant optimization for computing
> the element matrix.
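
To make the point about zeros concrete, here is a minimal sketch (not
FFC's generated code) that computes an element tensor as the product of
a reference tensor A0 and a geometry tensor G in two ways: a dense
product that multiplies every entry, as a BLAS call would, and a
product that visits only the nonzero entries. The values of A0 and G
and both function names are made up for illustration.

```python
# Hypothetical reference tensor: one row per element tensor entry,
# one column per geometry tensor entry. Values are illustrative only.
A0 = [
    [0.5, 0.0, 0.0, 0.5],
    [-0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -0.5],
]
# Hypothetical geometry tensor for a single cell.
G = [2.0, 0.0, 0.0, 3.0]


def dense_product(A0, G):
    """Multiply every entry of A0 with G, zeros included."""
    mults = 0
    A = []
    for row in A0:
        s = 0.0
        for a0, g in zip(row, G):
            s += a0 * g
            mults += 1
        A.append(s)
    return A, mults


def zero_skipping_product(A0, G):
    """Multiply only the nonzero entries of A0 with G."""
    mults = 0
    A = []
    for row in A0:
        s = 0.0
        for j, a0 in enumerate(row):
            if a0 != 0.0:  # known when the code is generated, so no run-time cost there
                s += a0 * G[j]
                mults += 1
        A.append(s)
    return A, mults


dense_A, dense_mults = dense_product(A0, G)
sparse_A, sparse_mults = zero_skipping_product(A0, G)
print(dense_A, dense_mults)
print(sparse_A, sparse_mults)
```

Both versions give the same element tensor, but the dense one performs
a multiplication for every entry of A0 while the other performs one per
nonzero entry. In generated code the zero test disappears entirely,
since the zero terms are simply never emitted.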

Some initial benchmarks I've run indicate that level 2 BLAS
optimization does improve the run-time speed of assembly for
high-order elements. The improvement will be even greater when we
move to level 3 BLAS, which is a simple extension of what we already
have in DOLFIN/FFC. However, the main advantage of the BLAS mode is
probably the reduced compile time (with gcc) for high-order elements
(reduced by a factor of 100 for degree 8 Poisson).
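
As a rough sketch of the difference between the two BLAS levels, the
code below computes element tensors first with one matrix-vector
product per cell (level 2 style) and then with a single matrix-matrix
product over a batch of cells (level 3 style). That the level 3
extension amounts to batching the geometry tensors of several cells as
columns of one matrix is an assumption made for this illustration, and
the tensors, values and function names are hypothetical. Plain Python
loops stand in for the actual BLAS calls.

```python
def matvec(A0, g):
    """Level 2 style: element tensor for one cell, A = A0 * g."""
    return [sum(a * x for a, x in zip(row, g)) for row in A0]


def matmat(A0, G):
    """Level 3 style: element tensors for all cells at once, A = A0 * G.

    G holds one geometry tensor per column.
    """
    n_inner = len(G)
    n_cells = len(G[0])
    return [
        [sum(A0[i][k] * G[k][j] for k in range(n_inner)) for j in range(n_cells)]
        for i in range(len(A0))
    ]


# Hypothetical reference tensor (3 element tensor entries, 2 geometry entries).
A0 = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]

# Hypothetical geometry tensors, one per cell.
cell_geometry = [[1.0, 0.5], [2.0, 1.0], [0.0, 4.0]]

# One matrix-vector product per cell.
per_cell = [matvec(A0, g) for g in cell_geometry]

# Stack the geometry tensors as columns and do one matrix-matrix product.
G = [list(col) for col in zip(*cell_geometry)]
batched = matmat(A0, G)

# Transpose back so each entry is the element tensor of one cell.
batched_by_cell = [list(col) for col in zip(*batched)]

print(per_cell)
print(batched_by_cell)
```

The two results agree; the batched form only replaces many small
products by one larger product, which is the shape of operation that
level 3 BLAS routines are designed for.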

I'll post some more details later from the benchmarks.

/Anders


