dolfin team mailing list archive
Message #01126
Re: BLAS mode verified
On Fri, Oct 07, 2005 at 01:23:20PM +0200, Johan Jansson wrote:
> > 1. For the convection-diffusion form, most of the work is computing
> > the geometry tensor and that is performed in the same way in both
> > cases. Maybe we could generate some loop-based code for computing the
> > geometry tensor also.
> >
> > 2. Assigning to array entries (G[i] = ... in BLAS mode) seems to take
> > longer to compile than assigning to variables (double Gi = ... in
> > default mode).
> >
> > 3. It actually matters that the default mode of FFC removes any
> > products with zeros in the tensor product. BLAS does not know about
> > zeros.
>
> This seems to be a severe limitation. For PDE systems with Lagrange
> elements (the typical case I guess), there will be lots of zeros. From
> previous discussions about FFC/Ferari, the conclusion was that
> skipping the zero elements was the dominant optimization for computing
> the element matrix.
Some initial benchmarks I've made indicate that level 2 BLAS
optimization does actually improve the run-time speed of assembly for
high-order elements. This improvement will be even better when we move
to level 3 BLAS, which is just a simple extension of what we already
have now in DOLFIN/FFC. However, the main advantage of the BLAS mode
is probably the reduced compile time (with gcc) for high-order elements
(reduced by a factor of 100 for degree 8 Poisson).
I'll post some more details later from the benchmarks.
/Anders
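
The trade-off discussed in this thread can be sketched in a few lines. The
element tensor is a contraction of a precomputed reference tensor with a
per-element geometry tensor. BLAS mode does this as a dense matrix-vector
product (level 2), while the default mode emits only the terms with nonzero
reference entries. What follows is a minimal sketch in plain Python, not
FFC output: the names A0, G and G_COLUMNS, the helper functions, and all
numbers are made up for illustration.

```python
# Illustrative sketch only: A0, G and G_COLUMNS are made-up data, not FFC output.

# Flattened reference tensor: one row per element-tensor entry,
# one column per geometry-tensor entry. Note the many zeros.
A0 = [
    [0.5, 0.0, -0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [-0.5, 0.0, 0.5, 0.0],
]

# Geometry tensor of a single element, flattened to a vector.
G = [2.0, 3.0, 4.0, 5.0]


def contract_dense(a0, g):
    """Level 2 style: dense matrix-vector product, zeros are multiplied too."""
    return [sum(a * gj for a, gj in zip(row, g)) for row in a0]


def contract_skip_zeros(a0, g):
    """Default-mode style: only nonzero reference entries contribute terms."""
    return [sum(a * g[j] for j, a in enumerate(row) if a != 0.0) for row in a0]


def count_multiplies(a0, skip_zeros):
    """Count the multiplications each strategy performs for one element."""
    if skip_zeros:
        return sum(1 for row in a0 for a in row if a != 0.0)
    return sum(len(row) for row in a0)


def contract_batched(a0, g_columns):
    """Level 3 style: one matrix-matrix product for a batch of elements.

    g_columns[j][k] is entry j of the geometry tensor of element k.
    """
    batch = len(g_columns[0])
    return [
        [sum(row[j] * g_columns[j][k] for j in range(len(row))) for k in range(batch)]
        for row in a0
    ]


# Geometry tensors of two elements stored as the columns of a matrix.
G_COLUMNS = [
    [2.0, 1.0],
    [3.0, 1.0],
    [4.0, 1.0],
    [5.0, 1.0],
]
```

Both single-element variants give the same element tensor, but for this toy
data the dense product performs 12 multiplications and the zero-skipping
variant 5. The batched variant computes the same numbers for several
elements in one product, which is the shape of operation a level 3 BLAS
routine would handle. Whether it pays off in practice depends on the
sparsity of the reference tensor and on the element degree, as discussed
above.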