ffc team mailing list archive

Re: Benchmark results for new BLAS mode

 

On Mon, Oct 10, 2005 at 10:52:29AM -0500, Anders Logg wrote:

...

>   Concerning run-time performance, FFC default mode is faster than
>   BLAS mode for small q, and BLAS is faster for large q. For Poisson
>   in 3D, the break-even point is at q = 6.
> 
>   Conclusion: the FFC blas mode will in some cases generate faster
>   code, but it's first and foremost an option that should be
>   considered to decrease the compile-time (with g++).
> 
> - Overall conclusions:
> 
>   It's unnecessary to compile with -O1 or -O2. We have -O2 by
>   default in DOLFIN, but should probably consider changing to
>   -O0, at least for the compilation of forms. There is probably
>   a huge improvement in compile-time for Navier-Stokes if we switch
>   to -O0.
> 
>   BLAS can be an option to reduce compile-time and possibly to
>   improve the run-time for high-order forms.
> 
> - Finally, note that these benchmarks are for Poisson, where the main
>   part of the work is the computation of the element tensor (doing the
>   tensor product). For other forms, computing the geometry tensor can
>   dominate, and then BLAS mode won't help, but compiling with -O0 may.
> 
> /Anders

Ok, these results look reasonable. BLAS does blocking (reordering
computations to utilize the cache better), and FFC doesn't do that
(yet at least), so it's reasonable that BLAS will be faster the larger
the tensors get (at least that's what I think is happening). BLAS is
compiled by a Fortran compiler though, so it could gain a factor of
2-3 there.
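
To make "blocking" concrete, here is a minimal sketch of a tiled
matrix product next to the plain triple loop. The function names and
the block size are made up for illustration, and this is not how any
particular BLAS implements it. In pure Python the cache effect will not
show up in timings; the point is only the loop reordering, which gives
the same result while touching the operands one small tile at a time.

```python
def matmul_naive(a, b):
    """Plain triple loop: C[i][j] = sum_k A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile (block x block sub-matrices).

    The block size is an illustrative assumption; a real library would
    tune it to the cache sizes of the machine.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):          # tile row of C
        for k0 in range(0, m, block):      # tile along the summed index
            for j0 in range(0, p, block):  # tile column of C
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

Both functions accumulate each entry of C in the same order over k, so
they agree exactly; only the order in which memory is visited differs.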

I still think code generation is the way to go though. The BLAS mode
probably won't ever be able to exploit the sparsity of the reference
tensor fully (perhaps on a coarse level), and won't be able to use
other Ferari-style optimizations. And if it's really necessary, FFC
could generate Fortran code. It's probably also a significant benefit
to generate code for the mappings as well. In the long term, I don't
see how you're ever going to be able to beat code generation for
runtime speed.
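
A rough sketch of the sparsity argument, under simplifying
assumptions: the element tensor is treated as a flattened reference
tensor A0 (one row per entry) contracted with a flattened geometry
tensor G. The dense version stands in for what a matrix-vector call
would do; the generator emits straight-line code that simply omits the
zero entries of A0. All names here are hypothetical, the numbers are
made up rather than taken from a real form, and it emits Python rather
than the C++ that FFC generates.

```python
def contract_dense(a0, g):
    """Dense contraction A[i] = sum_a A0[i][a] * G[a], zeros included."""
    return [sum(coeff * g_a for coeff, g_a in zip(row, g)) for row in a0]


def generate_contraction(a0, name="generated_element_tensor"):
    """Return Python source for a function computing A0 * G.

    Zero entries of A0 are skipped at generation time, so the emitted
    code never multiplies by them.
    """
    lines = [f"def {name}(g):", "    return ["]
    for row in a0:
        terms = [f"{coeff!r} * g[{a}]"
                 for a, coeff in enumerate(row) if coeff != 0.0]
        lines.append("        " + (" + ".join(terms) if terms else "0.0") + ",")
    lines.append("    ]")
    return "\n".join(lines) + "\n"


def compile_contraction(a0, name="generated_element_tensor"):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(generate_contraction(a0, name), namespace)
    return namespace[name]


# Illustrative (made-up) sparse reference tensor and geometry tensor.
A0 = [[1.0, 0.0, 0.0],
      [0.0, 0.5, 0.0],
      [-1.0, 0.0, 0.5]]
G = [2.0, 4.0, 6.0]

element_tensor = compile_contraction(A0)
```

Here the generated function does 4 multiplications instead of the 9 in
the dense contraction, and a generator is free to apply further
rewrites to the emitted expressions before compiling them.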

Still, why not have both modes? Evidently the code generation mode
suffers from excessive compile time, so clearly the BLAS mode has an
advantage there.

  Johan


