
ffc team mailing list archive

Re: Benchmark results for new BLAS mode

 


> Ok, these results look reasonable. BLAS does blocking (reordering
> computations to utilize the cache better), and FFC doesn't do that

BLAS really only gets you blocking advantages for level 3. I'm suspending judgment on these benchmarks until level 3 is in place. The only big difference either way seems to be compile time.
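To make "blocking" concrete: the sketch below computes the same matrix product twice, once with the textbook loop order and once tile by tile. The function names and the tile size bs are illustrative choices, not anything from FFC or a BLAS library. Pure Python will not show the cache effect itself; the point is only the loop structure that a compiled, blocked level 3 routine uses.

```python
def matmul_naive(A, B):
    """C = A B with the textbook i-j-k loop order (lists of lists)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_blocked(A, B, bs=32):
    """Same product, computed bs x bs tile by tile.

    The outer loops (ii, kk, jj) walk over tiles; the inner loops finish all
    work on one tile of A, B and C before moving on, so in a compiled language
    those tiles can stay in cache. The multiply-adds are the same as in
    matmul_naive, only their order changes (so floating-point results may
    differ in the last bits).
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, m, bs):
            for jj in range(0, p, bs):
                for i in range(ii, min(ii + bs, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + bs, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, p)):
                            Ci[j] += a * Bk[j]
    return C
```

Level 2 operations (matrix-vector) touch each matrix entry only once, so there is little to reuse from a tile; that is why blocking pays off mainly for level 3.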

> (yet at least), so it's reasonable that BLAS will be faster the larger
> the tensors get (at least that's what I think is happening). BLAS is

This will be very pronounced for level 3 BLAS.
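A minimal sketch of the level 2 versus level 3 distinction, assuming FFC's tensor representation in which each element tensor is the contraction A^K = A^0 : G_K of the precomputed reference tensor A^0 with an element-dependent geometry tensor G_K. The function names and the flattened layout (A0 as a matrix, one G per element) are hypothetical. Both functions return the same numbers; the second phrases the whole batch as one matrix-matrix product, which is the shape a level 3 BLAS call needs.

```python
def element_tensors_level2(A0, Gs):
    """One matrix-vector product per element (GEMV-like, level 2).

    A0: reference tensor flattened to a matrix, one row per entry of the
    element tensor, one column per entry of the geometry tensor.
    Gs: one flattened geometry tensor per element.
    """
    return [[sum(a * g for a, g in zip(row, G)) for row in A0] for G in Gs]


def element_tensors_level3(A0, Gs):
    """The same numbers from a single matrix-matrix product (GEMM-like, level 3).

    The geometry tensors become the columns of one r x n_elements matrix, so
    every entry of A0 is reused across all elements in one product; that
    reuse is what a blocked level 3 routine can exploit and a sequence of
    level 2 calls cannot.
    """
    r, ne = len(A0[0]), len(Gs)
    G = [[Gs[e][a] for e in range(ne)] for a in range(r)]  # r x ne
    C = [[sum(row[a] * G[a][e] for a in range(r)) for e in range(ne)]
         for row in A0]
    # C has one column per element; transpose back to one list per element.
    return [[C[i][e] for i in range(len(A0))] for e in range(ne)]
```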


> compiled by a Fortran compiler though, so it could gain a factor of 2-3
> there.


ATLAS BLAS may be C code. Don't worry about the language issues. It's a black box.


> I still think code generation is the way to go though. The BLAS mode
> probably won't ever be able to exploit the sparsity of the reference
> tensor fully (perhaps on a coarse level), and won't be able to use
> other FErari-style optimizations. And if it's really necessary, FFC

FErari only gets you so much. The factor-of-3 reduction in floating-point operations I get on the weighted Laplacian will be outweighed by level 3 BLAS (a factor of 10 or so compared to level 2).
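The sparsity argument in this exchange (code generation can exploit zeros in the reference tensor, a dense BLAS call cannot) can be illustrated with a small generator that emits straight-line code for A = A0 . G and simply omits terms with a zero coefficient. This is a hypothetical sketch: the names are made up, it uses Python's exec rather than FFC's actual output, and FErari-style optimizations go beyond skipping zeros.

```python
def generate_contraction(A0, name="tabulate"):
    """Generate straight-line Python source computing A = A0 . G.

    Terms whose reference-tensor coefficient is zero are left out entirely;
    the generated code is specialized to this particular A0.
    Returns the source text and the number of multiplications kept.
    """
    entries, mults = [], 0
    for row in A0:
        terms = [f"{v!r}*G[{a}]" for a, v in enumerate(row) if v != 0]
        mults += len(terms)
        entries.append(" + ".join(terms) if terms else "0.0")
    src = f"def {name}(G):\n    return [" + ", ".join(entries) + "]\n"
    return src, mults


def compile_contraction(A0):
    """Turn the generated source into a callable (here simply via exec)."""
    src, mults = generate_contraction(A0)
    namespace = {}
    exec(src, namespace)
    return namespace["tabulate"], mults
```

For A0 = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]] the generated function does 3 multiplications instead of the 6 a dense matrix-vector product would.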

The best way to go is:
i.) Figure out the block structure first.
ii.) See whether FErari or level 3 wins once you have the coarse-level block structure. This will depend on the form, the polynomial degree, how well FErari does, etc.
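One simple, hypothetical reading of step i.) is to strip the rows and columns of the flattened reference tensor that are identically zero, so that only the remaining dense block is handed to a level 3 routine (or to FErari). This is an assumption about what "coarse-level block structure" could mean, and the function names are made up for illustration.

```python
def nonzero_block(A0):
    """Rows and columns of the flattened reference tensor A0 that are not
    all zero, plus the dense sub-block they span."""
    rows = [i for i, row in enumerate(A0) if any(v != 0 for v in row)]
    cols = [a for a in range(len(A0[0])) if any(row[a] != 0 for row in A0)]
    block = [[A0[i][a] for a in cols] for i in rows]
    return rows, cols, block


def contract_with_block(A0, G):
    """A = A0 . G using only the nonzero block; rows outside it stay zero."""
    rows, cols, block = nonzero_block(A0)
    Gsub = [G[a] for a in cols]  # only the geometry entries that matter
    A = [0.0] * len(A0)
    for i, brow in zip(rows, block):
        A[i] = sum(b * g for b, g in zip(brow, Gsub))
    return A
```

In a batched BLAS mode, the smaller block would then be multiplied by the stacked geometry tensors in a single level 3 call.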


> could generate Fortran code. It's probably also a significant benefit
> to generate code for the mappings as well. In the long term, I don't
> see how you're ever going to be able to beat code generation for
> runtime speed.

And the build system gets even more complicated :)
Actually, good C code shouldn't lose by a factor of 2-3. More like 5-10% at worst.


Rob Kirby

"Mathematical software should be mathematical."





