ffc team mailing list archive
Message #00381
Re: Benchmark results for new BLAS mode
> Ok, these results look reasonable. BLAS does blocking (reordering
> computations to utilize the cache better), and FFC doesn't do that

BLAS really only gets you blocking advantages for level 3. I suspend
all judgments on these benchmarks until level 3 is in place. The
only big difference either way seems to be compile time.

> (yet at least), so it's reasonable that BLAS will be faster the larger
> the tensors get (at least that's what I think is happening). BLAS is

This will be very pronounced for level 3 BLAS.

> compiled by a Fortran compiler though, so it could gain a factor 2-3
> there.

ATLAS BLAS may be C code. Don't worry about the language issues.
It's a black box.
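
The level 2 versus level 3 distinction above can be sketched in a few lines. This is only an illustration of the loop structure, assuming the reference tensor is flattened to a matrix `a0` and each element contributes one geometry vector; the names `matvec`, `matmat_blocked`, `a0` and `gs` are made up for this sketch and are not FFC or BLAS identifiers. In pure Python there is no cache benefit to measure; the point is that batching elements turns many matrix-vector products into one matrix-matrix product that can be tiled.

```python
def matvec(a0, g):
    """Level 2 style: A_K = A0 * G_K for a single element."""
    return [sum(a * x for a, x in zip(row, g)) for row in a0]


def matmat_blocked(a0, gs, block=2):
    """Level 3 style: A0 * [G_1 ... G_p], computed tile by tile.

    gs[k] is the geometry vector of element k, i.e. column k of the
    right-hand matrix. Each tile of a0 is reused for several elements
    before the loops move on, which is what blocking is about.
    """
    m, n, p = len(a0), len(a0[0]), len(gs)
    out = [[0.0] * p for _ in range(m)]
    for i0 in range(0, m, block):
        for k0 in range(0, p, block):
            for j0 in range(0, n, block):
                for i in range(i0, min(i0 + block, m)):
                    for k in range(k0, min(k0 + block, p)):
                        acc = out[i][k]
                        for j in range(j0, min(j0 + block, n)):
                            acc += a0[i][j] * gs[k][j]
                        out[i][k] = acc
    return out


# Hypothetical data: a 2x3 reference matrix and three elements.
a0 = [[1, 2, 0], [0, 1, 3]]
gs = [[1, 0, 1], [2, 1, 0], [0, 0, 1]]
batched = matmat_blocked(a0, gs)
one_by_one = [matvec(a0, g) for g in gs]
```

Column k of `batched` holds the same numbers as `one_by_one[k]`; only the order in which the work is done changes.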
> I still think code generation is the way to go though. The BLAS mode
> probably won't ever be able to exploit the sparsity of the reference
> tensor fully (perhaps on a coarse level), and won't be able to use
> other FErari-style optimizations. And if it's really necessary, FFC

FErari only gets you so much. The factor of 3 on floating-point
operations I gain on the weighted Laplacian will be more than
compensated for by level 3 BLAS (factor of 10 or so compared to level
2).
The best way to go is:
i.) Figure out block structure first.
ii.) See whether FErari or level 3 wins after you do the coarse-level
block structure. This will depend on the form, the polynomial
degree, how well FErari does, etc.
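
Step i.) above could look roughly like the following. It is a minimal sketch, assuming the reference tensor is stored as a dense matrix `a0` and that "block structure" means fixed-size tiles that are entirely zero or not; `nonzero_blocks`, `apply_blocks` and the tile size `bs` are hypothetical names, not part of FFC or FErari.

```python
def nonzero_blocks(a0, bs):
    """Return (row, col) offsets of the bs-by-bs tiles holding a nonzero."""
    m, n = len(a0), len(a0[0])
    found = []
    for i0 in range(0, m, bs):
        for j0 in range(0, n, bs):
            if any(a0[i][j] != 0
                   for i in range(i0, min(i0 + bs, m))
                   for j in range(j0, min(j0 + bs, n))):
                found.append((i0, j0))
    return found


def apply_blocks(a0, g, bs):
    """Compute a0 * g touching only nonzero tiles; count multiply-adds."""
    m, n = len(a0), len(a0[0])
    out = [0.0] * m
    flops = 0
    for i0, j0 in nonzero_blocks(a0, bs):
        for i in range(i0, min(i0 + bs, m)):
            for j in range(j0, min(j0 + bs, n)):
                out[i] += a0[i][j] * g[j]
                flops += 1
    return out, flops


# Hypothetical block-diagonal reference matrix.
a0 = [[1, 2, 0, 0],
      [3, 4, 0, 0],
      [0, 0, 5, 0],
      [0, 0, 0, 6]]
blocks = nonzero_blocks(a0, 2)
result, flops = apply_blocks(a0, [1, 1, 1, 1], 2)
```

Here two of the four tiles survive, so the product costs 8 multiply-adds instead of 16. Step ii.) would then be decided per surviving tile: treat it as dense and hand it to level 3 BLAS, or optimize inside it FErari-style.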
> could generate Fortran code. It's probably also a significant benefit
> to generate code for the mappings as well. In the long term, I don't
> see how you're ever going to be able to beat code generation for
> runtime speed.

And the build system gets even more complicated :)
Actually, good C code shouldn't lose by a factor of 2-3. More like
5-10% at worst.
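
For comparison, the code generation argument quoted above rests on the fact that a generator can see every zero in the reference tensor and simply not emit it. A toy version, assuming a dense matrix `a0` of known coefficients; the function name, the output array name `A` and the geometry array name `G` are invented for this sketch and do not reproduce the format FFC actually emits.

```python
def generate_element_code(a0, name="A", geo="G"):
    """Emit one C assignment per row of a0, dropping zero coefficients."""
    lines = []
    for i, row in enumerate(a0):
        terms = [f"{c!r}*{geo}[{j}]" for j, c in enumerate(row) if c != 0]
        rhs = " + ".join(terms) if terms else "0.0"
        lines.append(f"{name}[{i}] = {rhs};")
    return "\n".join(lines)


# Hypothetical reference matrix with one all-zero row.
code = generate_element_code([[0.5, 0.0, -0.5],
                              [0.0, 0.0, 0.0]])
```

The generated text contains no multiplications by zero at all, which is sparsity exploited at the finest level; a BLAS call on the same matrix would do all six multiply-adds.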
Rob Kirby
"Mathematical software should be mathematical."