ffc team mailing list archive

Re: [Branch ~ffc-core/ffc/main] Rev 1684: Change code generation for evaluate_basis and

 

On 09/12/11 21:56, Kristian Ølgaard wrote:
On 12 September 2011 21:36, Marie E. Rognes <meg@xxxxxxxxx> wrote:
On 09/12/11 20:00, Marie E. Rognes wrote:
On 09/12/11 19:54, Garth N. Wells wrote:
On 12 September 2011 18:49, Marie E. Rognes <meg@xxxxxxxxx> wrote:
On 09/12/11 19:40, Garth N. Wells wrote:
Which compiler options did you use when evaluating the speed up?

Tested Extrapolation.h with vanilla dolfin (which is dominated by
evaluate_basis calls). No additional compiler options set.

What are the default compiler options?

'-g' for plain JIT, which is dead slow.  You should test with at least:

     parameters["form_compiler"]["cpp_optimize"] = True

in the Python code. This will use '-O2'.

Isn't this limited in a way? Would it be a problem to let users do:

parameters["form_compiler"]["cpp_optimize"] = '-O2 -funroll-loops'
parameters["form_compiler"]["cpp_optimize"] = '-O3'

and then perhaps let

parameters["form_compiler"]["cpp_optimize"] = True

default to '-O2' as we do now?
Just a thought.
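
A minimal sketch of how such a parameter could be interpreted. The helper
name resolve_cpp_flags and its arguments are hypothetical and not part of
FFC or DOLFIN; the '-O2' and '-g' defaults are the ones mentioned in this
thread.

```python
def resolve_cpp_flags(cpp_optimize, default_flags="-O2", debug_flags="-g"):
    """Hypothetical helper: map a cpp_optimize value to a compiler flag string.

    True          -> the default optimisation flags ('-O2', as in this thread)
    non-empty str -> the user's flags, passed through unchanged
    anything else -> the plain JIT flags ('-g', as in this thread)
    """
    if cpp_optimize is True:
        return default_flags
    if isinstance(cpp_optimize, str) and cpp_optimize.strip():
        return cpp_optimize.strip()
    return debug_flags


# Usage: the three cases discussed above.
flags_default = resolve_cpp_flags(True)
flags_custom = resolve_cpp_flags("-O2 -funroll-loops")
flags_debug = resolve_cpp_flags(False)
```

With this shape, existing code that sets the parameter to True would keep
working, while a string gives full control over the flags.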

Ok, thanks -- I'll take a closer look.

Take a look at the attached results in old_evaluate_basis.txt (results
with "old" FFC) and new_evaluate_basis.txt (results with "new" FFC) from
running the attached test_evaluate_basis.py.

Acceptable?

Looks good, and the generated code is much nicer now. :)
It would have been fun to see the impact of the '-O2 -funroll-loops'
option on the old code, but for that you would have to switch to C++.
Anyway, I'm quite sure that the old code will never perform as well as
the new code, even with this option.

As you have probably found out, the generated code was simply a mirror
of what is going on in FIAT (translated to C++).

Yep.

Perhaps there are more places where we can simplify the generated code?


Probably. Did you have anything in particular in mind?

One thing we could do to reduce code size
would be to move the evaluation of the modal(?) basis functions
outside of the switch and just do the vector-vector product inside.

Also, I think it would significantly speed up evaluate_basis_all
if we evaluated the modal basis functions just once and then
did the vector-vector product 'local_dimension' times.

Actually, I plan on doing that unless anyone protests vehemently.
The reduction in generated code from the first change should more or
less offset the increase in generated code from the second.
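
A rough sketch of the idea in Python rather than generated C++. The
monomial "modal" basis and the coefficient table below are illustrative
stand-ins chosen for this example, not what FFC actually generates.

```python
def evaluate_modal_basis(x, num_modes):
    """Stand-in for the expensive step: monomials 1, x, x^2, ... at x."""
    return [x ** k for k in range(num_modes)]


def dot(a, b):
    """Plain vector-vector product."""
    return sum(ai * bi for ai, bi in zip(a, b))


def evaluate_basis(i, x, coefficients):
    """Per-function version: re-evaluates the modal basis on every call."""
    modal = evaluate_modal_basis(x, len(coefficients[i]))
    return dot(coefficients[i], modal)


def evaluate_basis_all(x, coefficients):
    """Evaluate the modal basis once, then one dot product per function."""
    modal = evaluate_modal_basis(x, len(coefficients[0]))
    return [dot(row, modal) for row in coefficients]


# Illustrative coefficients: the functions 1 - x and x on the unit interval.
coefficients = [[1.0, -1.0], [0.0, 1.0]]
values = evaluate_basis_all(0.25, coefficients)
```

Calling evaluate_basis in a loop would evaluate the modal basis
'local_dimension' times; evaluate_basis_all does it once and gives the
same values.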

Another thing I have thought about in relation to improving the
evaluate_basis* functions is whether it's really necessary to support
derivatives of arbitrary order. If we only generate code for the first
derivative by default (and support arbitrary derivatives via a command
line argument), the code will be a lot simpler (easier on the C++
compiler) and much faster irrespective of which gcc optimisation is
being used.
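
To give a feel for why arbitrary order is costly, here is a small sketch
that counts derivative combinations. It assumes that one value is tabulated
per ordered combination of coordinate directions; that is an assumption of
this sketch and is not stated in this thread. The function name is
hypothetical.

```python
from itertools import product


def derivative_combinations(order, dim):
    """All ordered combinations of coordinate directions for a derivative
    of the given order in 'dim' space dimensions (dim**order entries)."""
    return list(product(range(dim), repeat=order))


# First derivatives in 3D need one entry per direction, while third
# derivatives need one entry per ordered triple of directions.
first_order_3d = derivative_combinations(1, 3)
third_order_3d = derivative_combinations(3, 3)
```

Under that assumption, restricting the default to first derivatives keeps
the number of combinations equal to the space dimension.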


Sounds neat to me.

--
Marie

Kristian

--
Marie





_______________________________________________
Mailing list: https://launchpad.net/~ffc
Post to     : ffc@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~ffc
More help   : https://help.launchpad.net/ListHelp
