ffc team mailing list archive
-
ffc team
-
Mailing list archive
-
Message #00594
Re: FFC performance
On Mon, 2006-05-08 at 15:21 +0200, Anders Logg wrote:
> On Mon, May 08, 2006 at 03:08:55PM +0200, Garth N. Wells wrote:
> > Hi Anders,
> >
> > We're using FFC here to compile some pretty complicated mixed nonlinear
> > problems, and the FFC compile time (around 1 hour) is becoming an issue.
> > Are there any areas in the code that are ripe for some optimisations
> > (data structures, etc) that we could help with?
> >
> > Garth
>
> Yes.
>
> Check with -d0 how long it takes to compute the reference
> tensor(s). There might be several in your case. This should be fairly
> quick (I hope) since this has been optimized.
>
> I suspect the main issue may be the actual code generation going on in
> elementtensor.py, that is, generating the list of Declarations that
> are later written to file in dolfin.py. I haven't tried to optimize
> this part so it should be ripe for optimization.
>
This looks like the bottleneck. The reference tensors were typically
computed in less than one second, with the longest being 12s. Below are
lines of output from the profiler for both without and with the "-f
blas" option. More that 1/2 the run time is spent in
__compute_element_tensor_default and dolfin.py when not using blas.
Garth
**** Without -f blas
47310360 function calls (47307129 primitive calls) in 463.410
CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
2 132.800 66.400 302.460 151.230
elementtensor.py:131(__compute_element_tensor_default)
7131790 121.390 0.000 141.110 0.000 dolfin.py:35(<lambda>)
8 40.440 5.055 136.120 17.015
elementtensor.py:175(__check_used)
8083430 38.260 0.000 38.260 0.000 :0(join)
7114400 30.700 0.000 30.700 0.000
referencetensor.py:154(__call__)
10675073 28.710 0.000 28.710 0.000 debug.py:14(debug)
7114406 25.290 0.000 25.290 0.000 :0(abs)
3557202 15.440 0.000 15.440 0.000 dolfin.py:47(<lambda>)
857 5.420 0.006 5.420 0.006 :0(outer)
**** With -f blas
Generating XML output
Output written to CahnHilliard2D-0.xml
Output written to CahnHilliard2D-1.xml
24012820 function calls (24009589 primitive calls) in 282.910
CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
2 85.140 42.570 119.110 59.555 xml.py:70(__form)
3574590 60.310 0.000 70.320 0.000 dolfin.py:35(<lambda>)
8 40.760 5.095 138.210 17.276
elementtensor.py:175(__check_used)
7303043 27.320 0.000 27.320 0.000 :0(join)
3557239 17.140 0.000 17.140 0.000 :0(write)
3557200 14.920 0.000 14.920 0.000
referencetensor.py:154(__call__)
3557206 11.390 0.000 11.390 0.000 :0(abs)
857 5.330 0.006 5.330 0.006 :0(outer)
50720 4.170 0.000 8.440 0.000
geometrytensor.py:78(__call__)
13 3.530 0.272 8.940 0.688
monomialintegration.py:208(__compute_product)
133 2.950 0.022 5.130 0.039
multiindex.py:12(build_indices)
763343/762858 1.990 0.000 2.000 0.000 :0(len)
> If you can find out what is taking so long, then we can think of how
> to do the optimizsation.
>
>
> Also check with -f blas and see if that speeds up the code
> generation. It should.
>
> /Anders
>
> _______________________________________________
> FFC-dev mailing list
> FFC-dev@xxxxxxxxxx
> http://www.fenics.org/cgi-bin/mailman/listinfo/ffc-dev
References