
ffc team mailing list archive

Re: FFC performance

 

On Mon, 2006-05-08 at 15:21 +0200, Anders Logg wrote:
> On Mon, May 08, 2006 at 03:08:55PM +0200, Garth N. Wells wrote:
> > Hi Anders,
> > 
> > We're using FFC here to compile some pretty complicated mixed nonlinear
> > problems, and the FFC compile time (around 1 hour) is becoming an issue.
> > Are there any areas in the code that are ripe for some optimisations
> > (data structures, etc) that we could help with?  
> > 
> > Garth
> 
> Yes.
> 
> Check with -d0 how long it takes to compute the reference
> tensor(s). There might be several in your case. This should be fairly
> quick (I hope) since this has been optimized.
> 
> I suspect the main issue may be the actual code generation going on in
> elementtensor.py, that is, generating the list of Declarations that
> are later written to file in dolfin.py. I haven't tried to optimize
> this part so it should be ripe for optimization.
> 

This looks like the bottleneck. The reference tensors were typically
computed in less than one second, with the longest taking 12 s. Below is
the profiler output both without and with the "-f blas" option. Without
blas, more than half of the run time is spent in
__compute_element_tensor_default and dolfin.py.

Garth
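For readers less familiar with the two modes: FFC computes the element
tensor as a contraction A[i] = sum over alpha of A0[i, alpha] * G[alpha],
where A0 is a precomputed reference tensor and G is a geometry tensor that
depends on the cell. The Python sketch below illustrates that idea under
this assumption; it is not FFC's code, and the names unrolled_code,
contract and EPS are hypothetical. In the default mode, code generation
does per-entry string work roughly like unrolled_code, so its cost grows
with the number of nonzero entries of A0. With -f blas, the entries of A0
appear to be written out as data instead (here the CahnHilliard2D-*.xml
files), and a BLAS matrix-vector product along the lines of contract does
the work at run time.

```python
from typing import List, Sequence

# Hypothetical threshold for dropping (near-)zero reference tensor entries.
EPS = 1e-14


def unrolled_code(A0: Sequence[Sequence[float]]) -> List[str]:
    """Emit one C-like assignment per element tensor entry (unrolled mode).

    Each nonzero A0[i][alpha] becomes a term "value*G<alpha>", so the amount
    of string formatting and joining grows with the number of nonzeros.
    """
    lines = []
    for i, row in enumerate(A0):
        terms = [f"{a:.15g}*G{alpha}" for alpha, a in enumerate(row) if abs(a) > EPS]
        rhs = " + ".join(terms) if terms else "0.0"
        lines.append(f"A[{i}] = {rhs};")
    return lines


def contract(A0: Sequence[Sequence[float]], G: Sequence[float]) -> List[float]:
    """Evaluate A = A0 . G directly, as a BLAS gemv call would at run time."""
    return [sum(a * g for a, g in zip(row, G)) for row in A0]


if __name__ == "__main__":
    A0 = [[1.0, 0.0], [0.5, -2.0]]
    for line in unrolled_code(A0):
        print(line)
    print(contract(A0, [2.0, 3.0]))
```

Both functions compute the same A; they differ only in whether the
contraction is spelled out as generated source or done numerically. The call
counts in the profile (about 7.1 million calls each to
referencetensor.py:154(__call__) and :0(abs) without -f blas) are consistent
with per-entry work of this kind, though that is an inference from the
numbers rather than from the FFC source.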

**** Without -f blas

         47310360 function calls (47307129 primitive calls) in 463.410 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2  132.800   66.400  302.460  151.230 elementtensor.py:131(__compute_element_tensor_default)
  7131790  121.390    0.000  141.110    0.000 dolfin.py:35(<lambda>)
        8   40.440    5.055  136.120   17.015 elementtensor.py:175(__check_used)
  8083430   38.260    0.000   38.260    0.000 :0(join)
  7114400   30.700    0.000   30.700    0.000 referencetensor.py:154(__call__)
 10675073   28.710    0.000   28.710    0.000 debug.py:14(debug)
  7114406   25.290    0.000   25.290    0.000 :0(abs)
  3557202   15.440    0.000   15.440    0.000 dolfin.py:47(<lambda>)
      857    5.420    0.006    5.420    0.006 :0(outer)

**** With -f blas

Generating XML output
Output written to CahnHilliard2D-0.xml
Output written to CahnHilliard2D-1.xml
         24012820 function calls (24009589 primitive calls) in 282.910 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2   85.140   42.570  119.110   59.555 xml.py:70(__form)
  3574590   60.310    0.000   70.320    0.000 dolfin.py:35(<lambda>)
        8   40.760    5.095  138.210   17.276 elementtensor.py:175(__check_used)
  7303043   27.320    0.000   27.320    0.000 :0(join)
  3557239   17.140    0.000   17.140    0.000 :0(write)
  3557200   14.920    0.000   14.920    0.000 referencetensor.py:154(__call__)
  3557206   11.390    0.000   11.390    0.000 :0(abs)
      857    5.330    0.006    5.330    0.006 :0(outer)
    50720    4.170    0.000    8.440    0.000 geometrytensor.py:78(__call__)
       13    3.530    0.272    8.940    0.688 monomialintegration.py:208(__compute_product)
      133    2.950    0.022    5.130    0.039 multiindex.py:12(build_indices)
763343/762858    1.990    0.000    2.000    0.000 :0(len)
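A report like the ones above can be reproduced with Python's standard
profiler. The ":0(join)" style of the entries suggests the pure-Python
profile module was used; the minimal sketch below uses cProfile instead,
which has the same interface and lower overhead, and sorts by "tottime",
which pstats labels "internal time". The build_declarations function is a
hypothetical stand-in for a code-generation loop, not FFC code.

```python
import cProfile
import io
import pstats


def build_declarations(n: int) -> list:
    """Hypothetical stand-in for code generation: one small call per entry."""
    # A lambda is used on purpose so it shows up as "<lambda>" in the report,
    # like the dolfin.py:35(<lambda>) entry above.
    fmt = lambda i: "A[%d] = %d;" % (i, i)  # noqa: E731
    return [fmt(i) for i in range(n)]


def profile_report(n: int = 100_000, limit: int = 10) -> str:
    """Profile build_declarations(n) and return the top entries by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    build_declarations(n)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" is the pstats sort key printed as "Ordered by: internal time".
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

In the report, tottime excludes time spent in sub-calls while cumtime
includes it, and each percall column divides the preceding time by ncalls:
for __compute_element_tensor_default above, 132.800 / 2 = 66.400 and
302.460 / 2 = 151.230. A large gap between tottime and cumtime, as for
__check_used (40.440 versus 136.120), points at the functions it calls.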

> If you can find out what is taking so long, then we can think of how
> to do the optimization.
> 
>
> Also check with -f blas and see if that speeds up the code
> generation. It should.
> 
> /Anders
> 
> _______________________________________________
> FFC-dev mailing list
> FFC-dev@xxxxxxxxxx
> http://www.fenics.org/cgi-bin/mailman/listinfo/ffc-dev



