ffc team mailing list archive
Message #01787
Re: quadrature optimisations
On Mon, Sep 08, 2008 at 03:40:55PM +0200, Kristian Oelgaard wrote:
>
> Hi,
>
> Here is a comparison between tensor representation, the previous
> quadrature representation, and the new, optimised version of quadrature
> representation.
>
> The FFC compile time is measured as follows:
> - simplify, the time spent on simplifying the expression
> - repres., the time spent on computing the representation
> - code gen., the time spent on actual code generation
> - FFC, total time spent on compiling the form
>
> The 3 stages (simplify, repres. and code gen.) account for around 95% of
> the FFC compile time.
>
> - size, is the size of the header file.
>
> - DOLFIN, is the time spent on compiling a simple main.cpp file including
> the generated header file against DOLFIN.
>
> - run, is the runtime measured as the time it takes to call tabulate_tensor()
> N times. No assembly is performed. If a form contains facet integrals,
> tabulate_tensor() is called for each facet combination. E.g., a DG form in
> 3D with one interior facet integral will call tabulate_tensor()
> N*4*4 times (4 local facets on each of the two tetrahedra sharing the facet).
>
> - fac., is the runtime divided by the runtime for the previous version of
> quadrature representation (values below 1.000 mean faster than the old version).
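A minimal Python sketch of the call-count rule described above. It assumes (this is not stated in the message) that a cell integral is called once per iteration, an exterior facet integral once per local facet, and an interior facet integral once per pair of local facets of the two neighbouring cells; the function name and the integral-type labels are hypothetical.

```python
def tabulate_tensor_calls(n, facets_per_cell, integrals):
    """Total tabulate_tensor() calls for a benchmark loop of n iterations.

    integrals: list of integral types, each "cell", "exterior_facet" or
    "interior_facet" (hypothetical labels used only for this sketch).
    """
    calls_per_iteration = {
        "cell": 1,                               # one call per iteration
        "exterior_facet": facets_per_cell,       # one call per local facet (assumed)
        "interior_facet": facets_per_cell ** 2,  # one call per (facet0, facet1) pair
    }
    return sum(n * calls_per_iteration[kind] for kind in integrals)


# Tetrahedra have 4 facets, so one interior facet integral gives N*4*4 calls.
N = 2000
print(tabulate_tensor_calls(N, 4, ["interior_facet"]))
```

With N = 2000 this prints 32000, i.e. N*4*4; a form with several integrals simply sums the per-integral counts.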
>
> All forms are bilinear forms.
>
> Elasticity 3D, 2nd order elements, N = 500,000
> Description: No functions, just basis functions and geometry terms
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     0.21s     0.05s     0.93s     1.41s     598kB     8.17s     1.8s      0.047
> old quad   0.21s     0.05s     0.98s     1.45s     476kB     7.39s     38.5s     1.000
> new quad   0.21s     0.05s     0.98s     1.48s     464kB     7.34s     31.0s     0.805
>
> Note: For forms without any functions, tensor representation is ALWAYS much
> faster (about 17 times faster than the new quadrature in this case).
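To make the fac. column and the quoted speed-up concrete, here is a small Python check using the Elasticity 3D run times from the table above. The helper name `fac` is illustrative only; it just divides each run time by the old quadrature run time, as the fac. definition says.

```python
# Run times (seconds) for Elasticity 3D, 2nd order elements, N = 500,000.
runs = {"tensor": 1.8, "old quad": 38.5, "new quad": 31.0}


def fac(run, baseline):
    """fac.: run time relative to the previous quadrature representation."""
    return run / baseline


for name, t in runs.items():
    print(f"{name:9s} fac. = {fac(t, runs['old quad']):.3f}")

# Speed-up of tensor representation over the new quadrature representation.
print(f"tensor speed-up: {runs['new quad'] / runs['tensor']:.1f}x")
```

This prints fac. values of 0.047, 1.000 and 0.805, matching the table, and a tensor speed-up of 17.2x.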
>
>
> Plasticity 2D, 1st order elements, N = 100,000,000
> Description: 9 component tangent defined on VectorQuadratureElement
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     0.14s     0.11s     0.20s     0.62s     232kB     6.36s     25.5s     0.560
> old quad   0.14s     0.06s     0.37s     0.77s     228kB     6.33s     45.5s     1.000
> new quad   0.14s     0.06s     0.24s     0.64s     230kB     6.27s     20.8s     0.457
>
> Note: Not much difference between tensor and the new quadrature
> representation; both are about 2 times faster than the old version of
> quadrature representation.
>
>
> Plasticity 2D, 3rd order elements, N = 500,000
> Description: 9 component tangent defined on VectorQuadratureElement
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     0.14s     0.55s     3.37s     4.22s     1.8MB     43.61s    54.5s     1.518
> old quad   0.14s     0.14s     1.69s     2.19s     414kB     7.58s     35.9s     1.000
> new quad   0.14s     0.15s     0.50s     1.00s     410kB     7.51s     10.5s     0.292
>
> Note: For higher order elements, the code generated by tensor representation
> grows in size, which increases the DOLFIN compile time. At runtime, the new
> quadrature is about 3 and 5 times faster than the old quadrature and tensor
> representation, respectively. The FFC compile time is also 2-4 times shorter
> (not that it makes much of a difference, since the total compile time is
> only 1 sec.)
>
>
> Plasticity 3D, 1st order elements, N = 10,000,000
> Description: 36 component tangent defined on VectorQuadratureElement
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     2.04s     3.36s     5.77s     11.89s    775kB     12.76s    52.9s     1.441
> old quad   2.04s     0.86s     11.71s    15.35s    670kB     11.72s    36.7s     1.000
> new quad   2.01s     0.85s     1.78s     5.33s     693kB     11.89s    19.0s     0.518
>
> Note: The new quadrature compiles 2-3 times faster with FFC and runs 2-3 times
> faster than both tensor and the old quadrature representation.
>
>
> Plasticity 3D, 2nd order elements, N = 100,000
> Description: 36 component tangent defined on VectorQuadratureElement
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     2.03s     34.93s    236.6s    275.30s   11MB      *         ---       ---
> old quad   2.05s     2.15s     68.3s     73.30s    1.4MB     16.89s    37.8s     1.000
> new quad   2.04s     2.15s     2.9s      7.82s     1.4MB     16.67s    6.7s      0.177
>
> * ran out of memory after 8 min.
> cc1plus: out of memory allocating 1477058608 bytes after a total\
> of 134725632 bytes
> (also tried to split the FFC output into *.h and *.cpp, same result)
>
> Note: Tensor representation takes forever to compile with FFC and the
> resulting code can't be compiled against DOLFIN. The new quadrature
> compiles about 10 times faster with FFC than the old quadrature and runs
> about 5 times faster.
>
>
> PressureEquation 2D, 2nd order elements, N = 100,000
> Description: Many, many functions
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     23.7s     0.45s     2.20s     29.0s     2.6MB     36.05s    6.76s     0.0168
> old quad   23.5s     0.41s     16.48s    43.1s     556kB     9.02s     400.40s   1.000
> new quad   23.7s     0.41s     3.05s     29.9s     544kB     8.69s     1.03s     0.0025
>
> Note: The FFC compile time has been reduced for the new quadrature so that
> it's comparable to that of tensor representation; note that most of the time
> is spent in simplify. The runtime is now 6-7 times faster than tensor
> representation and almost 400!! times faster than the old version
> of quadrature.
>
>
> BiharmonicDG_2D, 3rd order elements, N = 200,000
> Description: Interior facet integrals, higher order derivatives.
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     1.11s     1.61s     13.67s    16.70s    3.2MB     46.26s    31.6s     0.280
> old quad   1.12s     1.26s     4.89s     7.64s     487kB     9.62s     112.9s    1.000
> new quad   1.12s     1.25s     2.95s     5.72s     427kB     7.80s     33.7s     0.298
>
> Note: Faster FFC and DOLFIN compile times than tensor representation, and
> equivalent runtime performance (a factor of 3 better than the old
> quadrature).
>
>
> BiharmonicDG_3D, 3rd order elements, N = 2,000
> Description: Interior facet integrals, higher order derivatives.
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     2.70s     *         ---       ---       ---       ---       ---       ---
> old quad   2.70s     7.86s     60.5s     72.0s     2.9MB     70.2s     51.5s     1.000
> new quad   2.65s     7.79s     28.7s     39.9s     2.4MB     36.8s     10.4s     0.202
>
> * MemoryError during compute representation
>
> Note: A factor of 2 speed-up at the code generation stage and less
> code as output. The DOLFIN compile time is 2 times shorter and the
> runtime 5 times faster.
>
>
> DGSGPa, 3D linear elements, N = 20000
> Description: DG strain gradient plasticity form, among other crazy things
> an 81 component tangent on linear discontinuous elements.
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> tensor     199s      *         ---       ---       ---       ---       ---       ---
> old quad   200s      768s      1485s     2462s     11MB      220s      34.7s     1.000
> new quad   201s      763s      167s      1141s     9.0MB     114s      20.7s     0.628
>
> * MemoryError during compute representation
>
> Note: The FFC compile time has been reduced by a factor of 2; also note that
> the code generation is now faster than simplifying the expression. It
> might be possible to optimise the representation stage by cutting some
> corners, but that is for later. The DOLFIN compile time is a factor of 2
> shorter, but unfortunately the optimisations did not have that big an
> impact on the runtime performance.
>
>
> CahnHilliard, Linear elements, N = 200000
> Description: Many functions.
>
>            simplify  repres.   code gen. FFC       size      DOLFIN    run       fac.
> old quad a 6.88s     11.5s     640s      ---       ---       ---       ---       ---
> old quad L 2.98s     237.1s    5571s     6470s     1.9MB     23.2s     72.1s     1.000
> new quad a 6.61s     10.7s     2.30s     ---       ---       ---       ---       ---
> new quad L 3.14s     229.7s    1.63s     258s      1.9MB     20.8s     1.50s     0.021
>
> Note: I'll let the numbers on FFC compile time and runtime speak for
> themselves.
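The CahnHilliard table reports FFC, size, DOLFIN and run only on the L rows. Reading those entries as covering the whole file (an assumption; the message does not say so explicitly), the stage times of both forms should add up to most of the FFC total, in line with the "around 95%" statement at the top of the message. A quick Python check, with all numbers copied from the table:

```python
# Per-stage FFC times (seconds) from the CahnHilliard table:
# (simplify, repres., code gen.) for the forms a and L.
stages = {
    "old quad": {"a": (6.88, 11.5, 640.0), "L": (2.98, 237.1, 5571.0)},
    "new quad": {"a": (6.61, 10.7, 2.30), "L": (3.14, 229.7, 1.63)},
}
# FFC totals, assumed here to cover both forms in the file.
ffc_total = {"old quad": 6470.0, "new quad": 258.0}

for version, forms in stages.items():
    stage_sum = sum(sum(times) for times in forms.values())
    share = stage_sum / ffc_total[version]
    print(f"{version}: stages {stage_sum:.2f}s of {ffc_total[version]:.0f}s ({share:.1%})")

# Code generation alone, both forms combined.
codegen_old = sum(times[2] for times in stages["old quad"].values())
codegen_new = sum(times[2] for times in stages["new quad"].values())
print(f"code gen. speed-up: {codegen_old / codegen_new:.0f}x")
```

Under that reading the three stages cover about 100% and 98.5% of the FFC time, and code generation alone is roughly 1580 times faster.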
>
>
> Kristian
Very impressive!
--
Anders