ffc team mailing list archive

Thread
Date
Re: quadrature optimisations

To: ffc-dev@xxxxxxxxxx
From: Anders Logg <logg@xxxxxxxxx>
Date: Mon, 28 Apr 2008 23:33:05 +0200
Delivered-to: ffc-dev@xxxxxxxxxx
In-reply-to: <148355.83138.qm@web32006.mail.mud.yahoo.com>
Mail-followup-to: ffc-dev@xxxxxxxxxx
User-agent: Mutt/1.5.17+20080114 (2008-01-14)
On Mon, Apr 28, 2008 at 02:16:24PM -0700, Reza Farrahi Moghaddam wrote:
> Hi Kristian,
>  
> Can you send me a complete example which uses quadrature representation
> (including main.cpp file)? There is a sample form-file with ffc code for
> nonlinear Poisson's equation (without Dolfin code).
> Also, is it possible to use quadrature representation for test and trial
> functions too? 
>  
> Reza

You just need to add -r quadrature when compiling.

-- 
Anders


> 
>  
> ----- Original Message ----
> From: Kristian Oelgaard <k.b.oelgaard@xxxxxxxxxx>
> To: "ffc-dev@xxxxxxxxxx" <ffc-dev@xxxxxxxxxx>
> Sent: Monday, April 28, 2008 2:25:53 PM
> Subject: [FFC-dev] quadrature optimisations
> 
> 
> 
> Hi,
> 
> I've pushed some optimisations for quadrature representation to the repository,
> there might be a mail notification on its way. Below are some benchmarks where
> I
> compare the performance of tensor representation to that of quadrature
> representation. Comparison is made between FFC compile times, size of generated
> code, DOLFIN compile time, and runtime.
> 
> I'm happy with a release anytime now although there's a lot of cleaning up to
> do
> in the quadrature code generation files but I might get around to that before
> the next release given that this will be in May/June sometime.
> 
> Kristian
> 
> 
> DOLFIN change-set 4105
> FFC    change-set 1119
> 
> The FFC and DOLFIN compile times were measured with the command
> $ time
> 
> The run-time was measured as the time spent on assembly of the global stiffness
> matrix once, i.e. time spent in the function assembleCells() in Assembler.cpp
> 
> 
> # Form: Elasticity
> element = VectorElement("Lagrange", "triangle", 1)
> 
> v = TestFunction(element)
> u = TrialFunction(element)
> 
> E  = 10.0
> nu = 0.3
> 
> mu    = E / (2*(1 + nu))
> lmbda = E*nu / ((1 + nu)*(1 - 2*nu))
> 
> def epsilon(v):
>     return 0.5*(grad(v) + transp(grad(v)))
> 
> def sigma(v):
>     return 2*mu*epsilon(v) + lmbda*mult(trace(epsilon(v)), Identity(len(v)))
> 
> a = dot(grad(v), sigma(u))*dx
> 
> # 2D elasticity, cases
> 1) 1st order elements, mesh(1000, 1000)
> 2) 2nd order elements, mesh(500, 500)
> 3) 4th order elements, mesh(125, 125)
> 4) 6th order elements, mesh(64, 64)
> 
> # Results
>               Tensor                        Quadrature
>   FFC    size  DOLFIN  run_t |  FFC  size    DOLFIN    run_q  run_q/run_t
> 1) 0.39s  136kb  2.34s  5.35s |  0.44s  134kb    2.35s    5.66s    1.05
> 2) 0.49s  191kb  2.66s  6.35s |  0.52s  176kb    2.66s    8.31s    1.23
> 3) 0.99s  600kb  6.66s  2.89s |  0.95s  406kb    4.10s    5.81s    2.01
> 4) 3.81s  1.8Mb  32.10s  2.93s |  3.52s  1.1Mb    9.36s    9.27s    3.16
> 
> 
> # 3D elasticity, cases
> 1) 1st order elements, mesh(40,40,40)
> 2) 2nd order elements, mesh(20,20,20)
> 3) 3rd order elements, mesh(10,10,10)
> 4) 4th order elements, mesh(5,5,5)
> 
> # Results
>                 Tensor                      Quadrature
>   FFC    size  DOLFIN  run_t |  FFC    size  DOLFIN  run_q  run_q/run_t
> 1)  0.97s 279kb  3.19s  5.35s |  1.09s  267kb  3.18s  4.37s    1.09
> 2)  1.61s 700kb  6.30s  6.35s |  1.58s  478kb  4.59s  9.09s    1.63
> 3)  5.86s 2.3Mb 28.93s  2.89s |  5.03s  1.2Mb  10.08s  10.87s    2.64
> 4) 15.49s 6.8Mb  3m30s*  ---  | 11.49s  2.9Mb  31.57s  9.17s    ---
> 
> * I killed the compile after 3m30s but if it had finished I would expect
>   the tensor code to perform better than quadrature
> 
> Form: Plasticity
> 
> element  = VectorElement("Lagrange", "triangle", 1)
> elementT = VectorQuadratureElement("triangle", 1, 9)
> 
> v = TestFunction(element)
> u = TrialFunction(element)
> t = Function(elementT) # Consistent tangent coefficients
> 
> # Strain vector
> def epsilon(u):
>     return [u[0].dx(0), u[1].dx(1), u[0].dx(1) + u[1].dx(0)]
> 
> # Consistent tangent tensor
> def tangent(t):
>   return [[t[i*3 + j] for j in range(3)] for i in range(3)]
> 
> # Incremental stresses
> def dsigma(t, u):
>   return mult(tangent(t), epsilon(u))
> 
> a = dot(epsilon(v), dsigma(t, u) )*dx
> 
> 
> # 2D plasticity, cases
> 1) 1st order elements, mesh(1000, 1000)
> 2) 2nd order elements, mesh(500, 500)
> 3) 3rd order elements, mesh(250, 250)
> 4) 4th order elements*, mesh(125, 125)
> 
> *Note that because the bilinear form in this case is a 9th order form we take
> the number of quadrature points equal to 5 when declaring the quadrature
> element.
> 
> # Results
>               Tensor                                Quadrature
>     FFC    size    DOLFIN  run_t  |  FFC  size  DOLFIN    run_q  run_q/run_t
> 1)  0.65s  233kb  3.59s  6.57s  | 0.78s  230kb    3.52s    7.06s    1.07
> 2)  1.26s  497kb  6.24s  13.40s  | 1.36s  293kb    3.82s    12.70s    0.95
> 3)  4.26s  1.8Mb  46.74s  18.80s  | 2.25s  415kb    4.73s    14.15s    0.75
> 4) 26.15s  11.0Mb      *    ---  | 5.05s  687kb    6.68s    20.15s    ---
> 
> * I did not even try to compile this header file in DOLFIN
> 
> 
> # 3D plasticity, cases
> 1) 1st order elements, mesh(40,40,40)
> 2) 2nd order elements, mesh(20,20,20)
> 
> # Results
>         Tensor                          Quadrature
>   FFC    size  DOLFIN  run_t |  FFC    size  DOLFIN    run_q  run_q/run_t
> 1) 12.26s  778kb  8.81s  7.52s |  16.25s  673kb    7.92s    6.69s    0.89
> 2) 4m39s  11.0Mb  11m*    ---  |  1m17s  1.4Mb  12.44s  24.80s**  ---
> 
> * I ran out of memory after 11 minutes of compiling DOLFIN.
> ** For comparison, this assembly would take 768s with the previous quadrature
>   version, nearly a factor 30 speedup.
> 
> 
> Conclusions
> 
> I general quadrature representation results in less code and a shorter compile
> time for both FFC and DOLFIN. Complex forms (or forms with high element order)
> that cannot be compiled with tensor representation can possibly be compiled
> using the quadrature option. For forms with many coefficients, it even looks
> like
> quadrature is faster than tensor representation.
> 
> 
> For those who are interested, here's a rough sketch on how it works:
> 
> 
> P[ip][num_dofs]: tabulated value of basisfunctions (or derivatives)
> W[ip]:          weight at integration point
> J_**:            Jacobian terms (constant for each element)
> 
> Using 2nd order Lagrange tets for elasticity, some typical terms for the
> standard
> quadrature approach might look as follows:
> 
> // 8 integration points and 30 dofs
> for (unsigned int ip = 0; ip < 8; ip++)
>   for (unsigned int i = 0; i < 30; i++)
>     for (unsigned int j = 0; j < 30; j++)
>       A[j*30 + k] += P2[ip][j]*P0[ip][k]*W0[ip]*Jinv_00*Jinv_02*det\
>                   + P1[ip][j]*P0[ip][k]*W1[ip]*Jinv_00*Jinv_01*det
> 
> The number of operations to compute the above is roughly 8*30*30*12 = 86400
> 
> First we precompute the constant terms, much like the geometry terms for the
> tensor contraction (we do NOT tabulate coefficients w[.][.] ) Thus we have
> G0 = Jinv_00*Jinv_02*det
> G1 = Jinv_00*Jinv_01*det
> 
> Some of the tables with basisfunctions and quadrature are redundant, in this
> case let P2 = P1 and W0 = W1, which means that our loop reduces to:
> for (unsigned int ip = 0; ip < 8; ip++)
>   for (unsigned int i = 0; i < 30; i++)
>     for (unsigned int j = 0; j < 30; j++)
>       A[j*30 + k] += P1[ip][j]*P0[ip][k]*W0[ip]*(G0 + G1)
> 
> which is already a factor of 3 operations less operations.
> However, many of the tabulated values of the basis functions are zero. In the
> case of gradients of 2nd order Lagrange elements on a tetrahedron there is only
> 7 non-zero columns in each table. We can tabulate these columns at compile time
> e.g.,
> 
> nz0[7] = {0,1,2,3,4,5,6}
> nz1[7] = {10,11,12,13,14,15,16}
> 
> which allow us to reduce our loop even further:
> for (unsigned int ip = 0; ip < 8; ip++)
>   for (unsigned int i = 0; i < 7; i++)
>     for (unsigned int j = 0; j < 7; j++)
> A[ nz1[j]*30 + nz0[k] ] += P1[ip][ nz1[j] ]*P0[ip][ nz0[k] ]*W0[ip]*(G0 + G1)
> 
> The number of operations is now 8*7*7*4 = 1568, a factor of 55 lower compared
> to
> the initial value.
> Note: The actual elasticity form was reduced from 7,754,400 to 119,952
> operations.
> 
> _______________________________________________
> FFC-dev mailing list
> FFC-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/ffc-dev
> 
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> Instant message from any web browser! Try the new Yahoo! Canada Messenger for
> the Web BETA

> _______________________________________________
> FFC-dev mailing list
> FFC-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/ffc-dev
Follow ups

Re: quadrature optimisations
From: Kristian Oelgaard, 2008-04-29
References

Re: quadrature optimisations
From: Reza Farrahi Moghaddam, 2008-04-28