ffc team mailing list archive

Thread
Date
Re: quadrature optimisations

To: ffc-dev@xxxxxxxxxx
From: Kristian Oelgaard <k.b.oelgaard@xxxxxxxxxx>
Date: Tue, 29 Apr 2008 09:14:06 +0200
Delivered-to: ffc-dev@xxxxxxxxxx
In-reply-to: <20080428213305.GO3367@simula.no>
User-agent: Internet Messaging Program (IMP) 3.1
Quoting Anders Logg <logg@xxxxxxxxx>:

> On Mon, Apr 28, 2008 at 02:16:24PM -0700, Reza Farrahi Moghaddam wrote:
> > Hi Kristian,
> >  
> > Can you send me a complete example which uses quadrature representation
> > (including main.cpp file)? There is a sample form-file with ffc code for
> > nonlinear Poisson's equation (without Dolfin code).
> > Also, is it possible to use quadrature representation for test and trial
> > functions too? 
> >  
> > Reza
> 
> You just need to add -r quadrature when compiling.

Like the following:
ffc -l dolfin -r quadrature Elasticity.form

Quadrature representation only changes the way the local element tensor is
being computed in the function tabulate_tensor() in the generated header file.
Everything else remains the same.

Kristian
 
> -- 
> Anders
> 
> 
> > 
> >  
> > ----- Original Message ----
> > From: Kristian Oelgaard <k.b.oelgaard@xxxxxxxxxx>
> > To: "ffc-dev@xxxxxxxxxx" <ffc-dev@xxxxxxxxxx>
> > Sent: Monday, April 28, 2008 2:25:53 PM
> > Subject: [FFC-dev] quadrature optimisations
> > 
> > 
> > 
> > Hi,
> > 
> > I've pushed some optimisations for quadrature representation to the
> repository,
> > there might be a mail notification on its way. Below are some benchmarks
> where
> > I
> > compare the performance of tensor representation to that of quadrature
> > representation. Comparison is made between FFC compile times, size of
> generated
> > code, DOLFIN compile time, and runtime.
> > 
> > I'm happy with a release anytime now although there's a lot of cleaning up
> to
> > do
> > in the quadrature code generation files but I might get around to that
> before
> > the next release given that this will be in May/June sometime.
> > 
> > Kristian
> > 
> > 
> > DOLFIN change-set 4105
> > FFC    change-set 1119
> > 
> > The FFC and DOLFIN compile times were measured with the command
> > $ time
> > 
> > The run-time was measured as the time spent on assembly of the global
> stiffness
> > matrix once, i.e. time spent in the function assembleCells() in
> Assembler.cpp
> > 
> > 
> > # Form: Elasticity
> > element = VectorElement("Lagrange", "triangle", 1)
> > 
> > v = TestFunction(element)
> > u = TrialFunction(element)
> > 
> > E  = 10.0
> > nu = 0.3
> > 
> > mu    = E / (2*(1 + nu))
> > lmbda = E*nu / ((1 + nu)*(1 - 2*nu))
> > 
> > def epsilon(v):
> >     return 0.5*(grad(v) + transp(grad(v)))
> > 
> > def sigma(v):
> >     return 2*mu*epsilon(v) + lmbda*mult(trace(epsilon(v)),
> Identity(len(v)))
> > 
> > a = dot(grad(v), sigma(u))*dx
> > 
> > # 2D elasticity, cases
> > 1) 1st order elements, mesh(1000, 1000)
> > 2) 2nd order elements, mesh(500, 500)
> > 3) 4th order elements, mesh(125, 125)
> > 4) 6th order elements, mesh(64, 64)
> > 
> > # Results
> >               Tensor                        Quadrature
> >   FFC    size  DOLFIN  run_t |  FFC  size    DOLFIN    run_q  run_q/run_t
> > 1) 0.39s  136kb  2.34s  5.35s |  0.44s  134kb    2.35s    5.66s    1.05
> > 2) 0.49s  191kb  2.66s  6.35s |  0.52s  176kb    2.66s    8.31s    1.23
> > 3) 0.99s  600kb  6.66s  2.89s |  0.95s  406kb    4.10s    5.81s    2.01
> > 4) 3.81s  1.8Mb  32.10s  2.93s |  3.52s  1.1Mb    9.36s    9.27s    3.16
> > 
> > 
> > # 3D elasticity, cases
> > 1) 1st order elements, mesh(40,40,40)
> > 2) 2nd order elements, mesh(20,20,20)
> > 3) 3rd order elements, mesh(10,10,10)
> > 4) 4th order elements, mesh(5,5,5)
> > 
> > # Results
> >                 Tensor                      Quadrature
> >   FFC    size  DOLFIN  run_t |  FFC    size  DOLFIN  run_q  run_q/run_t
> > 1)  0.97s 279kb  3.19s  5.35s |  1.09s  267kb  3.18s  4.37s    1.09
> > 2)  1.61s 700kb  6.30s  6.35s |  1.58s  478kb  4.59s  9.09s    1.63
> > 3)  5.86s 2.3Mb 28.93s  2.89s |  5.03s  1.2Mb  10.08s  10.87s    2.64
> > 4) 15.49s 6.8Mb  3m30s*  ---  | 11.49s  2.9Mb  31.57s  9.17s    ---
> > 
> > * I killed the compile after 3m30s but if it had finished I would expect
> >   the tensor code to perform better than quadrature
> > 
> > Form: Plasticity
> > 
> > element  = VectorElement("Lagrange", "triangle", 1)
> > elementT = VectorQuadratureElement("triangle", 1, 9)
> > 
> > v = TestFunction(element)
> > u = TrialFunction(element)
> > t = Function(elementT) # Consistent tangent coefficients
> > 
> > # Strain vector
> > def epsilon(u):
> >     return [u[0].dx(0), u[1].dx(1), u[0].dx(1) + u[1].dx(0)]
> > 
> > # Consistent tangent tensor
> > def tangent(t):
> >   return [[t[i*3 + j] for j in range(3)] for i in range(3)]
> > 
> > # Incremental stresses
> > def dsigma(t, u):
> >   return mult(tangent(t), epsilon(u))
> > 
> > a = dot(epsilon(v), dsigma(t, u) )*dx
> > 
> > 
> > # 2D plasticity, cases
> > 1) 1st order elements, mesh(1000, 1000)
> > 2) 2nd order elements, mesh(500, 500)
> > 3) 3rd order elements, mesh(250, 250)
> > 4) 4th order elements*, mesh(125, 125)
> > 
> > *Note that because the bilinear form in this case is a 9th order form we
> take
> > the number of quadrature points equal to 5 when declaring the quadrature
> > element.
> > 
> > # Results
> >               Tensor                                Quadrature
> >     FFC    size    DOLFIN  run_t  |  FFC  size  DOLFIN    run_q 
> run_q/run_t
> > 1)  0.65s  233kb  3.59s  6.57s  | 0.78s  230kb    3.52s    7.06s    1.07
> > 2)  1.26s  497kb  6.24s  13.40s  | 1.36s  293kb    3.82s    12.70s    0.95
> > 3)  4.26s  1.8Mb  46.74s  18.80s  | 2.25s  415kb    4.73s    14.15s   
> 0.75
> > 4) 26.15s  11.0Mb      *    ---  | 5.05s  687kb    6.68s    20.15s    ---
> > 
> > * I did not even try to compile this header file in DOLFIN
> > 
> > 
> > # 3D plasticity, cases
> > 1) 1st order elements, mesh(40,40,40)
> > 2) 2nd order elements, mesh(20,20,20)
> > 
> > # Results
> >         Tensor                          Quadrature
> >   FFC    size  DOLFIN  run_t |  FFC    size  DOLFIN    run_q  run_q/run_t
> > 1) 12.26s  778kb  8.81s  7.52s |  16.25s  673kb    7.92s    6.69s    0.89
> > 2) 4m39s  11.0Mb  11m*    ---  |  1m17s  1.4Mb  12.44s  24.80s**  ---
> > 
> > * I ran out of memory after 11 minutes of compiling DOLFIN.
> > ** For comparison, this assembly would take 768s with the previous
> quadrature
> >   version, nearly a factor 30 speedup.
> > 
> > 
> > Conclusions
> > 
> > I general quadrature representation results in less code and a shorter
> compile
> > time for both FFC and DOLFIN. Complex forms (or forms with high element
> order)
> > that cannot be compiled with tensor representation can possibly be
> compiled
> > using the quadrature option. For forms with many coefficients, it even
> looks
> > like
> > quadrature is faster than tensor representation.
> > 
> > 
> > For those who are interested, here's a rough sketch on how it works:
> > 
> > 
> > P[ip][num_dofs]: tabulated value of basisfunctions (or derivatives)
> > W[ip]:          weight at integration point
> > J_**:            Jacobian terms (constant for each element)
> > 
> > Using 2nd order Lagrange tets for elasticity, some typical terms for the
> > standard
> > quadrature approach might look as follows:
> > 
> > // 8 integration points and 30 dofs
> > for (unsigned int ip = 0; ip < 8; ip++)
> >   for (unsigned int i = 0; i < 30; i++)
> >     for (unsigned int j = 0; j < 30; j++)
> >       A[j*30 + k] += P2[ip][j]*P0[ip][k]*W0[ip]*Jinv_00*Jinv_02*det\
> >                   + P1[ip][j]*P0[ip][k]*W1[ip]*Jinv_00*Jinv_01*det
> > 
> > The number of operations to compute the above is roughly 8*30*30*12 =
> 86400
> > 
> > First we precompute the constant terms, much like the geometry terms for
> the
> > tensor contraction (we do NOT tabulate coefficients w[.][.] ) Thus we have
> > G0 = Jinv_00*Jinv_02*det
> > G1 = Jinv_00*Jinv_01*det
> > 
> > Some of the tables with basisfunctions and quadrature are redundant, in
> this
> > case let P2 = P1 and W0 = W1, which means that our loop reduces to:
> > for (unsigned int ip = 0; ip < 8; ip++)
> >   for (unsigned int i = 0; i < 30; i++)
> >     for (unsigned int j = 0; j < 30; j++)
> >       A[j*30 + k] += P1[ip][j]*P0[ip][k]*W0[ip]*(G0 + G1)
> > 
> > which is already a factor of 3 operations less operations.
> > However, many of the tabulated values of the basis functions are zero. In
> the
> > case of gradients of 2nd order Lagrange elements on a tetrahedron there is
> only
> > 7 non-zero columns in each table. We can tabulate these columns at compile
> time
> > e.g.,
> > 
> > nz0[7] = {0,1,2,3,4,5,6}
> > nz1[7] = {10,11,12,13,14,15,16}
> > 
> > which allow us to reduce our loop even further:
> > for (unsigned int ip = 0; ip < 8; ip++)
> >   for (unsigned int i = 0; i < 7; i++)
> >     for (unsigned int j = 0; j < 7; j++)
> > A[ nz1[j]*30 + nz0[k] ] += P1[ip][ nz1[j] ]*P0[ip][ nz0[k] ]*W0[ip]*(G0 +
> G1)
> > 
> > The number of operations is now 8*7*7*4 = 1568, a factor of 55 lower
> compared
> > to
> > the initial value.
> > Note: The actual elasticity form was reduced from 7,754,400 to 119,952
> > operations.
> > 
> > _______________________________________________
> > FFC-dev mailing list
> > FFC-dev@xxxxxxxxxx
> > http://www.fenics.org/mailman/listinfo/ffc-dev
> > 
> >
> â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”â”
> > Instant message from any web browser! Try the new Yahoo! Canada Messenger
> for
> > the Web BETA
> 
> > _______________________________________________
> > FFC-dev mailing list
> > FFC-dev@xxxxxxxxxx
> > http://www.fenics.org/mailman/listinfo/ffc-dev
> 
> _______________________________________________
> FFC-dev mailing list
> FFC-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/ffc-dev
>
References

Re: quadrature optimisations
From: Reza Farrahi Moghaddam, 2008-04-28
Re: quadrature optimisations
From: Anders Logg, 2008-04-28