ffc team mailing list archive
-
ffc team
-
Mailing list archive
-
Message #01577
Re: quadrature optimisations
On Mon, Apr 28, 2008 at 02:16:24PM -0700, Reza Farrahi Moghaddam wrote:
> Hi Kristian,
>
> Can you send me a complete example which uses quadrature representation
> (including main.cpp file)? There is a sample form-file with ffc code for
> nonlinear Poisson's equation (without Dolfin code).
> Also, is it possible to use quadrature representation for test and trial
> functions too?
>
> Reza
You just need to add -r quadrature when compiling.
--
Anders
>
>
> ----- Original Message ----
> From: Kristian Oelgaard <k.b.oelgaard@xxxxxxxxxx>
> To: "ffc-dev@xxxxxxxxxx" <ffc-dev@xxxxxxxxxx>
> Sent: Monday, April 28, 2008 2:25:53 PM
> Subject: [FFC-dev] quadrature optimisations
>
>
>
> Hi,
>
> I've pushed some optimisations for quadrature representation to the repository,
> there might be a mail notification on its way. Below are some benchmarks where
> I
> compare the performance of tensor representation to that of quadrature
> representation. Comparison is made between FFC compile times, size of generated
> code, DOLFIN compile time, and runtime.
>
> I'm happy with a release anytime now although there's a lot of cleaning up to
> do
> in the quadrature code generation files but I might get around to that before
> the next release given that this will be in May/June sometime.
>
> Kristian
>
>
> DOLFIN change-set 4105
> FFC change-set 1119
>
> The FFC and DOLFIN compile times were measured with the command
> $ time
>
> The run-time was measured as the time spent on assembly of the global stiffness
> matrix once, i.e. time spent in the function assembleCells() in Assembler.cpp
>
>
> # Form: Elasticity
> element = VectorElement("Lagrange", "triangle", 1)
>
> v = TestFunction(element)
> u = TrialFunction(element)
>
> E = 10.0
> nu = 0.3
>
> mu = E / (2*(1 + nu))
> lmbda = E*nu / ((1 + nu)*(1 - 2*nu))
>
> def epsilon(v):
> return 0.5*(grad(v) + transp(grad(v)))
>
> def sigma(v):
> return 2*mu*epsilon(v) + lmbda*mult(trace(epsilon(v)), Identity(len(v)))
>
> a = dot(grad(v), sigma(u))*dx
>
> # 2D elasticity, cases
> 1) 1st order elements, mesh(1000, 1000)
> 2) 2nd order elements, mesh(500, 500)
> 3) 4th order elements, mesh(125, 125)
> 4) 6th order elements, mesh(64, 64)
>
> # Results
> Tensor Quadrature
> FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
> 1) 0.39s 136kb 2.34s 5.35s | 0.44s 134kb 2.35s 5.66s 1.05
> 2) 0.49s 191kb 2.66s 6.35s | 0.52s 176kb 2.66s 8.31s 1.23
> 3) 0.99s 600kb 6.66s 2.89s | 0.95s 406kb 4.10s 5.81s 2.01
> 4) 3.81s 1.8Mb 32.10s 2.93s | 3.52s 1.1Mb 9.36s 9.27s 3.16
>
>
> # 3D elasticity, cases
> 1) 1st order elements, mesh(40,40,40)
> 2) 2nd order elements, mesh(20,20,20)
> 3) 3rd order elements, mesh(10,10,10)
> 4) 4th order elements, mesh(5,5,5)
>
> # Results
> Tensor Quadrature
> FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
> 1) 0.97s 279kb 3.19s 5.35s | 1.09s 267kb 3.18s 4.37s 1.09
> 2) 1.61s 700kb 6.30s 6.35s | 1.58s 478kb 4.59s 9.09s 1.63
> 3) 5.86s 2.3Mb 28.93s 2.89s | 5.03s 1.2Mb 10.08s 10.87s 2.64
> 4) 15.49s 6.8Mb 3m30s* --- | 11.49s 2.9Mb 31.57s 9.17s ---
>
> * I killed the compile after 3m30s but if it had finished I would expect
> the tensor code to perform better than quadrature
>
> Form: Plasticity
>
> element = VectorElement("Lagrange", "triangle", 1)
> elementT = VectorQuadratureElement("triangle", 1, 9)
>
> v = TestFunction(element)
> u = TrialFunction(element)
> t = Function(elementT) # Consistent tangent coefficients
>
> # Strain vector
> def epsilon(u):
> return [u[0].dx(0), u[1].dx(1), u[0].dx(1) + u[1].dx(0)]
>
> # Consistent tangent tensor
> def tangent(t):
> return [[t[i*3 + j] for j in range(3)] for i in range(3)]
>
> # Incremental stresses
> def dsigma(t, u):
> return mult(tangent(t), epsilon(u))
>
> a = dot(epsilon(v), dsigma(t, u) )*dx
>
>
> # 2D plasticity, cases
> 1) 1st order elements, mesh(1000, 1000)
> 2) 2nd order elements, mesh(500, 500)
> 3) 3rd order elements, mesh(250, 250)
> 4) 4th order elements*, mesh(125, 125)
>
> *Note that because the bilinear form in this case is a 9th order form we take
> the number of quadrature points equal to 5 when declaring the quadrature
> element.
>
> # Results
> Tensor Quadrature
> FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
> 1) 0.65s 233kb 3.59s 6.57s | 0.78s 230kb 3.52s 7.06s 1.07
> 2) 1.26s 497kb 6.24s 13.40s | 1.36s 293kb 3.82s 12.70s 0.95
> 3) 4.26s 1.8Mb 46.74s 18.80s | 2.25s 415kb 4.73s 14.15s 0.75
> 4) 26.15s 11.0Mb * --- | 5.05s 687kb 6.68s 20.15s ---
>
> * I did not even try to compile this header file in DOLFIN
>
>
> # 3D plasticity, cases
> 1) 1st order elements, mesh(40,40,40)
> 2) 2nd order elements, mesh(20,20,20)
>
> # Results
> Tensor Quadrature
> FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
> 1) 12.26s 778kb 8.81s 7.52s | 16.25s 673kb 7.92s 6.69s 0.89
> 2) 4m39s 11.0Mb 11m* --- | 1m17s 1.4Mb 12.44s 24.80s** ---
>
> * I ran out of memory after 11 minutes of compiling DOLFIN.
> ** For comparison, this assembly would take 768s with the previous quadrature
> version, nearly a factor 30 speedup.
>
>
> Conclusions
>
> I general quadrature representation results in less code and a shorter compile
> time for both FFC and DOLFIN. Complex forms (or forms with high element order)
> that cannot be compiled with tensor representation can possibly be compiled
> using the quadrature option. For forms with many coefficients, it even looks
> like
> quadrature is faster than tensor representation.
>
>
> For those who are interested, here's a rough sketch on how it works:
>
>
> P[ip][num_dofs]: tabulated value of basisfunctions (or derivatives)
> W[ip]: weight at integration point
> J_**: Jacobian terms (constant for each element)
>
> Using 2nd order Lagrange tets for elasticity, some typical terms for the
> standard
> quadrature approach might look as follows:
>
> // 8 integration points and 30 dofs
> for (unsigned int ip = 0; ip < 8; ip++)
> for (unsigned int i = 0; i < 30; i++)
> for (unsigned int j = 0; j < 30; j++)
> A[j*30 + k] += P2[ip][j]*P0[ip][k]*W0[ip]*Jinv_00*Jinv_02*det\
> + P1[ip][j]*P0[ip][k]*W1[ip]*Jinv_00*Jinv_01*det
>
> The number of operations to compute the above is roughly 8*30*30*12 = 86400
>
> First we precompute the constant terms, much like the geometry terms for the
> tensor contraction (we do NOT tabulate coefficients w[.][.] ) Thus we have
> G0 = Jinv_00*Jinv_02*det
> G1 = Jinv_00*Jinv_01*det
>
> Some of the tables with basisfunctions and quadrature are redundant, in this
> case let P2 = P1 and W0 = W1, which means that our loop reduces to:
> for (unsigned int ip = 0; ip < 8; ip++)
> for (unsigned int i = 0; i < 30; i++)
> for (unsigned int j = 0; j < 30; j++)
> A[j*30 + k] += P1[ip][j]*P0[ip][k]*W0[ip]*(G0 + G1)
>
> which is already a factor of 3 operations less operations.
> However, many of the tabulated values of the basis functions are zero. In the
> case of gradients of 2nd order Lagrange elements on a tetrahedron there is only
> 7 non-zero columns in each table. We can tabulate these columns at compile time
> e.g.,
>
> nz0[7] = {0,1,2,3,4,5,6}
> nz1[7] = {10,11,12,13,14,15,16}
>
> which allow us to reduce our loop even further:
> for (unsigned int ip = 0; ip < 8; ip++)
> for (unsigned int i = 0; i < 7; i++)
> for (unsigned int j = 0; j < 7; j++)
> A[ nz1[j]*30 + nz0[k] ] += P1[ip][ nz1[j] ]*P0[ip][ nz0[k] ]*W0[ip]*(G0 + G1)
>
> The number of operations is now 8*7*7*4 = 1568, a factor of 55 lower compared
> to
> the initial value.
> Note: The actual elasticity form was reduced from 7,754,400 to 119,952
> operations.
>
> _______________________________________________
> FFC-dev mailing list
> FFC-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/ffc-dev
>
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> Instant message from any web browser! Try the new Yahoo! Canada Messenger for
> the Web BETA
> _______________________________________________
> FFC-dev mailing list
> FFC-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/ffc-dev
Follow ups
References