ffc team mailing list archive
-
ffc team
-
Mailing list archive
-
Message #01575
Re: quadrature optimisations
Hi Kristian,
Can you send me a complete example which uses quadrature representation (including main.cpp file)? There is a sample form-file with ffc code for nonlinear Poisson's equation (without Dolfin code).
Also, is it possible to use quadrature representation for test and trial functions too?
Reza
----- Original Message ----
From: Kristian Oelgaard <k.b.oelgaard@xxxxxxxxxx>
To: "ffc-dev@xxxxxxxxxx" <ffc-dev@xxxxxxxxxx>
Sent: Monday, April 28, 2008 2:25:53 PM
Subject: [FFC-dev] quadrature optimisations
Hi,
I've pushed some optimisations for quadrature representation to the repository,
there might be a mail notification on its way. Below are some benchmarks where I
compare the performance of tensor representation to that of quadrature
representation. Comparison is made between FFC compile times, size of generated
code, DOLFIN compile time, and runtime.
I'm happy with a release anytime now although there's a lot of cleaning up to do
in the quadrature code generation files but I might get around to that before
the next release given that this will be in May/June sometime.
Kristian
DOLFIN change-set 4105
FFC change-set 1119
The FFC and DOLFIN compile times were measured with the command
$ time
The run-time was measured as the time spent on assembly of the global stiffness
matrix once, i.e. time spent in the function assembleCells() in Assembler.cpp
# Form: Elasticity
element = VectorElement("Lagrange", "triangle", 1)
v = TestFunction(element)
u = TrialFunction(element)
E = 10.0
nu = 0.3
mu = E / (2*(1 + nu))
lmbda = E*nu / ((1 + nu)*(1 - 2*nu))
def epsilon(v):
return 0.5*(grad(v) + transp(grad(v)))
def sigma(v):
return 2*mu*epsilon(v) + lmbda*mult(trace(epsilon(v)), Identity(len(v)))
a = dot(grad(v), sigma(u))*dx
# 2D elasticity, cases
1) 1st order elements, mesh(1000, 1000)
2) 2nd order elements, mesh(500, 500)
3) 4th order elements, mesh(125, 125)
4) 6th order elements, mesh(64, 64)
# Results
Tensor Quadrature
FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
1) 0.39s 136kb 2.34s 5.35s | 0.44s 134kb 2.35s 5.66s 1.05
2) 0.49s 191kb 2.66s 6.35s | 0.52s 176kb 2.66s 8.31s 1.23
3) 0.99s 600kb 6.66s 2.89s | 0.95s 406kb 4.10s 5.81s 2.01
4) 3.81s 1.8Mb 32.10s 2.93s | 3.52s 1.1Mb 9.36s 9.27s 3.16
# 3D elasticity, cases
1) 1st order elements, mesh(40,40,40)
2) 2nd order elements, mesh(20,20,20)
3) 3rd order elements, mesh(10,10,10)
4) 4th order elements, mesh(5,5,5)
# Results
Tensor Quadrature
FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
1) 0.97s 279kb 3.19s 5.35s | 1.09s 267kb 3.18s 4.37s 1.09
2) 1.61s 700kb 6.30s 6.35s | 1.58s 478kb 4.59s 9.09s 1.63
3) 5.86s 2.3Mb 28.93s 2.89s | 5.03s 1.2Mb 10.08s 10.87s 2.64
4) 15.49s 6.8Mb 3m30s* --- | 11.49s 2.9Mb 31.57s 9.17s ---
* I killed the compile after 3m30s but if it had finished I would expect
the tensor code to perform better than quadrature
Form: Plasticity
element = VectorElement("Lagrange", "triangle", 1)
elementT = VectorQuadratureElement("triangle", 1, 9)
v = TestFunction(element)
u = TrialFunction(element)
t = Function(elementT) # Consistent tangent coefficients
# Strain vector
def epsilon(u):
return [u[0].dx(0), u[1].dx(1), u[0].dx(1) + u[1].dx(0)]
# Consistent tangent tensor
def tangent(t):
return [[t[i*3 + j] for j in range(3)] for i in range(3)]
# Incremental stresses
def dsigma(t, u):
return mult(tangent(t), epsilon(u))
a = dot(epsilon(v), dsigma(t, u) )*dx
# 2D plasticity, cases
1) 1st order elements, mesh(1000, 1000)
2) 2nd order elements, mesh(500, 500)
3) 3rd order elements, mesh(250, 250)
4) 4th order elements*, mesh(125, 125)
*Note that because the bilinear form in this case is a 9th order form we take
the number of quadrature points equal to 5 when declaring the quadrature element.
# Results
Tensor Quadrature
FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
1) 0.65s 233kb 3.59s 6.57s | 0.78s 230kb 3.52s 7.06s 1.07
2) 1.26s 497kb 6.24s 13.40s | 1.36s 293kb 3.82s 12.70s 0.95
3) 4.26s 1.8Mb 46.74s 18.80s | 2.25s 415kb 4.73s 14.15s 0.75
4) 26.15s 11.0Mb * --- | 5.05s 687kb 6.68s 20.15s ---
* I did not even try to compile this header file in DOLFIN
# 3D plasticity, cases
1) 1st order elements, mesh(40,40,40)
2) 2nd order elements, mesh(20,20,20)
# Results
Tensor Quadrature
FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
1) 12.26s 778kb 8.81s 7.52s | 16.25s 673kb 7.92s 6.69s 0.89
2) 4m39s 11.0Mb 11m* --- | 1m17s 1.4Mb 12.44s 24.80s** ---
* I ran out of memory after 11 minutes of compiling DOLFIN.
** For comparison, this assembly would take 768s with the previous quadrature
version, nearly a factor 30 speedup.
Conclusions
I general quadrature representation results in less code and a shorter compile
time for both FFC and DOLFIN. Complex forms (or forms with high element order)
that cannot be compiled with tensor representation can possibly be compiled
using the quadrature option. For forms with many coefficients, it even looks like
quadrature is faster than tensor representation.
For those who are interested, here's a rough sketch on how it works:
P[ip][num_dofs]: tabulated value of basisfunctions (or derivatives)
W[ip]: weight at integration point
J_**: Jacobian terms (constant for each element)
Using 2nd order Lagrange tets for elasticity, some typical terms for the standard
quadrature approach might look as follows:
// 8 integration points and 30 dofs
for (unsigned int ip = 0; ip < 8; ip++)
for (unsigned int i = 0; i < 30; i++)
for (unsigned int j = 0; j < 30; j++)
A[j*30 + k] += P2[ip][j]*P0[ip][k]*W0[ip]*Jinv_00*Jinv_02*det\
+ P1[ip][j]*P0[ip][k]*W1[ip]*Jinv_00*Jinv_01*det
The number of operations to compute the above is roughly 8*30*30*12 = 86400
First we precompute the constant terms, much like the geometry terms for the
tensor contraction (we do NOT tabulate coefficients w[.][.] ) Thus we have
G0 = Jinv_00*Jinv_02*det
G1 = Jinv_00*Jinv_01*det
Some of the tables with basisfunctions and quadrature are redundant, in this
case let P2 = P1 and W0 = W1, which means that our loop reduces to:
for (unsigned int ip = 0; ip < 8; ip++)
for (unsigned int i = 0; i < 30; i++)
for (unsigned int j = 0; j < 30; j++)
A[j*30 + k] += P1[ip][j]*P0[ip][k]*W0[ip]*(G0 + G1)
which is already a factor of 3 operations less operations.
However, many of the tabulated values of the basis functions are zero. In the
case of gradients of 2nd order Lagrange elements on a tetrahedron there is only
7 non-zero columns in each table. We can tabulate these columns at compile time
e.g.,
nz0[7] = {0,1,2,3,4,5,6}
nz1[7] = {10,11,12,13,14,15,16}
which allow us to reduce our loop even further:
for (unsigned int ip = 0; ip < 8; ip++)
for (unsigned int i = 0; i < 7; i++)
for (unsigned int j = 0; j < 7; j++)
A[ nz1[j]*30 + nz0[k] ] += P1[ip][ nz1[j] ]*P0[ip][ nz0[k] ]*W0[ip]*(G0 + G1)
The number of operations is now 8*7*7*4 = 1568, a factor of 55 lower compared to
the initial value.
Note: The actual elasticity form was reduced from 7,754,400 to 119,952
operations.
_______________________________________________
FFC-dev mailing list
FFC-dev@xxxxxxxxxx
http://www.fenics.org/mailman/listinfo/ffc-dev
__________________________________________________________________
Ask a question on any topic and get answers from real people. Go to Yahoo! Answers and share what you know at http://ca.answers.yahoo.com
Follow ups