ffc team mailing list archive
-
ffc team
-
Mailing list archive
-
Message #01583
Re: quadrature optimisations
Quoting Anders Logg <logg@xxxxxxxxx>:
> On Mon, Apr 28, 2008 at 02:16:24PM -0700, Reza Farrahi Moghaddam wrote:
> > Hi Kristian,
> >
> > Can you send me a complete example which uses quadrature representation
> > (including main.cpp file)? There is a sample form-file with ffc code for
> > nonlinear Poisson's equation (without Dolfin code).
> > Also, is it possible to use quadrature representation for test and trial
> > functions too?
> >
> > Reza
>
> You just need to add -r quadrature when compiling.
Like the following:
ffc -l dolfin -r quadrature Elasticity.form
Quadrature representation only changes the way the local element tensor is
being computed in the function tabulate_tensor() in the generated header file.
Everything else remains the same.
Kristian
> --
> Anders
>
>
> >
> >
> > ----- Original Message ----
> > From: Kristian Oelgaard <k.b.oelgaard@xxxxxxxxxx>
> > To: "ffc-dev@xxxxxxxxxx" <ffc-dev@xxxxxxxxxx>
> > Sent: Monday, April 28, 2008 2:25:53 PM
> > Subject: [FFC-dev] quadrature optimisations
> >
> >
> >
> > Hi,
> >
> > I've pushed some optimisations for quadrature representation to the
> repository,
> > there might be a mail notification on its way. Below are some benchmarks
> where
> > I
> > compare the performance of tensor representation to that of quadrature
> > representation. Comparison is made between FFC compile times, size of
> generated
> > code, DOLFIN compile time, and runtime.
> >
> > I'm happy with a release anytime now although there's a lot of cleaning up
> to
> > do
> > in the quadrature code generation files but I might get around to that
> before
> > the next release given that this will be in May/June sometime.
> >
> > Kristian
> >
> >
> > DOLFIN change-set 4105
> > FFC change-set 1119
> >
> > The FFC and DOLFIN compile times were measured with the command
> > $ time
> >
> > The run-time was measured as the time spent on assembly of the global
> stiffness
> > matrix once, i.e. time spent in the function assembleCells() in
> Assembler.cpp
> >
> >
> > # Form: Elasticity
> > element = VectorElement("Lagrange", "triangle", 1)
> >
> > v = TestFunction(element)
> > u = TrialFunction(element)
> >
> > E = 10.0
> > nu = 0.3
> >
> > mu = E / (2*(1 + nu))
> > lmbda = E*nu / ((1 + nu)*(1 - 2*nu))
> >
> > def epsilon(v):
> > return 0.5*(grad(v) + transp(grad(v)))
> >
> > def sigma(v):
> > return 2*mu*epsilon(v) + lmbda*mult(trace(epsilon(v)),
> Identity(len(v)))
> >
> > a = dot(grad(v), sigma(u))*dx
> >
> > # 2D elasticity, cases
> > 1) 1st order elements, mesh(1000, 1000)
> > 2) 2nd order elements, mesh(500, 500)
> > 3) 4th order elements, mesh(125, 125)
> > 4) 6th order elements, mesh(64, 64)
> >
> > # Results
> > Tensor Quadrature
> > FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
> > 1) 0.39s 136kb 2.34s 5.35s | 0.44s 134kb 2.35s 5.66s 1.05
> > 2) 0.49s 191kb 2.66s 6.35s | 0.52s 176kb 2.66s 8.31s 1.23
> > 3) 0.99s 600kb 6.66s 2.89s | 0.95s 406kb 4.10s 5.81s 2.01
> > 4) 3.81s 1.8Mb 32.10s 2.93s | 3.52s 1.1Mb 9.36s 9.27s 3.16
> >
> >
> > # 3D elasticity, cases
> > 1) 1st order elements, mesh(40,40,40)
> > 2) 2nd order elements, mesh(20,20,20)
> > 3) 3rd order elements, mesh(10,10,10)
> > 4) 4th order elements, mesh(5,5,5)
> >
> > # Results
> > Tensor Quadrature
> > FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
> > 1) 0.97s 279kb 3.19s 5.35s | 1.09s 267kb 3.18s 4.37s 1.09
> > 2) 1.61s 700kb 6.30s 6.35s | 1.58s 478kb 4.59s 9.09s 1.63
> > 3) 5.86s 2.3Mb 28.93s 2.89s | 5.03s 1.2Mb 10.08s 10.87s 2.64
> > 4) 15.49s 6.8Mb 3m30s* --- | 11.49s 2.9Mb 31.57s 9.17s ---
> >
> > * I killed the compile after 3m30s but if it had finished I would expect
> > the tensor code to perform better than quadrature
> >
> > Form: Plasticity
> >
> > element = VectorElement("Lagrange", "triangle", 1)
> > elementT = VectorQuadratureElement("triangle", 1, 9)
> >
> > v = TestFunction(element)
> > u = TrialFunction(element)
> > t = Function(elementT) # Consistent tangent coefficients
> >
> > # Strain vector
> > def epsilon(u):
> > return [u[0].dx(0), u[1].dx(1), u[0].dx(1) + u[1].dx(0)]
> >
> > # Consistent tangent tensor
> > def tangent(t):
> > return [[t[i*3 + j] for j in range(3)] for i in range(3)]
> >
> > # Incremental stresses
> > def dsigma(t, u):
> > return mult(tangent(t), epsilon(u))
> >
> > a = dot(epsilon(v), dsigma(t, u) )*dx
> >
> >
> > # 2D plasticity, cases
> > 1) 1st order elements, mesh(1000, 1000)
> > 2) 2nd order elements, mesh(500, 500)
> > 3) 3rd order elements, mesh(250, 250)
> > 4) 4th order elements*, mesh(125, 125)
> >
> > *Note that because the bilinear form in this case is a 9th order form we
> take
> > the number of quadrature points equal to 5 when declaring the quadrature
> > element.
> >
> > # Results
> > Tensor Quadrature
> > FFC size DOLFIN run_t | FFC size DOLFIN run_q
> run_q/run_t
> > 1) 0.65s 233kb 3.59s 6.57s | 0.78s 230kb 3.52s 7.06s 1.07
> > 2) 1.26s 497kb 6.24s 13.40s | 1.36s 293kb 3.82s 12.70s 0.95
> > 3) 4.26s 1.8Mb 46.74s 18.80s | 2.25s 415kb 4.73s 14.15s
> 0.75
> > 4) 26.15s 11.0Mb * --- | 5.05s 687kb 6.68s 20.15s ---
> >
> > * I did not even try to compile this header file in DOLFIN
> >
> >
> > # 3D plasticity, cases
> > 1) 1st order elements, mesh(40,40,40)
> > 2) 2nd order elements, mesh(20,20,20)
> >
> > # Results
> > Tensor Quadrature
> > FFC size DOLFIN run_t | FFC size DOLFIN run_q run_q/run_t
> > 1) 12.26s 778kb 8.81s 7.52s | 16.25s 673kb 7.92s 6.69s 0.89
> > 2) 4m39s 11.0Mb 11m* --- | 1m17s 1.4Mb 12.44s 24.80s** ---
> >
> > * I ran out of memory after 11 minutes of compiling DOLFIN.
> > ** For comparison, this assembly would take 768s with the previous
> quadrature
> > version, nearly a factor 30 speedup.
> >
> >
> > Conclusions
> >
> > I general quadrature representation results in less code and a shorter
> compile
> > time for both FFC and DOLFIN. Complex forms (or forms with high element
> order)
> > that cannot be compiled with tensor representation can possibly be
> compiled
> > using the quadrature option. For forms with many coefficients, it even
> looks
> > like
> > quadrature is faster than tensor representation.
> >
> >
> > For those who are interested, here's a rough sketch on how it works:
> >
> >
> > P[ip][num_dofs]: tabulated value of basisfunctions (or derivatives)
> > W[ip]: weight at integration point
> > J_**: Jacobian terms (constant for each element)
> >
> > Using 2nd order Lagrange tets for elasticity, some typical terms for the
> > standard
> > quadrature approach might look as follows:
> >
> > // 8 integration points and 30 dofs
> > for (unsigned int ip = 0; ip < 8; ip++)
> > for (unsigned int i = 0; i < 30; i++)
> > for (unsigned int j = 0; j < 30; j++)
> > A[j*30 + k] += P2[ip][j]*P0[ip][k]*W0[ip]*Jinv_00*Jinv_02*det\
> > + P1[ip][j]*P0[ip][k]*W1[ip]*Jinv_00*Jinv_01*det
> >
> > The number of operations to compute the above is roughly 8*30*30*12 =
> 86400
> >
> > First we precompute the constant terms, much like the geometry terms for
> the
> > tensor contraction (we do NOT tabulate coefficients w[.][.] ) Thus we have
> > G0 = Jinv_00*Jinv_02*det
> > G1 = Jinv_00*Jinv_01*det
> >
> > Some of the tables with basisfunctions and quadrature are redundant, in
> this
> > case let P2 = P1 and W0 = W1, which means that our loop reduces to:
> > for (unsigned int ip = 0; ip < 8; ip++)
> > for (unsigned int i = 0; i < 30; i++)
> > for (unsigned int j = 0; j < 30; j++)
> > A[j*30 + k] += P1[ip][j]*P0[ip][k]*W0[ip]*(G0 + G1)
> >
> > which is already a factor of 3 operations less operations.
> > However, many of the tabulated values of the basis functions are zero. In
> the
> > case of gradients of 2nd order Lagrange elements on a tetrahedron there is
> only
> > 7 non-zero columns in each table. We can tabulate these columns at compile
> time
> > e.g.,
> >
> > nz0[7] = {0,1,2,3,4,5,6}
> > nz1[7] = {10,11,12,13,14,15,16}
> >
> > which allow us to reduce our loop even further:
> > for (unsigned int ip = 0; ip < 8; ip++)
> > for (unsigned int i = 0; i < 7; i++)
> > for (unsigned int j = 0; j < 7; j++)
> > A[ nz1[j]*30 + nz0[k] ] += P1[ip][ nz1[j] ]*P0[ip][ nz0[k] ]*W0[ip]*(G0 +
> G1)
> >
> > The number of operations is now 8*7*7*4 = 1568, a factor of 55 lower
> compared
> > to
> > the initial value.
> > Note: The actual elasticity form was reduced from 7,754,400 to 119,952
> > operations.
> >
> > _______________________________________________
> > FFC-dev mailing list
> > FFC-dev@xxxxxxxxxx
> > http://www.fenics.org/mailman/listinfo/ffc-dev
> >
> >
> âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
> > Instant message from any web browser! Try the new Yahoo! Canada Messenger
> for
> > the Web BETA
>
> > _______________________________________________
> > FFC-dev mailing list
> > FFC-dev@xxxxxxxxxx
> > http://www.fenics.org/mailman/listinfo/ffc-dev
>
> _______________________________________________
> FFC-dev mailing list
> FFC-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/ffc-dev
>
References