dolfin team mailing list archive

Re: profiling an assembly

Anders Logg wrote:
On Fri, May 16, 2008 at 12:17:19AM +0200, Murtazo Nazarov wrote:
Hello!

I'm looking at a "suspiciously slow" assembly and would like to
determine what is going on. In general, what should one expect the most
time-consuming step to be?

This is what my gprof looks like:

Time:
61.97%  unsigned int const* std::lower_bound
25.84%  dolfin::uBlasMatrix<...>::add
8.27%   UFC_NSEMomentum3DBilinearForm_cell_integral_0::tabulate_tensor
1.1%    dolfin::uBlasMatrix<...>::init

Where is lower_bound used? From within uBlasMatrix::add or is it in
building the sparsity pattern?


I suspect that it's either in building the sparsity pattern or initialising the uBLAS matrix. The matrix structure is initialised by running across rows and inserting a zero. uBLAS doesn't provide a mechanism for initialising the underlying data structures directly for a sparse matrix.
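
For illustration, the initialisation looks something like this minimal sketch (the pattern type and function name are illustrative, not the actual DOLFIN code):

  #include <boost/numeric/ublas/matrix_sparse.hpp>
  #include <cstddef>
  #include <set>
  #include <vector>

  // Sketch: initialise a uBLAS compressed matrix from a precomputed
  // sparsity pattern by inserting an explicit zero at every nonzero
  // position. Since uBLAS gives no direct access to the underlying
  // index arrays, this forces the storage to be allocated up front so
  // that later add() calls only update existing entries.
  void init_from_pattern(boost::numeric::ublas::compressed_matrix<double>& A,
                         const std::vector<std::set<std::size_t> >& pattern)
  {
    for (std::size_t i = 0; i < pattern.size(); ++i)
      for (std::set<std::size_t>::const_iterator j = pattern[i].begin();
           j != pattern[i].end(); ++j)
        A.insert_element(i, *j, 0.0);
  }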

Dag: could you run the same test using PETSc as the backend?

I got these numbers as well. I can see that this becomes very painful in
large computations.

I see what the problem is with adding entries into the stiffness matrix A.
Searching for the position of the element that needs to be added takes a
very long time, especially when you are solving big problems with thousands
of unknowns and repeating the assembly many times!

If you know a good way to avoid inserting entries into a sparse matrix
during assembly, please tell me... :-)

If the assembly is costly, you might want to try assembling the action
of it instead and send that to a Krylov solver. Inserting into a
vector is much easier than into a sparse matrix.
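
As a toy illustration of the idea (not DOLFIN code; this hand-rolls conjugate gradient for the 1D Poisson operator, computing y = A*x by a cell-wise loop so the matrix is never formed):

  #include <algorithm>
  #include <cmath>
  #include <cstdio>
  #include <vector>

  // Apply the action of the 1D Poisson operator cell by cell. Nothing
  // is ever inserted into a sparse structure; contributions go straight
  // into a dense vector.
  void apply_action(const std::vector<double>& x, std::vector<double>& y,
                    double h)
  {
    const std::size_t n = x.size();
    std::fill(y.begin(), y.end(), 0.0);
    for (std::size_t c = 0; c + 1 < n; ++c)   // interior "cells"
    {
      const double d = (x[c] - x[c + 1])/h;   // element matrix [1 -1; -1 1]/h
      y[c] += d;                              // dense add: no position search
      y[c + 1] -= d;
    }
    y[0] += x[0]/h;                           // cells touching the
    y[n - 1] += x[n - 1]/h;                   // homogeneous Dirichlet boundary
  }

  int main()
  {
    const std::size_t n = 100;
    const double h = 1.0/(n + 1);
    std::vector<double> x(n, 0.0), b(n, h), r(b), p(r), Ap(n);
    double rr = 0.0;
    for (std::size_t i = 0; i < n; ++i) rr += r[i]*r[i];
    for (int k = 0; k < 1000 && std::sqrt(rr) > 1e-10; ++k)
    {
      apply_action(p, Ap, h);                 // the only "assembly" per iteration
      double pAp = 0.0;
      for (std::size_t i = 0; i < n; ++i) pAp += p[i]*Ap[i];
      const double alpha = rr/pAp;
      double rr_new = 0.0;
      for (std::size_t i = 0; i < n; ++i)
      {
        x[i] += alpha*p[i];
        r[i] -= alpha*Ap[i];
        rr_new += r[i]*r[i];
      }
      for (std::size_t i = 0; i < n; ++i)
        p[i] = r[i] + (rr_new/rr)*p[i];
      rr = rr_new;
    }
    std::printf("residual norm: %g\n", std::sqrt(rr));
    return 0;
  }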

One way could be to find the global indices of the entries of A once and
reuse them in the assembly process. That way we avoid searching for the
element positions, which makes the process significantly faster. But there
is a problem: somehow I cannot get access to the global index of a cell's
entries in A and update them directly instead of using MatSetValues (in
PETSc).
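
What I have in mind is something like this sketch (illustrative CSR names, assuming direct access to the underlying storage):

  #include <algorithm>
  #include <cstddef>
  #include <vector>

  // Sketch: for one cell, precompute the offset into the CSR value
  // array of every (row, col) pair of the element matrix. The
  // lower_bound search is then paid once, instead of on every
  // reassembly; each later pass is just values[offsets[k]] += Ae[k].
  std::vector<std::size_t>
  tabulate_offsets(const std::vector<std::size_t>& row_ptr,  // CSR row pointers
                   const std::vector<std::size_t>& col_idx,  // CSR column indices
                   const std::vector<std::size_t>& dofs)     // dofs of this cell
  {
    std::vector<std::size_t> offsets;
    offsets.reserve(dofs.size()*dofs.size());
    for (std::size_t i = 0; i < dofs.size(); ++i)
    {
      std::vector<std::size_t>::const_iterator begin =
        col_idx.begin() + row_ptr[dofs[i]];
      std::vector<std::size_t>::const_iterator end =
        col_idx.begin() + row_ptr[dofs[i] + 1];
      for (std::size_t j = 0; j < dofs.size(); ++j)
        offsets.push_back(row_ptr[dofs[i]]
                          + (std::lower_bound(begin, end, dofs[j]) - begin));
    }
    return offsets;
  }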

I don't understand what you suggest here. We do precompute the
sparsity pattern of the matrix and use that to preallocate, but I
don't know of any other way to insert entries than MatSetValues.
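
For reference, the standard PETSc path looks roughly like this (a sketch with error checking omitted, modern calling sequence; rows and values are placeholders):

  #include <petscmat.h>

  // Preallocate from the known sparsity pattern, then add element
  // matrices with MatSetValues. With correct preallocation insertion
  // is cheap; without it PETSc reallocates repeatedly and assembly
  // crawls.
  void assemble_sketch(PetscInt n, const PetscInt* nnz_per_row)
  {
    Mat A;
    MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 0, nnz_per_row, &A);

    // Per cell: rows from the dof map, vals from tabulate_tensor
    // (placeholders here, assuming n >= 3).
    const PetscInt rows[3] = {0, 1, 2};
    const PetscScalar vals[9] = {0};
    MatSetValues(A, 3, rows, 3, rows, vals, ADD_VALUES);

    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
    MatDestroy(&A);
  }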


I doubt insertion is the real problem, especially as Dag noted that subsequent assembly operations take only half the time since the matrix is already initialised.

PETSc (and no doubt Trilinos) do offer some assembly possibilities that we haven't yet exploited because they require a reorganisation of the dof map.

Garth

I am pretty sure that we could speed up A.set() and A.get() as well using
the above method.

Please explain.

I am not sure how the dof map that provides the row and column indices for
each cell is implemented. We could avoid repeating this operation as well.

This is already implemented (but maybe not used). DofMap handles this.
It wraps the generated ufc::dof_map code and may pretabulate (and
possibly reorder) the dofs.
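
Conceptually it amounts to something like this sketch (illustrative class, not the real DofMap):

  #include <cstddef>
  #include <vector>

  // Sketch of dof pretabulation: call the generated tabulate_dofs once
  // per cell up front, store everything contiguously, and read it back
  // in O(1) during every later assembly pass.
  class PretabulatedDofMap
  {
  public:
    PretabulatedDofMap(std::size_t num_cells, std::size_t dofs_per_cell)
      : dofs_per_cell(dofs_per_cell), dofs(num_cells*dofs_per_cell) {}

    // tabulate is a stand-in for the generated ufc code; it fills the
    // dofs of cell c into the given array.
    template <typename Tabulate>
    void build(std::size_t num_cells, Tabulate tabulate)
    {
      for (std::size_t c = 0; c < num_cells; ++c)
        tabulate(c, &dofs[c*dofs_per_cell]);
    }

    const std::size_t* cell_dofs(std::size_t c) const
    { return &dofs[c*dofs_per_cell]; }

  private:
    std::size_t dofs_per_cell;
    std::vector<std::size_t> dofs;
  };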

We did some comparisons with another free FEM toolbox, FemLego; the
assembly process in DOLFIN is 3 times slower than FemLego's in 2D. I
believe this gap will grow in 3D. FemLego uses quadrature for computing
its integrals.

Can you benchmark the various parts of the assembly to see what causes
the slowdown (see the timing sketch after this list):

  1. Is it tabulate_tensor?
  2. Is it tabulate_dofs?
  3. Is it A.add()?
  4. Something else?
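
Something as crude as this would do (a sketch; the fields mirror the list above, and the names in the usage comments are illustrative):

  #include <chrono>
  #include <cstdio>

  // Accumulate wall-clock time per assembly phase.
  struct PhaseTimer
  {
    double tabulate_dofs = 0, tabulate_tensor = 0, add = 0;
    std::chrono::steady_clock::time_point t0;
    void start() { t0 = std::chrono::steady_clock::now(); }
    void stop(double& phase)
    {
      phase += std::chrono::duration<double>(
                   std::chrono::steady_clock::now() - t0).count();
    }
    void report() const
    {
      std::printf("tabulate_dofs %g s, tabulate_tensor %g s, add %g s\n",
                  tabulate_dofs, tabulate_tensor, add);
    }
  };

  // Usage inside the assembly loop over cells:
  //   t.start(); dof_map.tabulate_dofs(...);    t.stop(t.tabulate_dofs);
  //   t.start(); integral.tabulate_tensor(...); t.stop(t.tabulate_tensor);
  //   t.start(); A.add(...);                    t.stop(t.add);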

I hope some PETSc people will help us make these improvements. Any other
ideas are welcome!

We are currently experimenting with collecting and preprocessing batches
of entries before inserting them into the global sparse matrix, in the
hope of speeding up the assembly, but we don't have any results yet.
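
The idea is roughly the following sketch (illustrative; it assumes a matrix type with a scalar add(row, col, value), in the spirit of GenericMatrix):

  #include <algorithm>
  #include <cstddef>
  #include <vector>

  struct Entry { std::size_t row, col; double val; };

  // Sketch of batched insertion: buffer element-matrix contributions,
  // sort by (row, col) so duplicates become adjacent, sum them, and
  // insert each distinct entry once. The sparse structure is then
  // searched once per distinct entry instead of once per contribution.
  template <typename Matrix>
  void flush(std::vector<Entry>& buf, Matrix& A)
  {
    std::sort(buf.begin(), buf.end(),
              [](const Entry& a, const Entry& b)
              { return a.row < b.row || (a.row == b.row && a.col < b.col); });
    for (std::size_t i = 0; i < buf.size(); )
    {
      double sum = 0.0;
      std::size_t j = i;
      for (; j < buf.size() && buf[j].row == buf[i].row
                            && buf[j].col == buf[i].col; ++j)
        sum += buf[j].val;
      A.add(buf[i].row, buf[i].col, sum);
      i = j;
    }
    buf.clear();
  }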


