Re: profiling an assembly

 

> It seems this thread got a bit derailed yesterday :(
>
> I've done some more careful profiling:
> *) Full assembly once
> *) Assemble the matrix 30 times without reset
> in order to amortize the time for initialization.
>
> The call graph shows that std::lower_bound is called from add:
> 	dolfin::GenericMatrix::add ->
> 	dolfin::uBlasMatrix<>:add ->
> 	std::lower_bound
>
> In this assembly, add (including children) takes 89% of the time, with
> tabulate_tensor taking roughly 9%. Full (gprof) profile attached. Murtazo is probably
> right that the performance numbers are virtually the same with PETSc. I
> will hook it up and try (and let you know if this is not the case).
>

Yes, I got the same numbers with PETSc. I checked, and it is the same
problem with uBLAS: I am pretty sure that searching for the element
positions during assembly takes a very long time. Is it possible to change
the matrix entries A(indx) directly in uBLAS? If it is, the speedup could
be substantial.
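
As an illustration only (this is not existing DOLFIN code, just a sketch of
the idea using Boost.uBLAS's compressed_matrix accessors): once the sparsity
pattern has been fixed by a first assembly, the CSR offset of each cell entry
can be recorded, and later assemblies can add straight into value_data(),
skipping the per-entry binary search. The helper names and the 0-based CSR
layout are my assumptions.

#include <stdexcept>
#include <vector>
#include <boost/numeric/ublas/matrix_sparse.hpp>

typedef boost::numeric::ublas::compressed_matrix<double> Matrix;

// Find the offset of entry (row, col) in the CSR value array. Do this
// once per cell after the pattern is built; 0-based indexing assumed.
std::size_t value_offset(const Matrix& A, std::size_t row, std::size_t col)
{
  const std::size_t begin = A.index1_data()[row];
  const std::size_t end   = A.index1_data()[row + 1];
  for (std::size_t k = begin; k < end; ++k)
    if (A.index2_data()[k] == col)
      return k;
  throw std::runtime_error("entry not in sparsity pattern");
}

// Re-assemble one cell: add the m x n local element matrix using the
// cached offsets, with no search per entry.
void add_cell(Matrix& A, const std::vector<std::size_t>& offsets,
              const double* block, std::size_t m, std::size_t n)
{
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      A.value_data()[offsets[i*n + j]] += block[i*n + j];
}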

There is a MatSetOption flag in PETSc, MAT_USE_HASH_TABLE, that does
exactly what I would like to have. But that option does not work with the
AIJ format we are using in DOLFIN.
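
For reference, a minimal sketch of how that option is set (not how DOLFIN
builds its matrices today): as far as I know MAT_USE_HASH_TABLE is only
honoured by the blocked BAIJ format, so the matrix would have to be created
as BAIJ rather than AIJ. I assume the three-argument MatSetOption signature;
older PETSc releases take only two arguments. Error checking is omitted.

#include <petscmat.h>

// Serial block-sparse (BAIJ) matrix with hash-table insertion enabled.
// bs = block size, n = global dimension, nz = estimated non-zero blocks
// per block row.
Mat create_baij_with_hash_table(PetscInt n, PetscInt bs, PetscInt nz)
{
  Mat A;
  MatCreateSeqBAIJ(PETSC_COMM_SELF, bs, n, n, nz, NULL, &A);
  MatSetOption(A, MAT_USE_HASH_TABLE, PETSC_TRUE);
  return A;
}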

murtazo

> I do appreciate the complexity of inserting elements into a sparse
> matrix, and I do _not_ claim to know better when it comes to the
> assembler architecture.
>
> Still, as I vary the size of the mesh, this performance metric stays
> virtually constant:
> Assembled 7.3e+05 non-zero matrix elements per second (first pass).
> Assembled 1.4e+06 non-zero matrix elements per second (re-assemble).
>
> Is this a sensible metric? If so, is it well understood how the DOLFIN
> assembler performs? If not, I could put together a test-suite for a
> range of forms (2/3D, simple/mixed element, simple/complicated
> expressions in the form etc).
>
> To restate my question: How should I verify that the assembler is
> performing as expected here? Am I looking at some unexpected overhead in
> this assembly (we all know how hard this can be to spot with C++)?
>
> Thanks!
> /Dag
>
> Garth N. Wells wrote:
>>
>> Anders Logg wrote:
>>> On Fri, May 16, 2008 at 12:17:19AM +0200, Murtazo Nazarov wrote:
>>>>> Hello!
>>>>>
>>>>> I'm looking at a "suspiciously slow" assembly and would like to
>>>>> determine what is going on. In general, what should one expect the
>>>>> most
>>>>> time-consuming step to be?
>>>>>
>>>>> This is what my gprof looks like:
>>>>>
>>>>> Time:
>>>>> 61.97%  unsigned int const* std::lower_bound
>>>>> 25.84%  dolfin::uBlasMatrix<...>::add
>>>>>  8.27%  UFC_NSEMomentum3DBilinearForm_cell_integral_0::tabulate_tensor
>>>>>  1.1%   dolfin::uBlasMatrix<...>::init
>>> Where is lower_bound used? From within uBlasMatrix::add or is it in
>>> building the sparsity pattern?
>>>
>>
>> I suspect that it's either in building the sparsity pattern or
>> initialising the uBLAS matrix. The matrix structure is initialised by
>> running across rows and inserting a zero. uBLAS doesn't provide a
>> mechanism for initialising the underlying data structures directly for a
>> sparse matrix.
>>
>> Dag: could you run the same test using PETSc as the backend?
>>
>>>> I got these numbers also. I understand that it is very painful in
>>>> large
>>>> computations.
>>>>
>>>> I see what the problem is with adding into the stiffness matrix A.
>>>> Searching for the position of the element that needs to be added takes
>>>> a very long time, especially if you are solving big problems with
>>>> thousands of unknowns and repeating the assembly many times!
>>> If you know a good way to avoid inserting entries into a sparse matrix
>>> during assembly, please tell me... :-)
>>>
>>> If the assembly is costly, you might want to try assembling the action
>>> of the operator instead and sending that to a Krylov solver. Inserting
>>> into a vector is much easier than inserting into a sparse matrix.
>>>
>>>> One way could be to find the global indices into the matrix A once and
>>>> reuse them in the assembly process. That way we avoid searching for the
>>>> element positions, which makes the process significantly faster. But
>>>> there is a problem: somehow I cannot get access to the global indices
>>>> of a cell's entries in A and change them directly instead of using
>>>> MatSetValues (in PETSc).
>>> I don't understand what you suggest here. We do precompute the
>>> sparsity pattern of the matrix and use that to preallocate, but I
>>> don't know of any other way to insert entries than MatSetValues.
>>>
>>
>> I doubt insertion is the real problem, especially as Dag noted that
>> subsequent assembly operations take only half the time since the matrix
>> is already initialised.
>>
>> PETSc (and no doubt Trilinos) do offer some assembly possibilities that
>> we haven't yet exploited because they require a reorganisation of the
>> dof map.
>>
>> Garth
>>
>>>> I am pretty sure that we could speed up the A.set() and A.get()
>>>> operations as well with the above method.
>>> Please explain.
>>>
>>>> I am not sure how the dof map that gives the row and column indices
>>>> for each cell is implemented. We could avoid repeating this operation
>>>> as well.
>>> This is already implemented (but maybe not used). DofMap handles this.
>>> It wraps the generated ufc::dof_map code and may pretabulate (and
>>> possibly reorder) the dofs.
>>>
>>>> We did some comparisons with another free FEM toolbox, FemLego: the
>>>> assembly process in DOLFIN is 3 times slower than in FemLego in 2D. I
>>>> believe this factor will increase in 3D. FemLego uses quadrature rules
>>>> for computing integrals.
>>> Can you benchmark the various parts of the assembly to see what causes
>>> the slowdown:
>>>
>>>   1. Is it tabulate_tensor?
>>>   2. Is it tabulate_dofs?
>>>   3. Is it A.add()?
>>>   4. Something else?
>>>
>>>> I hope some PETSc guys will help us to make these improvements. Any
>>>> other ideas are welcome!
>>> We are currently experimenting with collecting and preprocessing
>>> batches of entries before inserting them into the global sparse matrix,
>>> in the hope of speeding up the assembly, but we don't have any results
>>> yet.
>>>



