

Re: profiling an assembly

 

Sorry, I forgot to include the attachment:


...
>> > But we also need to remember that
>> >
>> > 1. Solve time may dominate assembly anyway, so that's where we should
>> > optimize.
>> >
>> > 2. Assembling the action instead of the operator removes the A.add()
>> > bottleneck.
>> >
>> > As mentioned before, we are experimenting with iterating locally over
>> > cells sharing common dofs and combining batches of element tensors
>> > before inserting into the global sparse matrix row by row. Let's see
>> > how it goes.
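
If I understand the batching idea correctly, it would amount to something
like the sketch below. This is only my own illustration of the approach,
not the actual DOLFIN code: Backend, add_row(), cell_dofs and
element_tensors are made-up names standing in for whatever the assembler
and the linear algebra backend actually provide.

// Sketch only: accumulate a batch of element tensors per global row,
// then hand each completed row to the backend in a single call.
#include <cstddef>
#include <map>
#include <vector>

struct Backend
{
  // Stand-in for one backend row insertion (e.g. one MatSetValues call per row)
  void add_row(std::size_t row,
               const std::vector<std::size_t>& cols,
               const std::vector<double>& vals)
  {
    (void) row; (void) cols; (void) vals; // backend-specific insertion goes here
  }
};

void assemble_batch(Backend& A,
                    const std::vector<std::vector<std::size_t> >& cell_dofs,
                    const std::vector<std::vector<double> >& element_tensors)
{
  // row -> (column -> accumulated value) for the whole batch of cells
  std::map<std::size_t, std::map<std::size_t, double> > rows;

  for (std::size_t c = 0; c < cell_dofs.size(); ++c)
  {
    const std::vector<std::size_t>& dofs = cell_dofs[c];
    const std::vector<double>& Ae = element_tensors[c]; // n x n, row-major
    const std::size_t n = dofs.size();
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
        rows[dofs[i]][dofs[j]] += Ae[i*n + j];
  }

  // One insertion per global row instead of one per element tensor
  std::map<std::size_t, std::map<std::size_t, double> >::const_iterator r;
  for (r = rows.begin(); r != rows.end(); ++r)
  {
    std::vector<std::size_t> cols;
    std::vector<double> vals;
    std::map<std::size_t, double>::const_iterator it;
    for (it = r->second.begin(); it != r->second.end(); ++it)
    {
      cols.push_back(it->first);
      vals.push_back(it->second);
    }
    A.add_row(r->first, cols, vals);
  }
}

The std::map is only there to keep the sketch short; a real implementation
would of course use something cheaper for the per-row accumulation.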
>> >
>> > Some other comments:
>> >
>> > Murtazo: It's interesting that femLego is that much faster. It would
>> > be interesting to see exactly why it is faster. Can you take a simple
>> > example (maybe even Poisson) and profile both femLego and DOLFIN on
>> > the same mesh and get detailed results for tabulate_tensor,
>> > tabulate_dofs and A.add()? If femLego is faster on A.add(), then what
>> > linear algebra backend is it using?
>>
>> Yes, the test we did is a simple 2D Poisson problem on a unit square
>> mesh, and the assembly in femLego is 3 times faster, because A.add() is
>> done in the way I described in previous mails. The linear algebra
>> package is AZTEC. Perhaps dolfin should be much faster than femLego if
>> A.add() is the same, since FFC is much faster than a quadrature rule.
>
> I thought AZTEC was just solvers. I mean, what sparse matrix format is
> used? And what does the interface look like for communicating the exact
> position for insertion?

The sparse matrix is just a double* atw. Attached I send you the subroutine
which does this A.add(): idxatw(el,li,lj) is the index into the global
matrix for cell = el, row = li, col = lj.
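
For what it is worth, in C++ the same idea would look roughly like the
sketch below. This is only my paraphrase of the attached Fortran, with
made-up names (atw, idx, add_element_tensor); see addmel.f for the real
code.

// Sketch of the precomputed-index insertion: atw is the flat array of
// matrix nonzeros, and idx plays the role of idxatw(el,li,lj), giving
// the position in atw for each local entry of each cell.
#include <cstddef>
#include <vector>

void add_element_tensor(std::vector<double>& atw,            // flat nonzero storage
                        const std::vector<std::size_t>& idx, // idx[(cell*n + i)*n + j]
                        std::size_t cell,
                        std::size_t n,                       // local space dimension
                        const std::vector<double>& Ae)       // n x n element tensor, row-major
{
  // No searching for the insertion position: it was computed once up front
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      atw[idx[(cell*n + i)*n + j]] += Ae[i*n + j];
}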


>
>> > Murtazo: It seems you suggest we should basically store some kind of
>> > index/pointer for where to write directly to memory when adding the
>> > entries. This has two problems: (i) we need to store quite a few of
>> > these indices (n^2*num_cells where n is the local space dimension),
>> > and (ii) it requires cooperation from the linear algebra backend. We
>> > would need to ask PETSc to tell us where in its private data
>> > structures it inserted the entries.
>>
>> Yes, maybe there is a better way to do it. If we store the global indices
>> of A it will be in total A.nz()*numVertices*num_components, but we will
>> still get a speedup, which is more important in some cases.
>
> That doesn't look correct to me. I think we would need n^2*num_cells. We
> iterate over the cells, and for each cell we have n^2 entries, and we
> need to know where to insert each one of those.
>
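
To make that count concrete (a back-of-the-envelope example with made-up
numbers, not a measurement): for scalar P1 elements on triangles n = 3, so
each cell needs 3^2 = 9 stored positions; a mesh with 10^6 cells would then
need 9*10^6 indices, roughly 36 MB with 4-byte integers, on top of the
matrix itself.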

I have contributed to femLego and have experience with it (I did my degree
project, my "exjob", with it). This approach works well in parallel as
well. This may not be a problem for a very large mesh, since one needs to
use parallel processors anyway.

murtazo

Attachment: addmel.f
Description: Binary data

