Re: profiling an assembly
...
>> > But we also need to remember that
>> >
>> > 1. Solve time may dominate assembly anyway, so that's where we
>> > should optimize.
>> >
>> > 2. Assembling the action instead of the operator removes the A.add()
>> > bottleneck.
>> >
>> > As mentioned before, we are experimenting with iterating locally over
>> > cells sharing common dofs and combining batches of element tensors
>> > before inserting into the global sparse matrix row by row. Let's see
>> > how it goes.
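
A minimal sketch of that batching idea in C++, with stand-in types
(ElementTensor, GlobalMatrix) rather than the actual DOLFIN classes,
since those interfaces are not shown in this thread:

#include <cstddef>
#include <map>
#include <vector>

// Stand-in for one element tensor: its global dofs and n x n values.
struct ElementTensor
{
  std::vector<std::size_t> dofs;  // global dof indices, length n
  std::vector<double> values;     // row-major n x n entries
};

// Stand-in for a sparse matrix supporting row-wise addition.
struct GlobalMatrix
{
  std::map<std::size_t, std::map<std::size_t, double> > rows;

  void add_row(std::size_t i, const std::map<std::size_t, double>& row)
  {
    std::map<std::size_t, double>& r = rows[i];
    std::map<std::size_t, double>::const_iterator it;
    for (it = row.begin(); it != row.end(); ++it)
      r[it->first] += it->second;
  }
};

// Combine a batch of element tensors (cells sharing common dofs) in a
// temporary row cache, then insert into the global matrix row by row,
// so there is one insertion call per row instead of one per cell.
void assemble_batch(GlobalMatrix& A, const std::vector<ElementTensor>& batch)
{
  std::map<std::size_t, std::map<std::size_t, double> > cache;
  for (std::size_t c = 0; c < batch.size(); ++c)
  {
    const ElementTensor& et = batch[c];
    const std::size_t n = et.dofs.size();
    for (std::size_t li = 0; li < n; ++li)
      for (std::size_t lj = 0; lj < n; ++lj)
        cache[et.dofs[li]][et.dofs[lj]] += et.values[li*n + lj];
  }
  std::map<std::size_t, std::map<std::size_t, double> >::const_iterator it;
  for (it = cache.begin(); it != cache.end(); ++it)
    A.add_row(it->first, it->second);
}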
>> >
>> > Some other comments:
>> >
>> > Murtazo: It's interesting that femLego is that much faster. It would
>> > be interesting to see exactly why it is faster. Can you take a simple
>> > example (maybe even Poisson) and profile both femLego and DOLFIN on
>> > the same mesh and get detailed results for tabulate_tensor,
>> > tabulate_dofs and A.add()? If femLego is faster on A.add(), then what
>> > linear algebra backend is it using?
>>
>> Yes, the test we did is a simple 2D Poisson problem on a unit square
>> mesh, and assembly in femLego is 3 times faster, because A.add() is done
>> in the way I described in previous mails. The linear algebra package is
>> AZTEC. In fact, DOLFIN should be much faster than femLego if A.add() were
>> the same, since FFC is much faster than quadrature.
>
> I thought AZTEC was just solvers. I mean, what sparse matrix format is
> used? And what does the interface look like for communicating the
> exact position for insertion?
The sparse matrix is just a double* atw. Attached is the subroutine that
does this A.add(): idxatw(el, li, lj) is the index into the global matrix
for cell = el, row = li, col = lj.
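
The same scheme, sketched in C++ for illustration (femLego itself is
Fortran; the names follow the subroutine described above):

#include <cstddef>
#include <vector>

// The global matrix is a flat array atw, and idxatw stores the
// precomputed position of local entry (li, lj) of cell el, so each
// addition is a direct write with no index search.
struct DirectAdd
{
  std::vector<double> atw;          // flat nonzero storage of the matrix
  std::vector<std::size_t> idxatw;  // precomputed indices, num_cells*n*n
  std::size_t n;                    // local space dimension

  // Position in atw for local entry (li, lj) of cell el
  std::size_t index(std::size_t el, std::size_t li, std::size_t lj) const
  { return idxatw[(el*n + li)*n + lj]; }

  // Add a row-major n x n element tensor for cell el
  void add(std::size_t el, const double* element_tensor)
  {
    for (std::size_t li = 0; li < n; ++li)
      for (std::size_t lj = 0; lj < n; ++lj)
        atw[index(el, li, lj)] += element_tensor[li*n + lj];
  }
};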
>
>> > Murtazo: It seems you suggest we should basically store some kind of
>> > index/pointer for where to write directly to memory when adding the
>> > entries. This has two problems: (i) we need to store quite a few of
>> > these indices (n^2*num_cells where n is the local space dimension),
>> > and (ii) it requires cooperation from the linear algebra backend.
>> > We would need to ask PETSc to tell us where in its private data
>> > structures it inserted the entries.
>>
>> Yes, maybe there is a better way to do it. If we store the global
>> indices of A, it will be A.nz()*numVertices*num_components in total,
>> but we would still get a speedup, which is more important in some cases.
>
> That doesn't look correct to me. I think we would need n^2*num_cells.
> We iterate over the cells and for each cell we have n^2 entries and we
> need to know where to insert each one of those.
>
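
To make the count concrete, here is a sketch of precomputing those
n^2*num_cells positions for a CSR sparsity pattern. This is also where
problem (ii) bites: PETSc does not expose such positions, so the sketch
assumes a backend whose row_ptr/col_idx arrays are accessible.

#include <algorithm>
#include <cstddef>
#include <vector>

// Precompute, for every cell and every local pair (li, lj), the position
// of entry (dofs[li], dofs[lj]) inside a CSR matrix (row_ptr/col_idx).
// The result has num_cells*n*n entries, matching the n^2*num_cells
// estimate; for linear tetrahedra (n = 4) that is 16 indices per cell.
std::vector<std::size_t>
build_index_table(const std::vector<std::size_t>& row_ptr,
                  const std::vector<std::size_t>& col_idx,
                  const std::vector<std::vector<std::size_t> >& cell_dofs)
{
  const std::size_t num_cells = cell_dofs.size();
  const std::size_t n = cell_dofs[0].size();  // local space dimension
  std::vector<std::size_t> idx(num_cells*n*n);
  for (std::size_t el = 0; el < num_cells; ++el)
    for (std::size_t li = 0; li < n; ++li)
    {
      const std::size_t row = cell_dofs[el][li];
      const std::size_t* begin = &col_idx[0] + row_ptr[row];
      const std::size_t* end   = &col_idx[0] + row_ptr[row + 1];
      for (std::size_t lj = 0; lj < n; ++lj)
      {
        // Binary search for the column among the row's sorted nonzeros
        const std::size_t* pos =
          std::lower_bound(begin, end, cell_dofs[el][lj]);
        idx[(el*n + li)*n + lj] = row_ptr[row] + (pos - begin);
      }
    }
  return idx;
}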
I have contributed to and have experience with femLego (I did my degree
project with it). This approach works well in parallel too. The storage may
not be a problem even for a very large mesh, since one needs to use parallel
processors anyway.
murtazo