
dolfin team mailing list archive

Re: profiling an assembly

 

On Sun, May 18, 2008 at 10:11:31PM +0200, Murtazo Nazarov wrote:
> Sorry, I forgot to put an attachment:
> 
> 
> ...
> >> > But we also need to remember that
> >> >
> >> > 1. Solve time may dominate assembly anyway so that's where we should
> >> > optimize.
> >> >
> >> > 2. Assembling the action instead of the operator removes the A.add()
> >> > bottleneck.
> >> >
> >> > As mentioned before, we are experimenting with iterating locally over
> >> > cells sharing common dofs and combining batches of element tensors
> >> > before inserting into the global sparse matrix row by row. Let's see
> >> > how it goes.
> >> >
> >> > Some other comments:
> >> >
> >> > Murtazo: It's interesting that femLego is that much faster. It would
> >> > be interesting to see exactly why it is faster. Can you take a simple
> >> > example (maybe even Poisson) and profile both femLego and DOLFIN on
> >> > the same mesh and get detailed results for tabulate_tensor,
> >> > tabulate_dofs and A.add()? If femLego is faster on A.add(), then what
> >> > linear algebra backend is it using?
> >>
> >> Yes, the test we did is a simple 2D Poisson problem on a unit square mesh,
> >> and the assembly in femLego is 3 times faster, because A.add() is done in
> >> the way I wrote in previous mails. The linear algebra package is AZTEC.
> >> Perhaps DOLFIN should be much faster than femLego if A.add() were the
> >> same, since FFC is much faster than a quadrature rule.
> >
> > I thought AZTEC was just solvers. I mean, what sparse matrix format is
> > used? And what does the interface look like for communicating the exact
> > position for insertion?
> 
> The sparse matrix is just a double* atw. I have attached the subroutine
> which does this A.add(): idxatw(el,li,lj) is the index into the global
> matrix for cell = el, row = li, col = lj.
> 
> 
> >
> >> > Murtazo: It seems you suggest we should basically store some kind of
> >> > index/pointer for where to write directly to memory when adding the
> >> > entries. This has two problems: (i) we need to store quite a few of
> >> > these indices (n^2*num_cells where n is the local space dimension),
> >> > and (ii) it requires cooperation from the linear algebra backend. We
> >> > would need to ask PETSc to tell us where in its private data
> >> > structures it inserted the entries.
> >>
> >> Yes, maybe there is a better way to do it. If we store the global indices
> >> of A, it will be A.nz()*numVertices*num_components in total, but we will
> >> still get a speedup, which is more important in some cases.
> >
> > That doesn't look correct to me. I think we would need n^2*num_cells. We
> > iterate over the cells and for each cell we have n^2 entries and we need
> > to know where to insert each one of those.
> >
> 
> I have contributed to and have experience with femLego (I did my exjob
> with it). This approach also works well in parallel. The storage may not
> be a problem even for a very large mesh, since one needs to use parallel
> processors anyway.
> 
> murtazo

It looks to me like the storage needed is indeed n^2*num_cells. I'm
not fluent in Fortran, but that's how I interpret this line:

  atw(idxatw(el,li,lj)) = atw(idxatw(el,li,lj)) + Atmp(li,lj)

This looks expensive (in terms of memory), but maybe not that
expensive?
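
To make the idea concrete, here is a rough sketch (in C++, not Fortran) of
what such a cached-index add could look like on our side. This is not
femLego or PETSc code: it assumes a plain CSR layout with a flat values
array, and the names (CSRMatrix, build_insertion_table, add_element_tensor)
are made up for illustration only.

  #include <cstddef>
  #include <vector>

  // Minimal CSR storage: row_start has num_rows + 1 entries,
  // col_index and values have nnz entries each.
  struct CSRMatrix
  {
    std::vector<std::size_t> row_start;
    std::vector<std::size_t> col_index;
    std::vector<double> values;
  };

  // Precompute, once per mesh, the position in A.values where each local
  // entry (li, lj) of each cell's element tensor should be added. The table
  // holds n^2*num_cells indices and assumes the sparsity pattern has
  // already been built, so every (dofs[li], dofs[lj]) entry exists.
  std::vector<std::size_t>
  build_insertion_table(const CSRMatrix& A,
                        const std::vector<std::vector<std::size_t> >& cell_dofs)
  {
    const std::size_t n = cell_dofs[0].size(); // local space dimension
    std::vector<std::size_t> table;
    table.reserve(n*n*cell_dofs.size());
    for (std::size_t cell = 0; cell < cell_dofs.size(); ++cell)
    {
      const std::vector<std::size_t>& dofs = cell_dofs[cell];
      for (std::size_t li = 0; li < n; ++li)
        for (std::size_t lj = 0; lj < n; ++lj)
        {
          // Search the row once here so no search is needed during assembly
          std::size_t k = A.row_start[dofs[li]];
          while (A.col_index[k] != dofs[lj])
            ++k;
          table.push_back(k);
        }
    }
    return table;
  }

  // Assembly-time add for one cell, the analogue of
  //   atw(idxatw(el,li,lj)) = atw(idxatw(el,li,lj)) + Atmp(li,lj)
  // where Atmp is the n x n element tensor stored row-major.
  void add_element_tensor(CSRMatrix& A,
                          const std::vector<std::size_t>& table,
                          std::size_t cell, std::size_t n,
                          const double* Atmp)
  {
    const std::size_t* idx = &table[cell*n*n];
    for (std::size_t i = 0; i < n*n; ++i)
      A.values[idx[i]] += Atmp[i];
  }

For linear triangles (n = 3) the table holds 9 indices per cell, which with
64-bit indices is roughly two to three times the memory of the matrix values
themselves on a typical triangle mesh. So noticeable, but perhaps not
prohibitive.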

-- 
Anders

