
dolfin team mailing list archive

Re: profiling an assembly

 

I have not gotten through the whole thread yet. However, two points:

  1) Referring to Johan's question about timing: one thing I see in this code
      is a call for every element. In my example, I call MatSetValues() once for
      every element matrix, which speeds up the operation considerably (see the
      sketch after this list).

  2) If this is a big concern, we should set up a simple example that everyone
      can run and post timings for. I recommend straight C, but would accept C++.
      It should read in the connectivity table. I actually have a 3D hex example
      like this to test PETSc preallocation. I will see about constructing an
      insertion test.
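
   A minimal sketch of point 1 (C with PETSc): one MatSetValues() call per
   element matrix, using ADD_VALUES. The names num_cells, n_local, cell_dofs
   and Ae are placeholders, and error checking is omitted.

       #include <petscmat.h>

       /* Add one n_local x n_local element matrix per call instead of one
          entry at a time. cell_dofs holds n_local global dof indices per
          cell; Ae holds one row-major element matrix per cell. */
       void add_element_matrices(Mat A, PetscInt num_cells, PetscInt n_local,
                                 const PetscInt *cell_dofs,
                                 const PetscScalar *Ae)
       {
         for (PetscInt c = 0; c < num_cells; c++) {
           const PetscInt    *dofs = cell_dofs + c*n_local;
           const PetscScalar *vals = Ae + c*n_local*n_local;
           MatSetValues(A, n_local, dofs, n_local, dofs, vals, ADD_VALUES);
         }
         MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
         MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
       }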

   Matt

On Sun, May 18, 2008 at 3:11 PM, Murtazo Nazarov <murtazo@xxxxxxxxxxx> wrote:
> Sorry, I forgot to put an attachment:
>
>
> ...
>>> > But we also need to remember that
>>> >
>>> > 1. Solve time may dominate assembly anyway, so that's where we should
>>> >    optimize.
>>> >
>>> > 2. Assembling the action instead of the operator removes the A.add()
>>> >    bottleneck.
>>> >
>>> > As mentioned before, we are experimenting with iterating locally over
>>> > cells sharing common dofs and combining batches of element tensors
>>> > before inserting into the global sparse matrix row by row. Let's see
>>> > how it goes.
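
(A very rough, purely illustrative sketch of such row-by-row batching, in C
with PETSc; this is not the actual DOLFIN experiment. All names are invented,
and cells/ncells is assumed to already list the cells containing the dof
'row'. Repeated column indices are summed by ADD_VALUES.)

    #include <petscmat.h>

    /* Gather, for one global row, the contributions of every cell that
       contains that row's dof, then insert the finished row in one call. */
    static void insert_row(Mat A, PetscInt row,
                           const PetscInt *cells, PetscInt ncells,
                           PetscInt n_local,
                           const PetscInt *cell_dofs,  /* n_local dofs per cell */
                           const PetscScalar *Ae)      /* n_local^2 values per cell */
    {
      PetscInt    cols[256];   /* scratch; assume short rows for this sketch */
      PetscScalar vals[256];
      PetscInt    nnz = 0;

      for (PetscInt k = 0; k < ncells; k++) {
        const PetscInt    *dofs = cell_dofs + cells[k]*n_local;
        const PetscScalar *Ac   = Ae + cells[k]*n_local*n_local;
        for (PetscInt li = 0; li < n_local; li++) {
          if (dofs[li] != row) continue;
          for (PetscInt lj = 0; lj < n_local; lj++) {
            cols[nnz] = dofs[lj];
            vals[nnz] = Ac[li*n_local + lj];
            nnz++;
          }
        }
      }
      MatSetValues(A, 1, &row, nnz, cols, vals, ADD_VALUES);
    }
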
>>> >
>>> > Some other comments:
>>> >
>>> > Murtazo: It's interesting that femLego is that much faster. It would
>>> > be interesting to see exactly why it is faster. Can you take a simple
>>> > example (maybe even Poisson) and profile both femLego and DOLFIN on
>>> > the same mesh and get detailed results for tabulate_tensor,
>>> > tabulate_dofs, and A.add()? If femLego is faster on A.add(), then what
>>> > linear algebra backend is it using?
>>>
>>> Yes, the test we did is a simple 2D Poisson problem on a unit square mesh,
>>> and the assembly in femLego is 3 times faster, because A.add() is done in
>>> the way I wrote in previous mails. The linear algebra package is AZTEC.
>>> Perhaps DOLFIN should be much faster than femLego if A.add() is the same,
>>> since FFC is much faster than quadrature.
>>
>> I thought AZTEC was just solvers. I mean, what sparse matrix format is
>> used? And what does the interface look like for communicating the exact
>> position for insertion?
>
> The sparse matrix is just a plain double* atw. I have attached the
> subroutine which does this A.add(): idxatw(el,li,lj) is the index into
> the global matrix for cell = el, row = li, col = lj.
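
As a rough illustration of the scheme described above (a flat array of
nonzeros, atw, plus a precomputed index table idxatw(el,li,lj)), the
cell-wise add could look something like the C sketch below. The row-major
layout of idxatw and the argument names are assumptions; the actual attached
subroutine may differ.

    /* Add one element matrix Ae directly into the flat nonzero array atw.
       idxatw maps (cell, local row, local col) to a position in atw, so
       no search for the (row, col) slot is needed. */
    void add_cell(double *atw, const int *idxatw, int cell, int n_local,
                  const double *Ae)   /* n_local x n_local, row-major */
    {
      for (int li = 0; li < n_local; li++)
        for (int lj = 0; lj < n_local; lj++)
          atw[idxatw[(cell*n_local + li)*n_local + lj]] += Ae[li*n_local + lj];
    }
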
>
>
>>
>>> > Murtazo: It seems you suggest we should basically store some kind of
>>> > index/pointer for where to write directly to memory when adding the
>>> > entries. This has two problems: (i) we need to store quite a few of
>>> > these indices (n^2*num_cells where n is the local space dimension),
>>> > and (ii) it requires cooperation from the linear algebra backend. We
>>> > would need to ask PETSc to tell us where in its private data
>>> > structures it inserted the entries.
>>>
>>> Yes, maybe there is a better way to do it. If we store the global indices
>>> of A, it will in total be A.nz()*numVertices*num_components, but we will
>>> still get a speedup, which is more important in some cases.
>>
>> That doesn't look correct to me. I think we would need n^2*num_cells. We
>> iterate over the cells, and for each cell we have n^2 entries and we need
>> to know where to insert each one of those.
>>
>
> I have contributed to and have experience with femLego (I did my degree
> project (exjob) with it). This approach also works well in parallel. This
> may not be a problem even for a very large mesh, since one needs to use
> parallel processors anyway.
>
> murtazo
>
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

