
dolfin team mailing list archive

Re: multi-thread assembly

 



On 11/11/10 09:53, Anders Logg wrote:
On Wed, Nov 10, 2010 at 08:43:22PM +0000, Garth N. Wells wrote:


On 10/11/10 20:29, Anders Logg wrote:
On Wed, Nov 10, 2010 at 04:40:47PM +0000, Garth N. Wells wrote:


On 10/11/10 16:10, Anders Logg wrote:
On Wed, Nov 10, 2010 at 03:58:13PM +0000, Garth N. Wells wrote:


On 10/11/10 15:47, Anders Logg wrote:
On Wed, Nov 10, 2010 at 02:47:30PM +0000, Garth N. Wells wrote:
Nice to see multi-thread assembly being added. We should look at
adding support for the multi-threaded version of SuperLU. What other
multi-thread solvers are out there?

Yes, that would be good, but I don't know which solvers are available.

I haven't looked at the code in great detail, but are element
tensors being added to the global tensor in a thread-safe fashion?
Neither PETSc nor Trilinos is thread-safe.

Yes, they should. That's the main point. It's a very simple algorithm
which just partitions the matrix row by row and makes each process
responsible for a chunk of rows.
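
For illustration only (not DOLFIN's actual code), the row-chunking idea described above might look roughly like this in plain C++; `row_range`, `classify` and the `Ownership` enum are hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split num_rows global matrix rows into contiguous chunks, one per
// thread; returns the half-open range [begin, end) owned by thread_id.
std::pair<std::size_t, std::size_t> row_range(std::size_t num_rows,
                                              std::size_t num_threads,
                                              std::size_t thread_id)
{
  const std::size_t n = num_rows / num_threads;
  const std::size_t r = num_rows % num_threads;
  // The first r threads get one extra row each
  const std::size_t begin = thread_id*n + std::min(thread_id, r);
  const std::size_t end = begin + n + (thread_id < r ? 1 : 0);
  return {begin, end};
}

// Classify the global rows touched by one element tensor against a
// thread's row range (cf. 'all_in_range' / 'some_in_range' below)
enum class Ownership { all_in_range, some_in_range, none_in_range };

Ownership classify(const std::vector<std::size_t>& rows,
                   std::pair<std::size_t, std::size_t> range)
{
  std::size_t inside = 0;
  for (const std::size_t row : rows)
    if (row >= range.first && row < range.second)
      ++inside;
  if (inside == rows.size())
    return Ownership::all_in_range;
  return inside > 0 ? Ownership::some_in_range : Ownership::none_in_range;
}
```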

Would it be better to partition the mesh (using Metis) and then
renumber dofs? That way the 'all_in_range' case would be maximised
and the 'some_in_range' would be minimised. If the rows are
distributed, mixed elements will be a mess because the global rows
are far apart (using the FFC-generated dof map).

Renumbering is definitely important for getting good speedup. This
hasn't been added yet but was implemented in the prototype version
(which was stand-alone from DOLFIN).


What about partitioning of the mesh?

I'm not sure that would help. If each process has access to the whole
mesh, which is either one big connected mesh or a connected piece of
the mesh when running in parallel with MPI, then renumbering on that
piece should be enough. Or am I missing something?


I think that you're missing something - partitioning (not in memory,
but just assigning a partition number) would minimise the number of
cells on partition boundaries, thereby maximising the number
of cells for which 'all_in_range = true'. We could mark cells that
are 'internal' to a partition (hence 'all_in_range = true') and
cells on the boundary (hence 'some_in_range = true'), e.g.

   // Partition number for each cell (negated for cells on a partition
   // boundary)
   MeshFunction<int> partition;

   if (partition(cell) == thread_id)
       compute tensor and assemble all
   else if (std::abs(partition(cell)) == thread_id)
       compute tensor and assemble some terms
   else
       do nothing

What you've described could go wrong for certain cell/dof numberings.
What I describe above wouldn't depend on the numbering.
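
As a self-contained illustration of the signed-partition-id dispatch sketched above (hypothetical code, with a plain std::vector<int> standing in for DOLFIN's MeshFunction<int>; note that negation cannot mark thread 0's boundary cells, since -0 == 0, so a real scheme would need an offset):

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Per-cell action for one assembly thread; the partition number is
// negated for cells on a partition boundary.
enum class Action { assemble_all, assemble_some, skip };

Action dispatch(const std::vector<int>& partition, std::size_t cell,
                int thread_id)
{
  const int p = partition[cell];
  if (p == thread_id)
    return Action::assemble_all;   // interior cell owned by this thread
  else if (std::abs(p) == thread_id)
    return Action::assemble_some;  // boundary cell touching this thread's rows
  else
    return Action::skip;           // some other thread's cell
}
```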

Yes, that would work, but I imagine a good renumbering algorithm would
accomplish the same thing.


Q. How to determine the optimal re-numbering?

A. Partition the mesh to minimise 'interface' length between partitions, and number sequentially on each partition.

:)
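
The "partition, then number sequentially on each partition" answer can be sketched as follows (a hypothetical helper, with plain containers standing in for the mesh partition and the dofmap):

```cpp
#include <cstddef>
#include <vector>

// Visit partitions in order and hand out new consecutive dof numbers
// in the order dofs first appear within each partition, so dofs on the
// same partition end up in a contiguous block of rows.
std::vector<std::size_t>
renumber_by_partition(const std::vector<int>& cell_partition,
                      const std::vector<std::vector<std::size_t>>& cell_dofs,
                      std::size_t num_dofs, int num_partitions)
{
  const std::size_t unset = num_dofs;  // sentinel: valid dofs are < num_dofs
  std::vector<std::size_t> new_number(num_dofs, unset);
  std::size_t next = 0;
  for (int p = 0; p < num_partitions; ++p)
    for (std::size_t c = 0; c < cell_partition.size(); ++c)
      if (cell_partition[c] == p)
        for (const std::size_t dof : cell_dofs[c])
          if (new_number[dof] == unset)
            new_number[dof] = next++;
  return new_number;
}
```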

Anyway, it's worth trying and comparing to just doing the renumbering.


How do you propose to renumber?

Renumbering is a *lot* simpler now that the UFC dofmap is copied into data structures in DOLFIN, and the UFC tabulate dofs function is only called at initialisation.
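
A rough illustration of why the copy makes this simple (a hypothetical sketch, not DOLFIN's actual API): once the per-cell dof indices live in a plain container, applying a renumbering is a single in-place pass.

```cpp
#include <cstddef>
#include <vector>

// With the per-cell dof indices copied into a plain container (rather
// than re-tabulated through UFC on every call), a renumbering given as
// a permutation old dof -> new dof is applied in one sweep.
void apply_renumbering(std::vector<std::vector<std::size_t>>& cell_dofs,
                       const std::vector<std::size_t>& new_number)
{
  for (auto& dofs : cell_dofs)
    for (auto& dof : dofs)
      dof = new_number[dof];
}
```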

Garth


--
Anders



