dolfin team mailing list archive

Re: multi-thread assembly

 

On Thu, Nov 11, 2010 at 12:02:39PM +0000, Garth N. Wells wrote:
>
>
> On 11/11/10 09:53, Anders Logg wrote:
> >On Wed, Nov 10, 2010 at 08:43:22PM +0000, Garth N. Wells wrote:
> >>
> >>
> >>On 10/11/10 20:29, Anders Logg wrote:
> >>>On Wed, Nov 10, 2010 at 04:40:47PM +0000, Garth N. Wells wrote:
> >>>>
> >>>>
> >>>>On 10/11/10 16:10, Anders Logg wrote:
> >>>>>On Wed, Nov 10, 2010 at 03:58:13PM +0000, Garth N. Wells wrote:
> >>>>>>
> >>>>>>
> >>>>>>On 10/11/10 15:47, Anders Logg wrote:
> >>>>>>>On Wed, Nov 10, 2010 at 02:47:30PM +0000, Garth N. Wells wrote:
> >>>>>>>>Nice to see multi-thread assembly being added. We should look at
> >>>>>>>>adding support for the multi-threaded version of SuperLU. What other
> >>>>>>>>multi-thread solvers are out there?
> >>>>>>>
> >>>>>>>Yes, that would be good, but I don't know which solvers are available.
> >>>>>>>
> >>>>>>>>I haven't looked at the code in great detail, but are element
> >>>>>>>>tensors being added to the global tensor in a thread-safe fashion?
> >>>>>>>>Neither PETSc nor Trilinos is thread-safe.
> >>>>>>>
> >>>>>>>Yes, they should. That's the main point. It's a very simple algorithm
> >>>>>>>which just partitions the matrix row by row and makes each process
> >>>>>>>responsible for a chunk of rows.
> >>>>>>
> >>>>>>Would it be better to partition the mesh (using Metis) and then
> >>>>>>renumber dofs? That way the 'all_in_range' case would be maximised
> >>>>>>and the 'some_in_range' case would be minimised. If the rows are
> >>>>>>distributed, mixed elements will be a mess because the global rows
> >>>>>>are far apart (using the FFC-generated dof map).
> >>>>>
> >>>>>Renumbering is definitely important for getting good speedup. This
> >>>>>hasn't been added yet but was implemented in the prototype version
> >>>>>(which was stand-alone from DOLFIN).
> >>>>>
> >>>>
> >>>>What about partitioning of the mesh?
> >>>
> >>>I'm not sure that would help. If each process has access to the whole
> >>>mesh, which is either one big connected mesh or a connected piece of
> >>>the mesh when running in parallel with MPI, then renumbering on that
> >>>piece should be enough. Or am I missing something?
> >>>
> >>
> >>I think that you're missing something - partitioning (not in memory,
> >>but just assigning a partition number) would minimise the number of
> >>cells on partition boundaries, thereby maximising the number
> >>of cells for which 'all_in_range = true'. We could mark cells that
> >>are 'internal' to a partition (hence 'all_in_range = true') and
> >>cells on the boundary (hence 'some_in_range = true'), e.g.
> >>
> >>   // Mark each cell with its partition id (negated for cells on a
> >>   // partition boundary)
> >>   MeshFunction<int> partition;
> >>
> >>   if (partition(cell) == thread_id)
> >>       compute tensor and assemble all
> >>   else if (std::abs(partition(cell)) == thread_id)
> >>       compute tensor and assemble some terms
> >>   else
> >>       do nothing
> >>
> >>What you've described could go wrong for certain cell/dof numberings.
> >>What I describe above wouldn't depend on the numbering.
> >
> >Yes, that would work, but I imagine a good renumbering algorithm would
> >accomplish the same thing.
> >
>
> Q. How to determine the optimal re-numbering?
>
> A. Partition the mesh to minimise 'interface' length between
> partitions, and number sequentially on each partition.
>
> :)

Yes, that sounds like a good plan. I will try it. My only worry is that
the partitioning may take a long time, but we'll see.
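For reference, the row-partitioned scheme discussed earlier in the thread can be sketched roughly as below. This is a minimal illustration, not DOLFIN code: `CellDofs`, `assemble_rows` and `assemble` are hypothetical names, a plain `std::vector<double>` stands in for the PETSc/Trilinos global tensor, and a constant element vector stands in for the generated tabulate_tensor. Each thread owns a contiguous block of rows and writes only to those rows, so with a plain vector no two threads touch the same entry; whether the same holds for a given linear algebra backend is a separate question. Partitioning plus renumbering aims to make the 'all_in_range' branch the common case.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical cell-to-dof map: cell_dofs[c] lists the global dofs of cell c.
using CellDofs = std::vector<std::vector<std::size_t>>;

// One thread's share of the work: it owns rows [begin, end) of b.
// Every cell is visited, but only owned rows are ever written.
void assemble_rows(const CellDofs& cell_dofs, std::size_t begin,
                   std::size_t end, std::vector<double>& b)
{
  for (const auto& dofs : cell_dofs)
  {
    // Count how many of this cell's dofs fall in the owned range
    std::size_t in_range = 0;
    for (std::size_t d : dofs)
      if (d >= begin && d < end)
        ++in_range;

    if (in_range == 0)
      continue;  // no dofs owned: skip the cell, no tensor computed

    // Stand-in for tabulate_tensor: every local entry is 1.0
    const std::vector<double> element_vector(dofs.size(), 1.0);

    const bool all_in_range = (in_range == dofs.size());
    for (std::size_t i = 0; i < dofs.size(); ++i)
    {
      // all_in_range: add every entry without checking;
      // some_in_range: add only the entries whose row is owned
      if (all_in_range || (dofs[i] >= begin && dofs[i] < end))
        b[dofs[i]] += element_vector[i];
    }
  }
}

// Split the rows into num_threads contiguous chunks and assemble in parallel.
std::vector<double> assemble(const CellDofs& cell_dofs, std::size_t num_dofs,
                             std::size_t num_threads)
{
  std::vector<double> b(num_dofs, 0.0);
  const std::size_t chunk = (num_dofs + num_threads - 1) / num_threads;
  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < num_threads; ++t)
  {
    const std::size_t begin = std::min(num_dofs, t * chunk);
    const std::size_t end = std::min(num_dofs, begin + chunk);
    threads.emplace_back(assemble_rows, std::cref(cell_dofs), begin, end,
                         std::ref(b));
  }
  for (std::thread& th : threads)
    th.join();
  return b;
}
```

For a 1D chain of four cells with dofs {0,1}, {1,2}, {2,3}, {3,4} and two threads, `assemble(cells, 5, 2)` gives {1, 2, 2, 2, 1}; the cell {2,3} falls in the 'some_in_range' case for both threads.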

> >Anyway, it's worth trying and comparing to just doing the renumbering.
> >
>
> How do you propose to renumber?

Just sequentially, which only works well if the cells have a
reasonable numbering to start with. They did in our experiments (since
we used UnitCube), and they will if we do partitioning before
renumbering.
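A minimal sketch of "partition first, then number sequentially", assuming the per-cell partition labels have already been computed (the call to Metis, or any other partitioner, is not shown). `order_cells_by_partition` and `renumber_sequentially` are hypothetical helpers: cells are visited partition by partition, keeping their existing order within each partition, and each dof receives the next free number the first time it is seen. The result is an old-to-new map.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical cell-to-dof map: cell_dofs[c] lists the global dofs of cell c.
using CellDofs = std::vector<std::vector<std::size_t>>;

// Return the cell indices sorted by partition label. stable_sort keeps the
// existing cell order inside each partition, so that order still matters.
std::vector<std::size_t> order_cells_by_partition(const std::vector<int>& partition)
{
  std::vector<std::size_t> order(partition.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&partition](std::size_t a, std::size_t b)
                   { return partition[a] < partition[b]; });
  return order;
}

// Number dofs in order of first appearance when cells are visited in
// cell_order. Returns new_index[old_dof]; dofs not touched by any cell
// are left at the sentinel value max().
std::vector<std::size_t> renumber_sequentially(const CellDofs& cell_dofs,
                                               const std::vector<std::size_t>& cell_order,
                                               std::size_t num_dofs)
{
  const std::size_t unset = std::numeric_limits<std::size_t>::max();
  std::vector<std::size_t> new_index(num_dofs, unset);
  std::size_t next = 0;
  for (std::size_t c : cell_order)
    for (std::size_t d : cell_dofs[c])
      if (new_index[d] == unset)
        new_index[d] = next++;
  return new_index;
}
```

For example, four cells stored as {4,1}, {3,0}, {1,2}, {0,4} with partition labels {1, 0, 1, 0} (a chain whose vertices are numbered 3, 0, 4, 1, 2 along the line) give the map {1, 3, 4, 0, 2}: the dofs become 0 to 4 in order along the chain, and partition 0 touches only the new dofs 0 to 2.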

> Renumbering is a *lot* simpler now that the UFC dofmap is copied
> into data structures in DOLFIN, and the UFC tabulate dofs function
> is only called at initialisation.

Great. Is there an overhead compared to calling the UFC-generated
tabulate_dofs when renumbering is not needed?
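To illustrate what "copied into data structures in DOLFIN" might look like, here is a minimal sketch assuming a flat, CSR-style layout; `FlatDofMap` is a hypothetical class, not DOLFIN's actual dofmap. The dofs produced by tabulate_dofs are stored once at initialisation, a cell lookup reads an offset and a contiguous slice, and renumbering is a single pass that applies an old-to-new map. Whether such a lookup is slower than calling the generated tabulate_dofs directly is the open question above and would need to be measured; the sketch only shows why renumbering becomes simple.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flat storage for a dofmap copied out of generated code once.
class FlatDofMap
{
public:
  // cell_dofs[c] is what tabulate_dofs would have produced for cell c
  explicit FlatDofMap(const std::vector<std::vector<std::size_t>>& cell_dofs)
  {
    offsets_.push_back(0);
    for (const auto& dofs : cell_dofs)
    {
      dofs_.insert(dofs_.end(), dofs.begin(), dofs.end());
      offsets_.push_back(dofs_.size());
    }
  }

  // Dofs of cell c as a pointer into contiguous storage plus a count
  const std::size_t* cell_dofs(std::size_t c) const
  { return dofs_.data() + offsets_[c]; }

  std::size_t num_cell_dofs(std::size_t c) const
  { return offsets_[c + 1] - offsets_[c]; }

  // Apply an old-to-new renumbering in place: one pass over stored dofs
  void renumber(const std::vector<std::size_t>& new_index)
  {
    for (std::size_t& d : dofs_)
      d = new_index[d];
  }

private:
  std::vector<std::size_t> dofs_;     // all cell dofs, cell after cell
  std::vector<std::size_t> offsets_;  // cell c is [offsets_[c], offsets_[c+1])
};
```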

--
Anders


