dolfin team mailing list archive

Thread
Date

Re: multi-thread assembly

To: "Garth N. Wells" <gnw20@xxxxxxxxx>
From: Anders Logg <logg@xxxxxxxxx>
Date: Thu, 11 Nov 2010 10:53:04 +0100
Cc: DOLFIN Mailing List <dolfin@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4CDB03EA.8010306@cam.ac.uk>
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Nov 10, 2010 at 08:43:22PM +0000, Garth N. Wells wrote:
>
>
> On 10/11/10 20:29, Anders Logg wrote:
> >On Wed, Nov 10, 2010 at 04:40:47PM +0000, Garth N. Wells wrote:
> >>
> >>
> >>On 10/11/10 16:10, Anders Logg wrote:
> >>>On Wed, Nov 10, 2010 at 03:58:13PM +0000, Garth N. Wells wrote:
> >>>>
> >>>>
> >>>>On 10/11/10 15:47, Anders Logg wrote:
> >>>>>On Wed, Nov 10, 2010 at 02:47:30PM +0000, Garth N. Wells wrote:
> >>>>>>Nice to see multi-thread assembly being added. We should look at
> >>>>>>adding support for the multi-threaded version of SuperLU. What other
> >>>>>>multi-thread solvers are out there?
> >>>>>
> >>>>>Yes, that would be good, but I don't know which solvers are available.
> >>>>>
> >>>>>>I haven't looked at the code in great detail, but are element
> >>>>>>tensors being added to the global tensor is a thread-safe fashion?
> >>>>>>Both PETSc and Trilinos are not thread-safe.
> >>>>>
> >>>>>Yes, they should. That's the main point. It's a very simple algorithm
> >>>>>which just partitions the matrix row by row and makes each process
> >>>>>responsible for a chunk of rows.
> >>>>
> >>>>Would it be better to partition the mesh (using Metis) and then
> >>>>renumber dofs? That way the 'all_in_range' case would be maximised
> >>>>and the 'some_in_range' would be minimised. If the rows are
> >>>>distributed, mixed elements will be a mess because the global rows
> >>>>are far apart (using the FFC-generated dof map).
> >>>
> >>>Renumbering is definitely important for getting good speedup. This
> >>>hasn't been added yet but was implemented in the prototype version
> >>>(which was stand-alone from DOLFIN).
> >>>
> >>
> >>What about partitioning of the mesh?
> >
> >I'm not sure that would help. If each process has access to the whole
> >mesh, which is either one big connected mesh or a connected piece of
> >the mesh when running in parallel with MPI, then renumbering on that
> >piece should be enough. Or am I missing something?
> >
>
> I think that you're missing something - partitioning (not in memory,
> but just assigning a partition number) would minimise the number of
> cells on partition boundaries cells, thereby maximising the number
> of cells for which 'all_in_range = true'. We could mark cells that
> are 'internal' to a partition (hence 'all_in_range = true') and
> cells on the boundary (hence 'some_in_range = true'), e.g.
>
>   // Partition cells with cell id (negated for cells on partition
>   // boundary)
>   MeshFunction<int> partition;
>
>   if (partition(cell) == thread_id)
>       compute tensor and assemble all
>   else if (std::abs(partition(cell)) == thread_id)
>       compute tensor and assemble some terms
>   else
>       do nothing
>
> What you've described could go wrong certain cell/dof numberings.
> What I describe above wouldn't depend on the numbering.

Yes, that would work, but I imagine a good renumbering algorithm would
accomplish the same thing.

Anyway, it's worth trying and comparing to just doing the renumbering.

--
Anders

Follow ups

Re: multi-thread assembly
From: Garth N. Wells, 2010-11-11

References

multi-thread assembly
From: Garth N. Wells, 2010-11-10
Re: multi-thread assembly
From: Anders Logg, 2010-11-10
Re: multi-thread assembly
From: Garth N. Wells, 2010-11-10
Re: multi-thread assembly
From: Anders Logg, 2010-11-10
Re: multi-thread assembly
From: Garth N. Wells, 2010-11-10
Re: multi-thread assembly
From: Anders Logg, 2010-11-10
Re: multi-thread assembly
From: Garth N. Wells, 2010-11-10