Re: Parallel assembly

 

On Fri, Dec 07, 2007 at 02:13:22PM +0000, Garth N. Wells wrote:
> 
> 
> Anders Logg wrote:
> > On Fri, Dec 07, 2007 at 03:04:12PM +0100, Anders Logg wrote:
> >> On Wed, Dec 05, 2007 at 06:18:53PM +0100, Johan Hoffman wrote:
> >>>> Just some comments on the strategy.
> >>>>
> >>>> On Dec 5, 2007 7:50 AM, Anders Logg <logg@xxxxxxxxx> wrote:
> >>>>> On Mon, Dec 03, 2007 at 11:44:44AM +0100, Niclas Jansson wrote:
> >>>>>
> >>>>>> It's a different strategy that uses point to point instead of
> >>>>>> collective communication. However the plan for parallel assembly
> >>>>>> should be more or less the same.
> >>>>>>
> >>>>>> I attached the more detailed TODO list; it should explain the
> >>>>>> necessary changes to the Mesh classes.
> >>>>>>
> >>>>>> Niclas
> >>>>>> Modify the mesh representation to store both local and global
> >>>>>> indices for each cell/vertex. Implement mesh functions to map
> >>>>>> between local and global indices. The local indices correspond to
> >>>>>> the current cell and vertex indices, so only the mapping functions
> >>>>>> need to be added to the Mesh class.
> >>>>> I don't think we should store both local and global indices for mesh
> >>>>> entities. All we need is to store the mapping from local to global
> >>>>> indices. We can use MeshFunctions for this but it's not necessary.
> >>>>>
> >>>>> My suggestion would be to add a new class MeshNumbering (maybe someone
> >>>>> can suggest a better name) which would store the numbering scheme in a
> >>>>> set of (dim + 1) arrays:
> >>>>>
> >>>>>   class MeshNumbering
> >>>>>   {
> >>>>>   public:
> >>>>>
> >>>>>     ...
> >>>>>
> >>>>>   private:
> >>>>>
> >>>>>     // One array per dimension (dim + 1 in total), mapping
> >>>>>     // local entity indices to global indices
> >>>>>     uint** numbering;
> >>>>>
> >>>>>   };
> >>>>>
> >>>>> One array for each dimension, so numbering[0][i] would return the
> >>>>> global number for the vertex with index i.
> >>>>>
> >>>>> We can also add an easy-access function for global numbers to
> >>>>> MeshEntity so that (for example) e.number() would return the global
> >>>>> number of an entity e.
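
Just to make that part concrete, here is a rough sketch of what such a
class could look like (only a sketch; the constructor, the set() function
and the sizes argument are just illustration, nothing is decided):

  // Rough sketch only; uint is meant as dolfin::uint
  typedef unsigned int uint;

  class MeshNumbering
  {
  public:

    // Allocate one array per dimension 0..dim, with sizes[d] entries
    // for dimension d (the number of local entities of that dimension)
    MeshNumbering(uint dim, const uint* sizes) : dim(dim)
    {
      numbering = new uint*[dim + 1];
      for (uint d = 0; d <= dim; d++)
        numbering[d] = new uint[sizes[d]];
    }

    ~MeshNumbering()
    {
      for (uint d = 0; d <= dim; d++)
        delete [] numbering[d];
      delete [] numbering;
    }

    // Global number of local entity i of dimension d, so number(0, i)
    // is the global number of the vertex with local index i
    uint number(uint d, uint i) const
    { return numbering[d][i]; }

    // Set the global number of local entity i of dimension d
    void set(uint d, uint i, uint global_number)
    { numbering[d][i] = global_number; }

  private:

    uint dim;
    uint** numbering;
  };
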
> >>> I think the motivation for locally storing the global index is to keep
> >>> the algorithms fully distributed. As far as I understand, with a
> >>> MeshNumbering solution as you outline it, you would store this data on
> >>> one processor? Or do I misunderstand this? For very large problems we
> >>> want all algorithms to be fully distributed.
> >> I mean storing the indices locally on each processor (for its part of
> >> the mesh), but storing them in global arrays on those processors. There
> >> is no way to store a value locally at each vertex etc., since those
> >> objects (vertices, cells and, in general, MeshEntity) don't exist. Only
> >> the arrays exist, and the objects are views that get created by the
> >> iterators etc.
> >>
> >>>> I will just point out that I think this is very limiting. You can
> >>>> argue that it covers what you want to do, but it is quite inflexible
> >>>> compared with having names. It is an incredible pain in the ass to
> >>>> rebalance (or do anything else complicated, like AMR) if you rely on
> >>>> offsets (numberings) rather than names. I recommend (as we do) using
> >>>> names until you have exactly the mesh you want, and then reducing to
> >>>> offsets. This is implemented manually in Sieve right now (you call a
> >>>> method), but I am trying to automate it with code generation.
> >>> I am not sure what you mean by "names"? But I guess you mean that you
> >>> want to work with different closures of a set of mesh entities (like
> >>> cells, for example)? If that's what you mean, then I would agree.
> >>>
> >>>>>> Adapt mesh reading to the new representation: store mesh data based
> >>>>>> on the number of local cells/vertices instead of the parsed numbers.
> >>>>>> This modification allows processors to read different parts of the
> >>>>>> mesh in parallel, making an initial distribution step unnecessary.
> >>>>>>
> >>>>>> Loading meshes in parallel should increase efficiency, reduce costly
> >>>>>> communication and save memory for large-scale problems, given that
> >>>>>> the parallel environment has a shared file system that can handle
> >>>>>> the load. However, the serial distribution should still be
> >>>>>> implemented to support environments without shared file systems.
> >>>>>>
> >>>>>>  Modifications for the new representation should be implemented in
> >>>>>>  the XMLMesh class. Functions for the initial mesh distribution
> >>>>>>  should be implemented in a new class.
> >>>>> For this, we should add optional data to the mesh format, such that
> >>>>> the current file format still works. If additional data is present,
> >>>>> then that is read into MeshNumbering, otherwise it is empty.
> >>> Yes, that is natural (the serial/regular mesh format should still work
> >>> without modifications). However, I am not sure that I think the
> >>> MeshNumbering approach is a good solution.
> >>>
> >>>>> (When I think of it, MeshNumbering may not be a good choice of name
> >>>>> for the new class, since it may be confused with MeshOrdering which
> >>>>> does something different but related.)
> >>>>>
> >>>>>> Change the mesh partitioning library to ParMETIS. Modify the
> >>>>>> partitioning class to work on distributed data, add the necessary
> >>>>>> calls to METIS and redistribute the local vertices/cells according
> >>>>>> to the result. Since METIS can partition a mesh directly, using an
> >>>>>> internal mesh-to-graph translation, it is possible to do the
> >>>>>> partitioning directly in the MeshPartition class. However, both
> >>>>>> methods could easily be implemented and compared against each other.
> >>>>> We don't want to change from SCOTCH to ParMETIS, but we could add
> >>>>> support for using METIS/ParMETIS as an option.
> >>> Yes, that is right. The implementation for the basic algorithms should
> >>> look more or less the same for Scotch and parMetis, and then we could add
> >>> extra optional functionality based on parMetis until we can find
> >>> corresponding functionality in Scotch.
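
As a side note on the redistribution step in Niclas' list above, I imagine
something like the following for the bookkeeping. This is just a sketch in
plain MPI, not tied to how MeshPartition ends up being organized, and the
function name and arguments are only for illustration; the actual cell data
would then be exchanged with e.g. MPI_Alltoallv.

  #include <mpi.h>
  #include <vector>

  // Given one destination process per local cell (the partition
  // array), exchange how many cells each process will send/receive.
  void exchange_counts(const std::vector<int>& part, int num_processes,
                       std::vector<int>& send_count,
                       std::vector<int>& recv_count)
  {
    send_count.assign(num_processes, 0);
    recv_count.assign(num_processes, 0);

    // Count how many local cells go to each process
    for (unsigned int i = 0; i < part.size(); i++)
      send_count[part[i]] += 1;

    // Exchange counts so every process knows what it will receive
    MPI_Alltoall(&send_count[0], 1, MPI_INT,
                 &recv_count[0], 1, MPI_INT, MPI_COMM_WORLD);
  }
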
> >>>
> >>>> Have you thought about generalizing the partitioning to hypergraphs?
> >>>> I just did this so I can partition faces (for FVM) and it was not
> >>>> that bad. I use Zoltan from Sandia.
> >>> Sounds interesting. I do not know about this, but it may be worth
> >>> looking at.
> >>>
> >>>>>> Finish implementation of mesh communication class MPIMeshCommunicator.
> >>>>>> Add functionality for single vertex and cell communication needed
> >>>>>> for mesh partitioning.
> >>>>> What do you mean by single vertex and cell communication? Also note
> >>>>> that it is not enough to communicate indices for vertices and
> >>>>> cells. Sometimes we also need to communicate edges and faces.
> >>>> That is why you should never explicitly refer to vertices and cells,
> >>>> but rather communicate the entire closure and star of each element
> >>>> which you send. That is the point of the mesh structure: to avoid
> >>>> this kind of special-purpose coding.
> >>>>
> >>>>>> Adapt boundary calculation to work on distributed meshes. Use
> >>>>>> knowledge about which vertices are shared among processors to decide
> >>>>>> if an edge is global or local. Implement the logic directly in
> >>>>>> BoundaryComputation class using information from the mesh
> >>>>>> partitioning.
> >>>>> I'm not sure I understand this point.
> >>> If you only have a part of the mesh, as is the case for a fully
> >>> distributed mesh, then you need to know if your Boundary (of the local
> >>> Mesh) is an internal boundary, which should communicate with its
> >>> neighboring partitions, or an external boundary on which you apply
> >>> boundary conditions.
> >> ok, good point.
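
For the check itself, I imagine something along these lines (just a
sketch; the shared_vertices set with the global indices of vertices shared
with other processes would have to come from the mesh partitioning, and
the names here are only for illustration):

  #include <set>
  #include <vector>

  // A facet on the local boundary is taken to be an internal
  // (partition) boundary facet if all of its vertices are shared with
  // another process; otherwise it is part of the external (physical)
  // boundary where boundary conditions are applied.
  bool on_partition_boundary(const std::vector<unsigned int>& facet_vertices,
                             const std::set<unsigned int>& shared_vertices)
  {
    for (unsigned int i = 0; i < facet_vertices.size(); i++)
    {
      if (shared_vertices.count(facet_vertices[i]) == 0)
        return false;
    }
    return true;
  }
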
> > 
> > We also need to be able to assemble over interior facets (in DG
> > methods). There we iterate over all interior facets and for each facet
> > pick its two neighboring cells, so those two cells need to be
> > available. So if we want to partition along a given facet, do we
> > include the two cells on either side in both subdomains, so that both
> > cells are available from both subdomains?
> > 
> 
> I would say yes. This is how FD methods typically work in parallel.
> 
> Garth

ok, good.

Magnus, can you add a flag to Mesh::partition() for doing this?

To start with, we can just have a boolean:

  void partition(uint num_partitions, MeshFunction<uint>& partitions, bool overlap=true);

The question is how to number those cells which are on two processors.
One option would be

  (i + 1)*num_partitions + j

if a cell should be on processors i and j.
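
To make the encoding concrete, here is a small sketch of how the values in
the partitions MeshFunction could be decoded, assuming that a cell owned
by a single partition p just stores p as before (that part is an
assumption, not something we have decided):

  #include <utility>

  // Decode a value v stored for a cell: v < num_partitions means the
  // cell belongs to partition v only, while v = (i + 1)*num_partitions + j
  // means the cell is shared by partitions i and j.
  std::pair<unsigned int, unsigned int>
  decode_partition(unsigned int v, unsigned int num_partitions)
  {
    if (v < num_partitions)
      return std::make_pair(v, v);
    const unsigned int i = v / num_partitions - 1;
    const unsigned int j = v % num_partitions;
    return std::make_pair(i, j);
  }

One thing to decide is whether (i, j) and (j, i) should give the same
value; with the formula as written they don't.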

-- 
Anders

