← Back to team overview

dolfin team mailing list archive

Re: Parallel assembly

 

On Sun, Dec 09, 2007 at 02:00:36PM -0600, Matthew Knepley wrote:
> On Dec 9, 2007 1:54 PM, Anders Logg <logg@xxxxxxxxx> wrote:
> > On Fri, Dec 07, 2007 at 02:37:57PM +0000, Garth N. Wells wrote:
> > >
> > >
> > > Anders Logg wrote:
> > > > On Fri, Dec 07, 2007 at 02:13:22PM +0000, Garth N. Wells wrote:
> > > >>
> > > >> Anders Logg wrote:
> > > >>> On Fri, Dec 07, 2007 at 03:04:12PM +0100, Anders Logg wrote:
> > > >>>> On Wed, Dec 05, 2007 at 06:18:53PM +0100, Johan Hoffman wrote:
> > > >>>>>> Just some comments on the strategy.
> > > >>>>>>
> > > >>>>>> On Dec 5, 2007 7:50 AM, Anders Logg <logg@xxxxxxxxx> wrote:
> > > >>>>>>> On Mon, Dec 03, 2007 at 11:44:44AM +0100, Niclas Jansson wrote:
> > > >>>>>>>
> > > >>>>>>>> It's a different strategy that uses point to point instead of
> > > >>>>>>> collective
> > > >>>>>>>> communication. However the plan for parallel assembly should be more
> > > >>>>>>> or
> > > >>>>>>>> less the same.
> > > >>>>>>>>
> > > >>>>>>>> I attached the more detailed TODO list, it should explain the
> > > >>>>>>> necessary
> > > >>>>>>>> changes to the Mesh classes.
> > > >>>>>>>>
> > > >>>>>>>> Niclas
> > > >>>>>>>> Modify mesh representation to store both local and global indices
> > > >>>>>>>> for each cell/vertex. Implement mesh functions to map between local
> > > >>>>>>>> and global indices. The local indices corresponds to the current
> > > >>>>>>>> cell and vertex indices, only the mapping functions must be added to
> > > >>>>>>>> the Mesh class.
> > > >>>>>>> I don't think we should store both local and global indices for mesh
> > > >>>>>>> entities. All we need is to store the mapping from local to global
> > > >>>>>>> indices. We can use MeshFunctions for this but it's not necessary.
> > > >>>>>>>
> > > >>>>>>> My suggestion would be to add a new class MeshNumbering (maybe someone
> > > >>>>>>> can suggest a better name) which would store the numbering scheme in a
> > > >>>>>>> set of (dim + 1) arrays:
> > > >>>>>>>
> > > >>>>>>>   class MeshNumbering
> > > >>>>>>>   {
> > > >>>>>>>   public:
> > > >>>>>>>
> > > >>>>>>>     ...
> > > >>>>>>>
> > > >>>>>>>   private:
> > > >>>>>>>
> > > >>>>>>>     uint** numbering;
> > > >>>>>>>
> > > >>>>>>>   }
> > > >>>>>>>
> > > >>>>>>> One array for each dimension so, numbering[0][i] would return the
> > > >>>>>>> global number for the vertex with index i.
> > > >>>>>>>
> > > >>>>>>> We can also add an easy-acccess function for global numbers to
> > > >>>>>>> MeshEntity so that (for example) e.number() would return the global
> > > >>>>>>> number of an entity e.
> > > >>>>> I think the motivation for locally storing the global index is to keep the
> > > >>>>> algorithms fully distributed.
> > > >>>>> As far as I understand, with a
> > > >>>>> MeshNumbering-solution as you outline you would store this data at one
> > > >>>>> processor? OR do I misunderstand this? For very large problems we want all
> > > >>>>> algorithms to be fully distributed.
> > > >>>> I mean storing the indices locally on each processor (for its part of
> > > >>>> the mesh) but storing them in global arrays on those processors. There
> > > >>>> is no way to store a value locally at each vertex etc since those
> > > >>>> objects (vertices, cells and in general MeshEntity) don't exist. Only
> > > >>>> the arrays exist and the objects are views that get created by the
> > > >>>> iterators etc.
> > > >>>>
> > > >>>>>> I will just point out that I think this is very limiting. You can argue
> > > >>>>>> that it
> > > >>>>>> covers what you want to do, but it is quite inflexible compared with
> > > >>>>>> having names. It is an incredible pain in the ass the rebalance ( or
> > > >>>>>> anything else complicated, like AMR) if you
> > > >>>>>> rely on offsets (numberings) rather than names. I recommend (as we
> > > >>>>>> do) using names until you have exactly the mesh you want, and then
> > > >>>>>> reducing to offsets. This is implemeted manually in Sieve right now
> > > >>>>>> (you call a method), but I am trying to automate it with code generation.
> > > >>>>> I am not sure of what you mean by "names"? But I guess that you mean that
> > > >>>>> you want to work with different closures of a set of mesh entities (like
> > > >>>>> cells for example)? If that's what you mean, then I would agree.
> > > >>>>>
> > > >>>>>>>> Adapt mesh reading for the new representation, store mesh data based
> > > >>>>>>>> on number of local cells/vertices instead of parsed numbers. This
> > > >>>>>>>> modification allows processors to read different parts of the mesh
> > > >>>>>>>> in parallel making an initial distribution step unnecessary.
> > > >>>>>>>>
> > > >>>>>>>> Loading meshes in parallel should increase efficiency, reduce cost
> > > >>>>>>>> full communication and save memory for large scale problem, given
> > > >>>>>>>> that the parallel environment have a shared file system that could
> > > >>>>>>>> handle the load. However the serial distribution should still be
> > > >>>>>>>> implemented to support environment without shared file systems.
> > > >>>>>>>>
> > > >>>>>>>>  Modification for the new representation should be implemented in
> > > >>>>>>>>  class XMLMesh. Functions for initial mesh distribution should be
> > > >>>>>>>>  implemented in a new class.
> > > >>>>>>> For this, we should add optional data to the mesh format, such that
> > > >>>>>>> the current file format still works. If additional data is present,
> > > >>>>>>> then that is read into MeshNumbering, otherwise it is empty.
> > > >>>>> Yes, that is natural (that the serial/regular mesh format should still
> > > >>>>> work without modifications). Although, I am not sure that I think the
> > > >>>>> MeshNumbering approach is a good solution.
> > > >>>>>
> > > >>>>>>> (When I think of it, MeshNumbering may not be a good choice of name
> > > >>>>>>> for the new class, since it may be confused with MeshOrdering which
> > > >>>>>>> does something different but related.)
> > > >>>>>>>
> > > >>>>>>>> Change mesh partitioning library to ParMETIS. Modify the
> > > >>>>>>>> partitioning class to work on distributed data, add the necessary
> > > >>>>>>>> calls to METIS and redistribute the local vertices/cells according
> > > >>>>>>>> to the result. Since METIS could partition a mesh directly using an
> > > >>>>>>>> internal mesh to graph translation it is possible to have
> > > >>>>>>>> partitioning directly in the MeshPartition class. However both
> > > >>>>>>>> methods could easily be implemented and compared against each other.
> > > >>>>>>> We don't want to change from SCOTCH to ParMETIS, but we could add
> > > >>>>>>> support for using METIS/ParMETIS as an option.
> > > >>>>> Yes, that is right. The implementation for the basic algorithms should
> > > >>>>> look more or less the same for Scotch and parMetis, and then we could add
> > > >>>>> extra optional functionality based on parMetis until we can find
> > > >>>>> corresponding functionality in Scotch.
> > > >>>>>
> > > >>>>>> Have you thought about generalizing the partitioning to hypergraphs? I
> > > >>>>>> just
> > > >>>>>> did this so I can partition faces (for FVM) and it was not that bad. I
> > > >>>>>> use Zoltan
> > > >>>>>> from Sandia.
> > > >>>>> Sounds interesting. I do not know about this, but it may be worth to look at.
> > > >>>>>
> > > >>>>>>>> Finish implementation of mesh communication class MPIMeshCommunicator.
> > > >>>>>>>> Add functionality for single vertex and cell communication needed
> > > >>>>>>>> for mesh partitioning.
> > > >>>>>>> What do you mean by single vertex and cell communication? Also note
> > > >>>>>>> that it is not enough to communicate indices for vertices and
> > > >>>>>>> cells. Sometimes we also need to communicate edges and faces.
> > > >>>>>> That is why you should never explicitly refer to vertices and cells, but
> > > >>>>>> rather
> > > >>>>>> communicate that entire closure and star of each element which you send.
> > > >>>>>> That is the point of the mesh structure, to avoid this kind of special
> > > >>>>>> purpose
> > > >>>>>> coding.
> > > >>>>>>
> > > >>>>>>>> Adapt boundary calculation to work on distributed meshes. Use
> > > >>>>>>>> knowledge about which vertices are shared among processors to decide
> > > >>>>>>>> if an edge is global or local. Implement the logic directly in
> > > >>>>>>>> BoundaryComputation class using information from the mesh
> > > >>>>>>>> partitioning.
> > > >>>>>>> I'm not sure I understand this point.
> > > >>>>> If you only have a part of the mesh, as is the case for a fully
> > > >>>>> distributed mesh, then you need to know if your Boundary (of the local
> > > >>>>> Mesh) is a internal boundary, which should communicate with its
> > > >>>>> neighboring partitions, or an external boundary for which you apply
> > > >>>>> boundary conditions.
> > > >>>> ok, good point.
> > > >>> We also need to be able to assemble over interior facets (in DG
> > > >>> methods). There we iterate over all interior facets and for each facet
> > > >>> pick its two neighboring cells, so those two cells need to be
> > > >>> available. So if we want to partition along a given facet, do we
> > > >>> include the two cells on either side in both sub domains so that both
> > > >>> cells are available from both sub domains?
> > > >>>
> > > >> I would say yes. This is how FD methods typically work in parallel.
> > > >>
> > > >> Garth
> > > >
> > > > ok, good.
> > > >
> > > > Magnus, can you add a flag to Mesh::partition() for doing this?
> > > >
> > > > To start with, we can just have a boolean:
> > > >
> > > >   void partition(uint num_partitions, MeshFunction<uint>& partitions, bool overlap=true);
> > > >
> > > > The question is how to number those cells which are on two processors.
> > > > One option would be
> > > >
> > > >   (i + 1)*num_partitions + j
> > > >
> > > > if a cell should be on processors i and j.
> > > >
> > >
> > > In terms of the mesh, we won't need this now because each process has
> > > the entire mesh. It will be needed when we go fully distributed, but I
> > > think that this is the lowest priority in terms of parallelisation at
> > > this stage.
> > >
> > > We may need another MeshFunction which is tied to facets so each process
> > > knows whether not it should assemble a facet integral at a process
> > > boundary.
> > >
> > > What we do need in the short term are the values of functions defined on
> > > the 'ghost' cells so that DG terms can be computed at processor
> > > boundaries. This will require some careful re-designing of DiscreteFunction.
> >
> > Yes. I had been hoping PETSc would handle this for us but I see now
> > that VecGetValues can only fetch values on the current processor.
> 
>  We use VecScatter for this. You can ask for any index and we will figure out
> the communication and make persistent MPI contexts for it. Its pretty easy.
> 
>   Matt

Great, thanks.

My suggestion (for the DOLFIN wrappers) would be to put a check in
Vector::get() to see if the asked for values recide on the current
process. If not, we use VecScatter (in the PETSc backend) to get them.

For efficiency, we should make sure that we get all those values at
once before the assembly begins so we don't call VecScatterBegin/End
on each local cell.

-- 
Anders


References