dolfin team mailing list archive

Thread
Date

Re: Parallel assembly

To: dolfin-dev@xxxxxxxxxx, Magnus Vikstrøm <gustavv@xxxxxxxxxx>
From: "Garth N. Wells" <gnw20@xxxxxxxxx>
Date: Fri, 07 Dec 2007 14:37:57 +0000
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <20071207141817.GE6584@simula.no>
User-agent: Thunderbird 2.0.0.6 (X11/20071022)



Anders Logg wrote:

On Fri, Dec 07, 2007 at 02:13:22PM +0000, Garth N. Wells wrote:


Anders Logg wrote:

On Fri, Dec 07, 2007 at 03:04:12PM +0100, Anders Logg wrote:

On Wed, Dec 05, 2007 at 06:18:53PM +0100, Johan Hoffman wrote:

Just some comments on the strategy.

On Dec 5, 2007 7:50 AM, Anders Logg <logg@xxxxxxxxx> wrote:

On Mon, Dec 03, 2007 at 11:44:44AM +0100, Niclas Jansson wrote:

It's a different strategy that uses point to point instead of

collective

communication. However the plan for parallel assembly should be more

or

less the same.

I attached the more detailed TODO list, it should explain the

necessary

changes to the Mesh classes.

Niclas
Modify mesh representation to store both local and global indices
for each cell/vertex. Implement mesh functions to map between local
and global indices. The local indices corresponds to the current
cell and vertex indices, only the mapping functions must be added to
the Mesh class.

I don't think we should store both local and global indices for mesh
entities. All we need is to store the mapping from local to global
indices. We can use MeshFunctions for this but it's not necessary.

My suggestion would be to add a new class MeshNumbering (maybe someone
can suggest a better name) which would store the numbering scheme in a
set of (dim + 1) arrays:

  class MeshNumbering
  {
  public:

    ...

  private:

    uint** numbering;

  }

One array for each dimension so, numbering[0][i] would return the
global number for the vertex with index i.

We can also add an easy-acccess function for global numbers to
MeshEntity so that (for example) e.number() would return the global
number of an entity e.

I think the motivation for locally storing the global index is to keep the
algorithms fully distributed.
As far as I understand, with a
MeshNumbering-solution as you outline you would store this data at one
processor? OR do I misunderstand this? For very large problems we want all
algorithms to be fully distributed.

I mean storing the indices locally on each processor (for its part of
the mesh) but storing them in global arrays on those processors. There
is no way to store a value locally at each vertex etc since those
objects (vertices, cells and in general MeshEntity) don't exist. Only
the arrays exist and the objects are views that get created by the
iterators etc.

I will just point out that I think this is very limiting. You can argue
that it
covers what you want to do, but it is quite inflexible compared with
having names. It is an incredible pain in the ass the rebalance ( or
anything else complicated, like AMR) if you
rely on offsets (numberings) rather than names. I recommend (as we
do) using names until you have exactly the mesh you want, and then
reducing to offsets. This is implemeted manually in Sieve right now
(you call a method), but I am trying to automate it with code generation.

I am not sure of what you mean by "names"? But I guess that you mean that
you want to work with different closures of a set of mesh entities (like
cells for example)? If that's what you mean, then I would agree.

Adapt mesh reading for the new representation, store mesh data based
on number of local cells/vertices instead of parsed numbers. This
modification allows processors to read different parts of the mesh
in parallel making an initial distribution step unnecessary.

Loading meshes in parallel should increase efficiency, reduce cost
full communication and save memory for large scale problem, given
that the parallel environment have a shared file system that could
handle the load. However the serial distribution should still be
implemented to support environment without shared file systems.

 Modification for the new representation should be implemented in
 class XMLMesh. Functions for initial mesh distribution should be
 implemented in a new class.

For this, we should add optional data to the mesh format, such that
the current file format still works. If additional data is present,
then that is read into MeshNumbering, otherwise it is empty.

Yes, that is natural (that the serial/regular mesh format should still
work without modifications). Although, I am not sure that I think the
MeshNumbering approach is a good solution.

(When I think of it, MeshNumbering may not be a good choice of name
for the new class, since it may be confused with MeshOrdering which
does something different but related.)

Change mesh partitioning library to ParMETIS. Modify the
partitioning class to work on distributed data, add the necessary
calls to METIS and redistribute the local vertices/cells according
to the result. Since METIS could partition a mesh directly using an
internal mesh to graph translation it is possible to have
partitioning directly in the MeshPartition class. However both
methods could easily be implemented and compared against each other.

We don't want to change from SCOTCH to ParMETIS, but we could add
support for using METIS/ParMETIS as an option.

Yes, that is right. The implementation for the basic algorithms should
look more or less the same for Scotch and parMetis, and then we could add
extra optional functionality based on parMetis until we can find
corresponding functionality in Scotch.

Have you thought about generalizing the partitioning to hypergraphs? I
just
did this so I can partition faces (for FVM) and it was not that bad. I
use Zoltan
from Sandia.

Sounds interesting. I do not know about this, but it may be worth to look at.

Finish implementation of mesh communication class MPIMeshCommunicator.
Add functionality for single vertex and cell communication needed
for mesh partitioning.

What do you mean by single vertex and cell communication? Also note
that it is not enough to communicate indices for vertices and
cells. Sometimes we also need to communicate edges and faces.

That is why you should never explicitly refer to vertices and cells, but
rather
communicate that entire closure and star of each element which you send.
That is the point of the mesh structure, to avoid this kind of special
purpose
coding.

Adapt boundary calculation to work on distributed meshes. Use
knowledge about which vertices are shared among processors to decide
if an edge is global or local. Implement the logic directly in
BoundaryComputation class using information from the mesh
partitioning.

I'm not sure I understand this point.

If you only have a part of the mesh, as is the case for a fully
distributed mesh, then you need to know if your Boundary (of the local
Mesh) is a internal boundary, which should communicate with its
neighboring partitions, or an external boundary for which you apply
boundary conditions.

ok, good point.

We also need to be able to assemble over interior facets (in DG
methods). There we iterate over all interior facets and for each facet
pick its two neighboring cells, so those two cells need to be
available. So if we want to partition along a given facet, do we
include the two cells on either side in both sub domains so that both
cells are available from both sub domains?

I would say yes. This is how FD methods typically work in parallel.

Garth


ok, good.

Magnus, can you add a flag to Mesh::partition() for doing this?

To start with, we can just have a boolean:

  void partition(uint num_partitions, MeshFunction<uint>& partitions, bool overlap=true);

The question is how to number those cells which are on two processors.
One option would be

  (i + 1)*num_partitions + j

if a cell should be on processors i and j.

In terms of the mesh, we won't need this now because each process hasthe entire mesh. It will be needed when we go fully distributed, but Ithink that this is the lowest priority in terms of parallelisation atthis stage.

We may need another MeshFunction which is tied to facets so each processknows whether not it should assemble a facet integral at a processboundary.

What we do need in the short term are the values of functions defined onthe 'ghost' cells so that DG terms can be computed at processorboundaries. This will require some careful re-designing of DiscreteFunction.


Garth

Follow ups

Re: Parallel assembly
From: Anders Logg, 2007-12-09

References

Re: Parallel assembly
From: Johan Hoffman, 2007-12-02
Re: Parallel assembly
From: Anders Logg, 2007-12-02
Re: Parallel assembly
From: Johan Hoffman, 2007-12-03
Re: Parallel assembly
From: Niclas Jansson, 2007-12-03
Re: Parallel assembly
From: Anders Logg, 2007-12-05
Re: Parallel assembly
From: Matthew Knepley, 2007-12-05
Re: Parallel assembly
From: Johan Hoffman, 2007-12-05
Re: Parallel assembly
From: Anders Logg, 2007-12-07
Re: Parallel assembly
From: Anders Logg, 2007-12-07
Re: Parallel assembly
From: Garth N. Wells, 2007-12-07
Re: Parallel assembly
From: Anders Logg, 2007-12-07