
Re: Parallel assembly

 



Niclas Jansson wrote:
On Wed, 5 Dec 2007 08:09:07 -0600
"Matthew Knepley" <knepley@xxxxxxxxx> wrote:

Just some comments on the strategy.

On Dec 5, 2007 7:50 AM, Anders Logg <logg@xxxxxxxxx> wrote:
On Mon, Dec 03, 2007 at 11:44:44AM +0100, Niclas Jansson wrote:

It's a different strategy that uses point-to-point instead of
collective communication. However, the plan for parallel assembly
should be more or less the same.

I attached the more detailed TODO list, it should explain the
necessary changes to the Mesh classes.

Niclas
Modify the mesh representation to store both local and global indices
for each cell/vertex. Implement mesh functions to map between
local and global indices. The local indices correspond to the
current cell and vertex indices; only the mapping functions must
be added to the Mesh class.
I don't think we should store both local and global indices for mesh
entities. All we need is to store the mapping from local to global
indices. We can use MeshFunctions for this but it's not necessary.

My suggestion would be to add a new class MeshNumbering (maybe
someone can suggest a better name) which would store the numbering
scheme in a set of (dim + 1) arrays:

  class MeshNumbering
  {
  public:

    ...

  private:

    // One array per topological dimension (dim + 1 arrays in total)
    uint** numbering;

  };

One array for each dimension, so numbering[0][i] would return the
global number of the vertex with index i.

We can also add an easy-access function for global numbers to
MeshEntity so that (for example) e.number() would return the global
number of an entity e.
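
For illustration, a minimal sketch of what such a class and accessor
could look like (the member names global()/set() and the
MeshEntity::number() hook are placeholders for this discussion, not
existing DOLFIN code):

  #include <vector>

  typedef unsigned int uint;

  class MeshNumbering
  {
  public:

    // One local-to-global array per topological dimension (dim + 1 in total)
    explicit MeshNumbering(uint dim) : numbering(dim + 1) {}

    // Global number of entity i of topological dimension d
    uint global(uint d, uint i) const { return numbering[d][i]; }

    // Set the local-to-global map for dimension d
    void set(uint d, const std::vector<uint>& map) { numbering[d] = map; }

  private:

    std::vector<std::vector<uint> > numbering;
  };

  // Easy-access function on MeshEntity (sketch)
  // uint MeshEntity::number() const
  // {
  //   return mesh().numbering().global(dim(), index());
  // }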
I will just point out that I think this is very limiting. You can
argue that it covers what you want to do, but it is quite inflexible
compared with having names. It is an incredible pain in the ass to
rebalance (or do anything else complicated, like AMR) if you
rely on offsets (numberings) rather than names. I recommend (as we
do) using names until you have exactly the mesh you want, and then
reducing to offsets. This is implemented manually in Sieve right now
(you call a method), but I am trying to automate it with code
generation.


Ok, since the second part of the project covers AMR, maybe a different
approach is needed.


Adapt mesh reading to the new representation: store mesh data
based on the number of local cells/vertices instead of the parsed
numbers. This modification allows processors to read different parts
of the mesh in parallel, making an initial distribution step unnecessary.

Loading meshes in parallel should increase efficiency, reduce costly
communication and save memory for large-scale problems, given
that the parallel environment has a shared file system that can
handle the load. However, the serial distribution should still be
implemented to support environments without shared file systems.

 Modifications for the new representation should be implemented in
 the XMLMesh class. Functions for initial mesh distribution should be
 implemented in a new class.
For this, we should add optional data to the mesh format, such that
the current file format still works. If additional data is present,
then that is read into MeshNumbering, otherwise it is empty.

(When I think of it, MeshNumbering may not be a good choice of name
for the new class, since it may be confused with MeshOrdering which
does something different but related.)



It would clean up the implementation a lot; my idea was to use a simple
linear distribution just to get everything off disk. But maybe this
approach (the whole idea) won't scale beyond 4-8 processors without a
fancy file system (GPFS) or an MPI-IO implementation.
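
Just to make the linear distribution concrete, a sketch of the ownership
range each process would read (assuming only that every process knows
the total number of cells in the file):

  #include <mpi.h>

  // Half-open range [first, last) of cells read by this process under a
  // simple linear (block) distribution.
  void local_range(unsigned int num_cells, MPI_Comm comm,
                   unsigned int& first, unsigned int& last)
  {
    int rank = 0, size = 1;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    const unsigned int n = num_cells / size;  // base block size
    const unsigned int r = num_cells % size;  // remainder goes to the first r ranks

    first = rank * n + (static_cast<unsigned int>(rank) < r ? rank : r);
    last  = first + n + (static_cast<unsigned int>(rank) < r ? 1 : 0);
  }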


Change the mesh partitioning library to ParMETIS. Modify the
partitioning class to work on distributed data, add the necessary
calls to METIS and redistribute the local vertices/cells according
to the result. Since METIS can partition a mesh directly using
an internal mesh-to-graph translation, it is possible to do the
partitioning directly in the MeshPartition class. However, both
methods could easily be implemented and compared against each
other.
We don't want to change from SCOTCH to ParMETIS, but we could add
support for using METIS/ParMETIS as an option.
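
For reference, a distributed call could look roughly like the sketch
below. This is untested and assumes ParMETIS 4.x types (idx_t, real_t);
the distributed element arrays (elmdist, eptr, eind) would be built from
the locally read part of the mesh, and for ParMETIS 3.x the
floating-point type would be float instead:

  #include <vector>
  #include <mpi.h>
  #include <parmetis.h>

  // Sketch: partition the locally stored cells with ParMETIS and return,
  // for each local cell, the process it should be sent to.
  void partition_cells(std::vector<idx_t>& elmdist, // cell distribution over processes
                       std::vector<idx_t>& eptr,    // offsets into eind, one per local cell
                       std::vector<idx_t>& eind,    // vertex numbers of each local cell
                       idx_t nparts,                // number of partitions (MPI size)
                       std::vector<idx_t>& part,    // output: destination process per cell
                       MPI_Comm comm)
  {
    idx_t wgtflag = 0;            // no cell or edge weights
    idx_t numflag = 0;            // 0-based numbering
    idx_t ncon = 1;               // one balance constraint
    idx_t ncommonnodes = 3;       // 3 shared vertices define a connection (tetrahedra)
    idx_t options[3] = {0, 0, 0}; // default options
    idx_t edgecut = 0;

    std::vector<real_t> tpwgts(ncon*nparts, 1.0/nparts); // uniform target weights
    std::vector<real_t> ubvec(ncon, 1.05);               // 5% imbalance tolerance

    part.resize(eptr.size() - 1);

    ParMETIS_V3_PartMeshKway(&elmdist[0], &eptr[0], &eind[0], NULL,
                             &wgtflag, &numflag, &ncon, &ncommonnodes, &nparts,
                             &tpwgts[0], &ubvec[0], options,
                             &edgecut, &part[0], &comm);
  }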
Have you thought about generalizing the partitioning to hypergraphs? I
just did this so I can partition faces (for FVM) and it was not that
bad. I use Zoltan
from Sandia.


No, but Zoltan looked really interesting for the AMR/load balancing parts.


Finish the implementation of the mesh communication class
MPIMeshCommunicator. Add the functionality for single vertex and cell
communication needed for mesh partitioning.
What do you mean by single vertex and cell communication? Also note
that it is not enough to communicate indices for vertices and
cells. Sometimes we also need to communicate edges and faces.
That is why you should never explicitly refer to vertices and cells,
but rather communicate the entire closure and star of each element
which you send. That is the point of the mesh structure, to avoid this
kind of special-purpose coding.


Of course, what I meant was that functionality for a point-to-point
pattern had to be implemented.
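
To make this concrete, the point-to-point pattern could be along these
lines (a sketch only; it assumes the neighbour processes and the buffer
sizes have already been agreed on, e.g. during partitioning):

  #include <vector>
  #include <mpi.h>

  // Exchange vertex/cell index buffers with each neighbouring process
  // using nonblocking point-to-point communication.
  void exchange_shared(const std::vector<int>& neighbours,
                       std::vector<std::vector<unsigned int> >& send_buffers,
                       std::vector<std::vector<unsigned int> >& recv_buffers,
                       MPI_Comm comm)
  {
    std::vector<MPI_Request> requests(2*neighbours.size());

    for (unsigned int i = 0; i < neighbours.size(); ++i)
    {
      MPI_Irecv(&recv_buffers[i][0], static_cast<int>(recv_buffers[i].size()),
                MPI_UNSIGNED, neighbours[i], 0, comm, &requests[2*i]);
      MPI_Isend(&send_buffers[i][0], static_cast<int>(send_buffers[i].size()),
                MPI_UNSIGNED, neighbours[i], 0, comm, &requests[2*i + 1]);
    }

    MPI_Waitall(static_cast<int>(requests.size()), &requests[0], MPI_STATUSES_IGNORE);
  }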

Adapt the boundary calculation to work on distributed meshes. Use
knowledge about which vertices are shared among processors to
decide if an edge is global or local. Implement the logic directly
in the BoundaryComputation class using information from the mesh
partitioning.
I'm not sure I understand this point.


Since the mesh is distributed, a boundary could be local (shared among
processors) or global, where the BC should be applied. The list of shared
vertices could be used to sort out the local boundaries.
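
As an illustration of that filter (the names here are assumptions about
what the partitioning step provides, and this is only the first check,
not a complete boundary computation):

  #include <set>
  #include <vector>

  // A facet whose vertices are all shared with another process lies on an
  // internal (inter-process) boundary; otherwise it is a candidate for the
  // global boundary where BCs should be applied.
  bool is_internal_facet(const std::vector<unsigned int>& facet_vertices,
                         const std::set<unsigned int>& shared_vertices)
  {
    for (unsigned int i = 0; i < facet_vertices.size(); ++i)
    {
      if (shared_vertices.count(facet_vertices[i]) == 0)
        return false;
    }
    return true;
  }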


But why do you need this? All you need is the dof map, and PETSc will take care of assembling entries on the boundaries of partitions.
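
For what it's worth, this is what that looks like on the PETSc side:
entries are added with global indices, and anything owned by another
process is cached and shipped during assembly (error checking omitted
in this sketch):

  #include <petscmat.h>

  // Add a dense element block using global row/column indices; PETSc
  // communicates off-process entries during MatAssemblyBegin/End.
  void add_element_matrix(Mat A,
                          PetscInt m, const PetscInt rows[],
                          PetscInt n, const PetscInt cols[],
                          const PetscScalar values[])
  {
    MatSetValues(A, m, rows, n, cols, values, ADD_VALUES);
  }

  void finalise(Mat A)
  {
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  }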


Modify the assembly process with a mapping function which maps
dof_map indices from local to global prior to updating the global
tensor. Implement the call in the Assembler class using functions from
the Mesh class.
It might be enough to modify UFCCell::update().


Ok, I was thinking about something similar to the previously discussed
pdofmap approach (src/sandbox/passembly).
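
To spell out the mapping step (this is only an illustration, not the
code in src/sandbox/passembly; local_to_global is assumed to come from
the mesh numbering discussed above):

  #include <vector>

  // Map cell-local dof indices to global indices before the element
  // tensor is added to the global tensor.
  void map_dofs_to_global(const std::vector<unsigned int>& local_dofs,
                          const std::vector<unsigned int>& local_to_global,
                          std::vector<unsigned int>& global_dofs)
  {
    global_dofs.resize(local_dofs.size());
    for (unsigned int i = 0; i < local_dofs.size(); ++i)
      global_dofs[i] = local_to_global[local_dofs[i]];
  }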


This is the point which is most pressing. Magnus has mesh partitioning and distribution working (which can be refined later to be fully distributed), so to really get moving with parallel assembly we need to sort out the dof mapping. The second priority is then making sure that the Function class works properly in parallel.

Garth

Niclas

Change PETSc data types to MPI (PETScMatrix, PETScVector).
Change the PETSc solver environment to use the correct MPI
communicator (all PETSc solver classes).
We need to determine whether to use MPI or Seq PETSc types depending
on whether we are running in parallel.
We have types for this like AIJ and the default Vec.

   Matt
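
As a sketch of what Matt means (assuming a reasonably recent PETSc):
creating the objects on a communicator and using the generic types lets
PETSc resolve them to the Seq or MPI implementations automatically:

  #include <petscmat.h>
  #include <petscvec.h>

  // MATAIJ becomes seqaij on one process and mpiaij on several; the
  // default Vec type is likewise chosen from the communicator.
  void create_linear_algebra(MPI_Comm comm, PetscInt N, Mat* A, Vec* b)
  {
    MatCreate(comm, A);
    MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, N, N);
    MatSetType(*A, MATAIJ);
    MatSetUp(*A);

    VecCreate(comm, b);
    VecSetSizes(*b, PETSC_DECIDE, N);
    VecSetFromOptions(*b);
  }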

--
Anders



