Re: [Branch ~dolfin-core/dolfin/main] Rev 4896: Add simple Stokes solver for parallel testing.
On Mon, Aug 09, 2010 at 12:47:10PM +0100, Garth N. Wells wrote:
> On Mon, 2010-08-09 at 13:37 +0200, Anders Logg wrote:
> > On Sat, Aug 07, 2010 at 01:24:44PM +0100, Garth N. Wells wrote:
> > > On Fri, 2010-08-06 at 19:55 +0100, Garth N. Wells wrote:
> > > > On Fri, 2010-08-06 at 20:53 +0200, Anders Logg wrote:
> > > > > On Fri, Aug 06, 2010 at 07:51:18PM +0100, Garth N. Wells wrote:
> > > > > > On Fri, 2010-08-06 at 20:36 +0200, Anders Logg wrote:
> > > > > > > On Fri, Aug 06, 2010 at 04:55:44PM +0100, Garth N. Wells wrote:
> > > > > > > > On Fri, 2010-08-06 at 08:42 -0700, Johan Hake wrote:
> > > > > > > > > On Friday August 6 2010 08:16:26 you wrote:
> > > > > > > > > > ------------------------------------------------------------
> > > > > > > > > > revno: 4896
> > > > > > > > > > committer: Garth N. Wells <gnw20@xxxxxxxxx>
> > > > > > > > > > branch nick: dolfin-all
> > > > > > > > > > timestamp: Fri 2010-08-06 16:13:29 +0100
> > > > > > > > > > message:
> > > > > > > > > > Add simple Stokes solver for parallel testing.
> > > > > > > > > >
> > > > > > > > > > Other Stokes demos don't run in parallel because MeshFunction io is not
> > > > > > > > > > supported in parallel.
> > > > > > > > >
> > > > > > > > > Does anyone have an overview of what is needed for this to be fixed? I
> > > > > > > > > couldn't find a blueprint on it.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Here it is:
> > > > > > > >
> > > > > > > > https://blueprints.launchpad.net/dolfin/+spec/parallel-io
> > > > > > > >
> > > > > > > > > I am interested in getting this fixed :)
> > > > > > > > >
> > > > > > > >
> > > > > > > > Me too! We need to look at all the io since much of it is broken in
> > > > > > > > parallel.
> > > > > > > >
> > > > > > > > We need to settle on how to handle XML data. I favour (and I know Niclas
> > > > > > > > Jansson does too) the VTK approach in which we have a 'master file' that
> > > > > > > > points to other XML files which contain portions of the vector/mesh,
> > > > > > > > etc. Process zero can read the 'master file' and then instruct the other
> > > > > > > > processes on which file(s) they should read in.
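For concreteness, a rough Python sketch of that master-file idea (the file
layout, tag names and the mpi4py usage below are made up purely for
illustration; none of this is existing DOLFIN code):

  # Hypothetical 'master file' listing the per-process chunk files:
  #
  #   <distributed_data num_parts="4">
  #     <part process="0" file="mesh_p0.xml"/>
  #     <part process="1" file="mesh_p1.xml"/>
  #     ...
  #   </distributed_data>
  #
  # Process 0 parses it and tells every other process which file to read.
  import xml.etree.ElementTree as ET
  from mpi4py import MPI

  comm = MPI.COMM_WORLD

  if comm.Get_rank() == 0:
      parts = ET.parse("mesh_master.xml").getroot().findall("part")
      files = [p.get("file")
               for p in sorted(parts, key=lambda p: int(p.get("process")))]
  else:
      files = None

  local_file = comm.scatter(files, root=0)  # each process gets its own file name
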
> > > > > > >
> > > > > > > This only works if the data is already partitioned. Most of our demos
> > > > > > > assume that we have the mesh in one single file which is then
> > > > > > > partitioned on the fly.
> > > > > > >
> > > > > >
> > > > > > The approach does work for data which is not partitioned. Just like with
> > > > > > VTK, one can read the 'master file' or the individual files.
> > > > > >
> > > > > > > The initial plan was to support two different ways of reading data in parallel:
> > > > > > >
> > > > > > > 1. One file and automatic partitioning
> > > > > > >
> > > > > > > DOLFIN gets one file "mesh.xml", each process reads one part of it (just
> > > > > > > skipping other parts of the file), then the mesh is partitioned and
> > > > > > > redistributed.
> > > > > > >
> > > > > > > 2. Several files and no partitioning
> > > > > > >
> > > > > > > DOLFIN gets multiple files and each process reads one part. In this
> > > > > > > case, the mesh and all associated data is already partitioned. This
> > > > > > > should be very easy to fix since everything that is needed is already
> > > > > > > in place; we just need to fix the logic. In particular, the data
> > > > > > > section of each local mesh contains all auxiliary parallel data.
> > > > > > >
> > > > > > > This can be handled in two different ways. Either a user specifies the
> > > > > > > name of the file as "mesh*.xml", in which case DOLFIN appends say
> > > > > > >
> > > > > > > "_%d" % MPI::process_number()
> > > > > > >
> > > > > > > on each local process.
> > > > > > >
> > > > > > > The other way is to have a master file which lists all the other
> > > > > > > files. In this case, I don't see a need for process 0 to take any kind
> > > > > > > of responsibility for communicating file names. It would work fine for
> > > > > > > each process to read the master file and then check which file it
> > > > > > > should use. Each process could also check that the total number of
> > > > > > > processes matches the number of partitions in the file. We could let
> > > > > > > process 0 handle the parsing of the master file and then communicate
> > > > > > > the file names but maybe that is an extra complication.
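A minimal sketch of both variants, just to pin down the logic (the pattern
handling and the consistency check below are invented for illustration, not
an existing DOLFIN interface):

  # Variant 1: expand "mesh*.xml" to a per-process file name.
  # Variant 2 amounts to the same thing plus a check that the number of
  # partitions on disk matches the number of processes.
  import glob
  from mpi4py import MPI

  comm = MPI.COMM_WORLD

  pattern = "mesh*.xml"
  local_file = pattern.replace("*", "_%d" % comm.Get_rank())  # e.g. mesh_3.xml

  num_parts = len(glob.glob(pattern.replace("*", "_*")))
  if num_parts != comm.Get_size():
      raise RuntimeError("mesh is split into %d parts but we are running "
                         "on %d processes" % (num_parts, comm.Get_size()))
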
> > > > > > >
> > > > > >
> > > > > > This fails when the number of files differs from the number of
> > > > > > processes. It's very important to support m files on n processes. We've
> > > > > > discussed this at length before.
> > > > >
> > > > > I don't remember. Can you remind me of what the reasons are?
> > > > >
> > > >
> > > > I perform a simulation using m processes, and write the result to m
> > > > files. Later I want to use the result in another computation using
> > > > n processes.
> > > >
> > >
> > > I've looked a little into parallel io, and at what Trilinos and PETSc
> > > do. Both support HDF5, and HDF5 has been developed to work in parallel.
> > > HDF5 does not advocate the one-file-per-process approach (too awkward
> > > and complicated, they say), but advocates a single-file approach. It
> > > has tools that allow different processes to write to different parts
> > > of the same file in parallel.
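For reference, that single-file model looks roughly like this through h5py's
MPI driver (this needs an MPI-enabled HDF5/h5py build, and the file and
dataset names are just placeholders):

  # HDF5's "many processes, one file" model via h5py's MPI driver.
  from mpi4py import MPI
  import h5py

  comm = MPI.COMM_WORLD

  with h5py.File("results.h5", "w", driver="mpio", comm=comm) as f:
      # Dataset creation is collective; each rank then writes its own row.
      dset = f.create_dataset("u", (comm.Get_size(), 100), dtype="float64")
      dset[comm.Get_rank(), :] = float(comm.Get_rank())
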
> > >
> > > From reading this, what I propose (for now) is:
> > >
> > > 1. We only ever write one XML file for a given object. This file can be
> > > read by different processes, with each reading in only a chunk.
> > >
> > > 2. We should add an XML format for partitioning data (Trilinos calls
> > > this a 'map'). If a map file is present, it is used to define the
> > > partitions. It may make sense to have a map file for each process (but
> > > no need for a 'master file').
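For illustration, such a 'map' could be as simple as a cell-to-process
assignment that each rank filters on. The layout and names below are
invented, loosely following the Trilinos/Epetra 'map' idea:

  # Hypothetical partition 'map': for every cell, the rank that owns it.
  # Each process reads the one mesh file but keeps only its own cells.
  #
  #   <mesh_map num_parts="4">
  #     <cell index="0" process="2"/>
  #     <cell index="1" process="0"/>
  #     ...
  #   </mesh_map>
  import xml.etree.ElementTree as ET
  from mpi4py import MPI

  rank = MPI.COMM_WORLD.Get_rank()

  owner = {int(c.get("index")): int(c.get("process"))
           for c in ET.parse("mesh_map.xml").getroot().findall("cell")}

  local_cells = sorted(i for i, p in owner.items() if p == rank)
  # ... then read mesh.xml but instantiate only the cells in local_cells
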
> >
> > I suggest something slightly different. I'm ok with the one file
> > approach, but it would be good to store that data in a partitioned
> > way. Our current model for parallel computing is that each process has
> > a Mesh and each process has the partitioning data it needs stored in
> > the data section of the Mesh. So each process has just a regular mesh
> > with some auxiliary data attached to it. That makes it easy to read
> > and write using already existing code. (No need for a special parallel
> > format.)
> >
> > But we could easily throw all that data into one big file, something
> > like this:
> >
> > <distributed_mesh num_parts="16">
> > <mesh ...>
> > ...
> > </mesh>
> > <mesh ...>
> > ...
> > </mesh>
> > ...
> > </distributed_mesh>
> >
>
> I would like to separate mesh and partitioning data. A partitioning of a
> given mesh is not unique, so it should be kept separate. A partition
> could still go in the same XML file though.
It is separate from the mesh, but it is stored locally as part of the
Mesh in MeshData. I think this has proved to be a very good way
(efficient and simple) to store the data.
Or do you suggest that we store it differently on file than we do as
part of the DOLFIN data structures?
I'm not sure that's a good idea. I like the 1-1 correspondence between
our data structures and the file formats.
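To spell out that 1-1 correspondence: each process has a perfectly ordinary
mesh file, and the auxiliary parallel data is just another named array in its
data section, so the ordinary per-mesh reader is all that is needed. A rough
sketch (the file layout and array name are invented for illustration):

  #   mesh_p3.xml (hypothetical):
  #     <mesh>
  #       <vertices>...</vertices>
  #       <cells>...</cells>
  #       <data>
  #         <array name="global vertex indices">0 17 42 ...</array>
  #       </data>
  #     </mesh>
  import xml.etree.ElementTree as ET
  from mpi4py import MPI

  rank = MPI.COMM_WORLD.Get_rank()
  root = ET.parse("mesh_p%d.xml" % rank).getroot()

  # The parallel bookkeeping is just another named array in the mesh file.
  aux = {a.get("name"): [int(v) for v in a.text.split()]
         for a in root.findall("./data/array")}
  global_vertex_indices = aux["global vertex indices"]
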
But I wouldn't mind an additional file format (like HDF5) if that
proves to be good for high-performance/very many processes.
> > > 3. For now, use native PETSc/Epetra HDF5 io for linear algebra objects
> > > in serious parallel computations.
> >
> > ok.
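For the record, the native PETSc route is already quite short from petsc4py,
assuming a PETSc build with HDF5 support (this is plain PETSc, not a DOLFIN
interface):

  # Write a distributed PETSc vector to HDF5 using PETSc's own viewer.
  from petsc4py import PETSc

  x = PETSc.Vec().createMPI(1000)   # some distributed vector
  x.set(1.0)

  viewer = PETSc.Viewer().createHDF5("x.h5", mode="w")
  x.view(viewer)                    # collective write to x.h5
  viewer.destroy()
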
> >
> > > 4. In the future look into using parallel HDF5 to read/write meshes and
> > > other essential data.
> >
> > Perhaps, but I'm not sure how flexible HDF5 is for storing and naming
> > all the data we need.
> >
>
> A feature of HDF5 is that anything can be stored - we can just define
> tags, hierarchies, etc. It's XML-like in that respect.
ok. Is it a candidate for replacing the XML I/O? Or just a complement
to it?
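To make the 'XML-like' point above concrete, here is a small h5py sketch of
the kind of hierarchy one could define (all group/dataset names below are
made up):

  # Hierarchical layout in HDF5: groups and attributes play the role of
  # nested/tagged XML elements.
  import numpy as np
  import h5py

  with h5py.File("mesh.h5", "w") as f:
      mesh = f.create_group("mesh")
      mesh.attrs["celltype"] = "tetrahedron"
      mesh.create_dataset("coordinates", data=np.zeros((8, 3)))
      mesh.create_dataset("topology", data=np.zeros((6, 4), dtype="int64"))

      part = f.create_group("partition")
      part.create_dataset("cell_owner", data=np.zeros(6, dtype="int64"))
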
--
Anders