
dolfin team mailing list archive

Re: Parallelization and PXMLMesh

 

On Fri, Dec 19, 2008 at 11:54 AM, Anders Logg <logg@xxxxxxxxx> wrote:
> On Fri, Dec 19, 2008 at 09:23:43AM +0100, Martin Sandve Alnæs wrote:
>> On Fri, Dec 19, 2008 at 9:12 AM, Anders Logg <logg@xxxxxxxxx> wrote:
>> > On Fri, Dec 19, 2008 at 09:06:43AM +0100, Martin Sandve Alnæs wrote:
>> >> On Fri, Dec 19, 2008 at 8:54 AM, Anders Logg <logg@xxxxxxxxx> wrote:
>> >> > On Fri, Dec 19, 2008 at 12:09:50PM +0900, Evan Lezar wrote:
>> >> >>
>> >> >>
>> >> >> On Thu, Dec 18, 2008 at 4:25 AM, Johan Hake <hake@xxxxxxxxx> wrote:
>> >> >>
>> >> >>     On Wednesday 17 December 2008 20:19:52 Anders Logg wrote:
>> >> >>     > On Wed, Dec 17, 2008 at 08:13:03PM +0100, Johan Hake wrote:
>> >> >>     > > On Wednesday 17 December 2008 19:20:11 Anders Logg wrote:
>> >> >>     > > > Ola and I have now finished up the first round of getting DOLFIN to
>> >> >>     > > > run in parallel. In short, we can now parse meshes from file in
>> >> >>     > > > parallel and partition meshes in parallel (using ParMETIS).
>> >> >>     > > >
>> >> >>     > > > We reused some good ideas that Niclas Jansson had implemented in
>> >> >>     > > > PXMLMesh before, but have also made some significant changes as
>> >> >>     > > > follows:
>> >> >>     > > >
>> >> >>     > > > 1. The XML reader does not handle any partitioning.
>> >> >>     > > >
>> >> >>     > > > 2. The XML reader just reads in a chunk of the mesh data on each
>> >> >>     > > > processor (in parallel) and stores that into a LocalMeshData object
>> >> >>     > > > (one for each processor). The data is just partitioned in blocks so
>> >> >>     > > > the vertices and cells may be completely unrelated.
>> >> >>     > > >
>> >> >>     > > > 3. The partitioning takes place in MeshPartitioning::partition,
>> >> >>     > > > which gets a LocalMeshData object on each processor. It then calls
>> >> >>     > > > ParMETIS to compute a partition (in parallel) and then redistributes
>> >> >>     > > > the data accordingly. Finally, a mesh is built on each processor using
>> >> >>     > > > the local data.
>> >> >>     > > >
>> >> >>     > > > 4. All direct MPI calls (except one which should be removed) have been
>> >> >>     > > > removed from the code. Instead, we mostly rely on
>> >> >>     > > > dolfin::MPI::distribute which handles most cases of parallel
>> >> >>     > > > communication and works with STL data structures.
>> >> >>     > > >
>> >> >>     > > > 5. There is just one ParMETIS call (no initial geometric
>> >> >>     > > > partitioning). It seemed like an unnecessary step, or are there good
>> >> >>     > > > reasons to perform the partitioning in two steps?
>> >> >>     > > >
>> >> >>     > > > For testing, go to sandbox/passembly, build and then run
>> >> >>     > > >
>> >> >>     > > >   mpirun -n 4 ./demo
>> >> >>     > > >   ./plot_partitions 4
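
A minimal sketch of how I read steps 1-3 above (only LocalMeshData and
MeshPartitioning::partition are named in the description; the reader
interface below is a guess and may differ from the actual code):

  #include <dolfin.h>
  using namespace dolfin;

  int main()
  {
    // 1-2. Each process reads its own block of vertices and cells into a
    // LocalMeshData object; no partitioning happens in the reader.
    LocalMeshData local_data;
    File file("mesh.xml");
    file >> local_data;

    // 3. ParMETIS computes the partition in parallel, the data is
    // redistributed, and a local mesh is built on each process.
    Mesh mesh;
    MeshPartitioning::partition(mesh, local_data);

    return 0;
  }
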
>> >> >>     > >
>> >> >>     > > Looks beautiful!
>> >> >>     > >
>> >> >>     > > I threw a 3D mesh of 160K vertices onto it, and it was partitioned nicely
>> >> >>     > > in about 10 s on my 2-core laptop.
>> >> >>     > >
>> >> >>     > > Johan
>> >> >>     >
>> >> >>     > Nice, in particular since we haven't run any 3D test cases ourselves,
>> >> >>     > just a tiny mesh of the unit square... :-)
>> >> >>
>> >> >>     Yes I thought so too ;)
>> >> >>
>> >> >>     Johan
>> >> >>     _______________________________________________
>> >> >>     DOLFIN-dev mailing list
>> >> >>     DOLFIN-dev@xxxxxxxxxx
>> >> >>     http://www.fenics.org/mailman/listinfo/dolfin-dev
>> >> >>
>> >> >>
>> >> >> While we are on the topic of parallelisation - I have some comments.
>> >> >>
>> >> >> A couple of months ago I was trying to parallelise some of my own code and ran
>> >> >> into some problems with the communicators used for PETSc and SLEPc problems - I
>> >> >> sort of got them to work, but never committed my changes because they were a
>> >> >> little ungainly and I felt I needed to spend some more time on them.
>> >> >>
>> >> >> One thing that I did notice is that the user does not have much control over
>> >> >> which parts of the process run in parallel - with the number of MPI processes
>> >> >> deciding whether or not a parallel implementation should be used.  In my case
>> >> >> what I wanted to do was perform a frequency sweep and for each frequency point
>> >> >> perform the same calculation (an eigenvalue problem) for that frequency.  My
>> >> >> intention was to distribute the frequency sweep over a number of processors and
>> >> >> then handle the assembly and solution of each system separately.  This is not
>> >> >> possible as the code is now, since the assembly and eigenvalue solvers all try
>> >> >> to run in parallel as soon as the application is run as an MPI program.
>> >> >>
>> >> >> I know that this is not very descriptive, but does anyone else have thoughts on
>> >> >> the matter?  I will put together something a little more concrete as soon as I
>> >> >> have a chance (I am travelling around quite a bit at the moment so it is
>> >> >> difficult for me to focus).
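
For what it's worth, the setup Evan describes maps naturally onto
sub-communicators: split the world communicator into one group per
frequency point and let each group assemble and solve its own problem.
A plain MPI sketch of the splitting, independent of DOLFIN:

  #include <mpi.h>

  int main(int argc, char* argv[])
  {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // One group (color) per frequency point; here simply one process per
    // group, so each process handles its own frequency serially.
    MPI_Comm freq_comm;
    MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &freq_comm);

    // ... assemble and solve the eigenvalue problem for this frequency
    // using freq_comm instead of MPI_COMM_WORLD ...

    MPI_Comm_free(&freq_comm);
    MPI_Finalize();
    return 0;
  }

This is exactly the place where a communicator argument (rather than the
implicit MPI_COMM_WORLD) would be needed in DOLFIN.
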
>> >> >
>> >> > We're just getting started. Everything needs to be configurable. At
>> >> > the moment, we assume that all parallel computation should be split over
>> >> > MPI::num_processes() processes.
>> >> >
>> >> > We can either add optional parameters to all parallel calls or add
>> >> > global options to control how many processes are used. But I suggest
>> >>
>> >> Global options wouldn't solve anything (the point here is a variable
>> >> number of processes during the application lifetime), and a single
>> >> parameter is probably enough: the communicator, perhaps permanently
>> >> attached to each FunctionSpace (the dofmap must be distributed over
>> >> the processes in a communicator to be usable). Calls to MPI::num_processes()
>> >> would be replaced by V.communicator().num_processes() or something.
>> >> I get the argument for waiting, though. When stuff works, a search for MPI::*
>> >> should reveal the places where "V.communicator()." should replace "MPI::",
>> >> likely covering most places that need to be updated.
>> >>
>> >> Martin
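
To make that concrete, the kind of call site I have in mind (purely
hypothetical: neither communicator() on FunctionSpace nor the Communicator
class exists yet):

  // Hypothetical: each FunctionSpace carries the communicator its dofmap
  // is distributed over, and code queries the space it works on...
  const Communicator& comm = V.communicator();
  const unsigned int num_proc = comm.num_processes();

  // ...instead of the current global call:
  // const unsigned int num_proc = MPI::num_processes();
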
>> >
>> > Sounds good.
>> >
>> > Speaking of the communicator, I'd like to add access to a default
>> > communicator in the MPI class which could be accessed by
>> > MPI::communicator() but couldn't figure out how to do this (I suspect
>> > it's trivial).
>> >
>> > Then we could avoid including mpi.h in MeshPartitioning.cpp (look for
>> > the first two FIXMEs in that file). If anyone knows how to do this,
>> > let me know.
>>
>> You mean something like
>>
>> class MPI {
>> public:
>>
>>   static dolfin::Communicator & communicator()
>>   { return *mainCommunicator; }
>>
>> private:
>>   static shared_ptr<Communicator> mainCommunicator;
>> };
>>
>> ?
>>
>> This can then be initialized the first time MPI::communicator()
>> is called, to be a communicator including all MPI processes.
>>
>> Epetra has a "serial communicator", which is probably needed
>> as well when MPI is not in use, instead of adding #ifdefs in all
>> the places where MPI is used.
>>
>> Martin
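
Filling in the initialization described above, the accessor could create
the communicator lazily on first use and fall back to a serial communicator
when DOLFIN is built without MPI (assuming the HAS_MPI define and a small
Communicator wrapper that does not exist yet):

  // In MPI.cpp (sketch only)
  shared_ptr<Communicator> MPI::mainCommunicator;

  dolfin::Communicator& MPI::communicator()
  {
    if (!mainCommunicator)
    {
      #ifdef HAS_MPI
      // All MPI processes, created the first time communicator() is called
      mainCommunicator.reset(new Communicator(MPI_COMM_WORLD));
      #else
      // Serial fallback, in the spirit of Epetra's Epetra_SerialComm
      mainCommunicator.reset(new Communicator());
      #endif
    }
    return *mainCommunicator;
  }
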
>
> Yes, something like that. But what would the Communicator class look
> like? If you know how to do this, feel free to just add it.
>
> --
> Anders

Basically, it looks like class MPI can be renamed to Communicator.
All its functions must be made non-static, and it should have a member
variable holding the communicator ID. All occurrences of MPI_COMM_WORLD
in MPI.cpp should be replaced with that member variable, and in all
other places in the code where MPI_COMM_WORLD is needed (e.g. in
distribute), a communicator object must be used instead.
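Something like this, roughly (just a sketch of the direction, not tested):

  #include <mpi.h>

  // The former static MPI functions become members, and the stored
  // communicator replaces the hard-coded MPI_COMM_WORLD.
  class Communicator
  {
  public:

    explicit Communicator(MPI_Comm mpi_comm = MPI_COMM_WORLD) : comm(mpi_comm) {}

    unsigned int num_processes() const
    {
      int size = 0;
      MPI_Comm_size(comm, &size);
      return size;
    }

    unsigned int process_number() const
    {
      int rank = 0;
      MPI_Comm_rank(comm, &rank);
      return rank;
    }

  private:

    // The communicator this object represents
    MPI_Comm comm;
  };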

The communicator design will also affect the linear algebra backends,
depending on how those libraries handle this... Whoever designs
this should probably take a look at how both PETSc and Epetra
handle communicators to get ideas.
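For reference, both libraries already take a communicator per object,
which is roughly the model a DOLFIN Communicator would have to plug into.
From memory (so check the actual APIs):

  #include <mpi.h>
  #include <petscmat.h>
  #include <Epetra_MpiComm.h>
  #include <Epetra_Map.h>

  void sketch(MPI_Comm comm)
  {
    // PETSc: every object is created on an explicit communicator
    Mat A;
    MatCreate(comm, &A);                  // A is distributed over 'comm'

    // Epetra: objects take an Epetra_Comm; Epetra_MpiComm wraps an MPI
    // communicator (Epetra_SerialComm is the serial drop-in)
    Epetra_MpiComm epetra_comm(comm);
    Epetra_Map map(100, 0, epetra_comm);  // 100 global indices over 'comm'
  }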

But I don't have time to really get involved with this now.
It's not like this is a simple thing to do on the side ;)

Martin

