On Wed, Aug 20, 2008 at 05:09:41PM +0200, Niclas Jansson wrote:
Anders Logg wrote:
On Wed, Aug 20, 2008 at 09:39:30AM +0200, Niclas Jansson wrote:
Anders Logg wrote:
On Mon, Aug 18, 2008 at 11:05:30AM +0200, Niclas Jansson wrote:
Anders Logg wrote:
I think it looks good.
As far as I understand, you build a global numbering of all mesh
entities (which may be different from the local numbering on each
processor), and then the (global parallel) local-to-global mapping
follows from tabulate_dofs just as usual.
So, the difference is that you build a global numbering of the mesh
entities, and we wanted to build a global numbering of the dofs. The
only advantage I can see with our approach is that it may use less
memory, since we don't need to store an extra numbering scheme for all
mesh entities, but this is not a big deal.
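
For illustration, a minimal sketch of the simplest case (made-up names, not DOLFIN's tabulate_dofs): with a global numbering of the vertices, the dof map for first-order Lagrange elements is just the global vertex numbers of each cell, built exactly the way the local map is built today.

    // Illustrative sketch, not DOLFIN code: for first-order Lagrange
    // elements, a parallel local-to-global dof map follows directly
    // from a global vertex numbering.
    #include <cstddef>
    #include <vector>

    // global_vertex: local vertex index -> global vertex number
    // cell_vertices: local vertex indices of one cell
    std::vector<unsigned int>
    tabulate_global_dofs(const std::vector<unsigned int>& global_vertex,
                         const std::vector<unsigned int>& cell_vertices)
    {
      std::vector<unsigned int> dofs(cell_vertices.size());
      for (std::size_t i = 0; i < cell_vertices.size(); ++i)
        dofs[i] = global_vertex[cell_vertices[i]];  // one dof per vertex
      return dofs;
    }
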
A few questions:
1. Is the above interpretation correct?
Yes.
Another disadvantage of the global numbering scheme is the cost of the
mesh connectivity calculations (mesh.init in MeshRenumber).
Why is this a problem? As far as I understand, there are always two
different numberings of mesh entities, one local (same as we have
now) and one global. The local can be computed as usual and then the
global can be reconstructed from the local + the overlap.
(overlap = how the local pieces fit together)
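
A rough sketch of that reconstruction, with illustrative names and an assumed overlap representation (not DOLFIN's): each process numbers the entities it owns inside its own global range, and overlap entities owned elsewhere get their number from the owning process.

    #include <map>
    #include <vector>

    void renumber_entities(unsigned int process_offset,
                           unsigned int num_local_entities,
                           // local index -> owning process, for overlap
                           // entities this process does not own
                           const std::map<unsigned int, unsigned int>& not_owned,
                           std::vector<unsigned int>& global_number)
    {
      global_number.assign(num_local_entities, 0);
      unsigned int next = process_offset;
      for (unsigned int e = 0; e < num_local_entities; ++e)
      {
        if (not_owned.count(e) == 0)
          global_number[e] = next++;  // owned: number inside our own range
        // else: the global number is received from the owning process
        //       (communication step omitted in this sketch)
      }
    }
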
Iterating over the local + overlap requires some mesh connectivity,
which is costly to generate.
What's your point? Are you arguing against a global numbering scheme
for the mesh entities? I thought this is what you have implemented.
I'm not sure if the global numbering scheme is the best approach. It
worked well for simple dof_maps / elements, with a low renumbering time.
But for a more general implementation, renumbering starts to take too
much time.
OK. So do you suggest we implement the other strategy instead, building a
global dof map from the local dof maps?
Yes, it's probably more efficient. The only problem with algorithm 5 (in
my opinion) is the communication pattern in stage 0 and stage 2.
Parallel efficiency in stage 0 would probably be low due to the
pipeline-style offset calculation, but that should be easy to fix with
MPI_(Ex)Scan.
The plan is to use MPI_Scan for this. The "offset += " is just my
notation for the same operation. I wasn't aware of MPI_Scan at the
time.
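
For reference, a minimal sketch of that offset calculation with MPI_Exscan; the per-process dof count here is a placeholder, not taken from the actual dof map.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char* argv[])
    {
      MPI_Init(&argc, &argv);

      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      // Placeholder: in practice this is the number of locally owned dofs
      unsigned int num_owned_dofs = 100;

      // Exclusive prefix sum over all processes gives the start of this
      // process's global dof range in one collective call
      unsigned int offset = 0;
      MPI_Exscan(&num_owned_dofs, &offset, 1, MPI_UNSIGNED, MPI_SUM,
                 MPI_COMM_WORLD);
      if (rank == 0)
        offset = 0;  // MPI_Exscan leaves the result undefined on rank 0

      std::printf("process %d: global dof numbering starts at %u\n",
                  rank, offset);

      MPI_Finalize();
      return 0;
    }
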
Stage 2 seems to involve a lot of communication, with small messages.
I think it would be more efficient if the stage were reorganized such
that all messages could be exchanged "at once", in a couple of larger
messages.
That would be nice. I'm very open to suggestions.
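
One way this might be done, sketched with assumed data structures (this is not algorithm 5 as written): pack everything destined for each process into one buffer and exchange it all with a single MPI_Alltoall / MPI_Alltoallv pair, so each process sends one message per stage instead of one per shared entity.

    #include <mpi.h>
    #include <vector>

    // requests[p] holds all dof indices to send to process p
    // (requests.size() is assumed to equal the number of processes)
    std::vector<unsigned int>
    exchange_all_at_once(const std::vector<std::vector<unsigned int> >& requests,
                         MPI_Comm comm)
    {
      int num_procs = 0;
      MPI_Comm_size(comm, &num_procs);

      // Pack per-destination data into one send buffer
      std::vector<int> send_counts(num_procs), send_offsets(num_procs, 0);
      std::vector<unsigned int> send_buffer;
      for (int p = 0; p < num_procs; ++p)
      {
        send_counts[p] = static_cast<int>(requests[p].size());
        send_offsets[p] = static_cast<int>(send_buffer.size());
        send_buffer.insert(send_buffer.end(),
                           requests[p].begin(), requests[p].end());
      }

      // Exchange message sizes, then the packed data, in two collectives
      std::vector<int> recv_counts(num_procs), recv_offsets(num_procs, 0);
      MPI_Alltoall(send_counts.data(), 1, MPI_INT,
                   recv_counts.data(), 1, MPI_INT, comm);
      for (int p = 1; p < num_procs; ++p)
        recv_offsets[p] = recv_offsets[p - 1] + recv_counts[p - 1];

      std::vector<unsigned int>
        recv_buffer(recv_offsets.back() + recv_counts.back());
      MPI_Alltoallv(send_buffer.data(), send_counts.data(),
                    send_offsets.data(), MPI_UNSIGNED,
                    recv_buffer.data(), recv_counts.data(),
                    recv_offsets.data(), MPI_UNSIGNED, comm);
      return recv_buffer;
    }
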
--
Anders