On Wed, Aug 20, 2008 at 06:17:30PM +0200, Niclas Jansson wrote:
Stage 2 seems to involve a lot of communication with small messages.
I think it would be more efficient if the stage were reorganized such
that all messages could be exchanged "at once", in a couple of larger
messages.
That would be nice. I'm very open to suggestions.
If I understand the {T, S, F} overlap correctly, a facet could be globally
identified by the value of F(facet).
No, F(facet) would be the local number of the facet in subdomain S(facet).
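In other words, F(facet) alone is not a global identifier; a shared facet is
only pinned down by the pair (S(facet), F(facet)), i.e. the neighbouring
subdomain together with the facet's local number on that subdomain. A minimal
sketch of that interpretation (hypothetical names, not DOLFIN's actual data
structures):

  #include <map>

  // Hypothetical illustration only: a shared facet is identified globally
  // by the pair (S(facet), F(facet)).
  struct SharedFacet
  {
    unsigned int owner;        // S(facet): the neighbouring subdomain (rank)
    unsigned int remote_index; // F(facet): local facet number on that subdomain
  };

  // Overlap for one subdomain: local facet number -> (S(facet), F(facet))
  typedef std::map<unsigned int, SharedFacet> FacetOverlap;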
If so, one suggestion is to buffer N_i and F(facet) in p buffers, one for
each processor (0...p-1), and exchange these during stage 2.
-- stage 1
for each facet f \in T
  j = S_i(f)
  if j > i
    -- calculate dof N_i
    buffer[j].add(N_i)
    buffer[j].add(F_i(f))
  end
end
-- stage 2
-- Exchange shared dofs with fancy MPI_Allgatherv or a lookalike
-- MPI_Sendrecv loop.
for j = 1 to (num processors - 1)
  src = (rank - j + num processors) % num processors
  dest = (rank + j) % num processors
  MPI_Sendrecv(dest, buffer[dest], src, recv_buffer)
  for i = 0 to size(recv_buffer) - 1, i += 2
    -- update facet recv_buffer(i+1) with dof value in recv_buffer(i)
  end
end
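For concreteness, here is a rough C++/MPI sketch of the stage 2 loop (not
DOLFIN code; the function name and the packing of (dof, facet) pairs as
unsigned ints are assumptions made for illustration). Since MPI_Sendrecv
needs the receive count up front, the sketch exchanges buffer sizes first in
each round:

  #include <mpi.h>
  #include <vector>

  // Rough sketch (not DOLFIN code): ring exchange of the per-destination
  // buffers from stage 1. buffer[q] holds (dof, facet) pairs packed as
  // unsigned ints, where the facet number is local to rank q.
  void exchange_shared_dofs(std::vector<std::vector<unsigned int> >& buffer)
  {
    int rank = 0, num_processes = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);

    for (int j = 1; j < num_processes; ++j)
    {
      const int src  = (rank - j + num_processes) % num_processes;
      const int dest = (rank + j) % num_processes;

      // MPI_Sendrecv needs the receive count, so exchange sizes first.
      int send_size = static_cast<int>(buffer[dest].size());
      int recv_size = 0;
      MPI_Sendrecv(&send_size, 1, MPI_INT, dest, 0,
                   &recv_size, 1, MPI_INT, src, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      // Exchange all shared (dof, facet) pairs for this neighbour in one
      // larger message.
      std::vector<unsigned int> recv_buffer(recv_size);
      MPI_Sendrecv(buffer[dest].data(), send_size, MPI_UNSIGNED, dest, 1,
                   recv_buffer.data(), recv_size, MPI_UNSIGNED, src, 1,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      // Unpack: even entries are dof numbers, odd entries are the local
      // facet numbers (on this rank) they belong to.
      for (int i = 0; i + 1 < recv_size; i += 2)
      {
        const unsigned int dof = recv_buffer[i];
        const unsigned int local_facet = recv_buffer[i + 1];
        // ... update the dof map for local_facet with dof here
        (void) dof; (void) local_facet;
      }
    }
  }

With this pattern, each pair of neighbouring processes exchanges one (larger)
message per round instead of one message per shared facet, which is the
reorganization suggested above.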
I haven't looked at this in detail (yet). Is it still valid with the
above interpretation of F(facet)?
--
Anders