yade-mpi team mailing list archive
Re: deadlock fixed (?)
@François, now I understand why there is no deadlock (point 1/), thanks.
That was difficult for me to realize; Deepak helped. :)
About checkCollider and global barriers: *we definitely want to avoid any additional global barrier*.
The reason is: there is already a kind of barrier(*) at each iteration, since master has to receive forces (before Newton) and send back wall positions (after Newton); let's call this sequence the "master sync".
Between two master syncs all workers should run at max speed without
waiting for another global event.
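To make it concrete, here is a minimal mpi4py sketch of what I mean by "master sync" (the names are placeholders, not the actual mpy.py functions):

    # Rough sketch of one "master sync" per iteration (placeholder names, not the real mpy.py code).
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    MASTER = 0

    def master_sync(local_forces, wall_positions=None):
        # master receives forces from all workers (before Newton)...
        all_forces = comm.gather(local_forces, root=MASTER)
        if comm.rank == MASTER:
            pass  # ...Newton integration of the master-owned bodies (the walls) would happen here...
        # ...then master sends the updated wall positions back to all workers (after Newton)
        return comm.bcast(wall_positions, root=MASTER)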
When we send positions at iteration N we know, in each subdomain (SD), if collision detection is needed at the beginning of iteration N+1. It can be communicated to master. There are then at least two options:
- master tells everyone at the next master sync. In that case global
collision detection is delayed by one iteration: it occurs at N+2. That
delay is technically perfectly fine since the SDs which really need
immediate collision detection will do it spontaneously at N+1 regardless of
global instructions. The downside of this approach is that if only one
subdomain is colliding at N+1, that SD will be slower and the others will
have to wait for it to finish before the next master sync. Then collision
detection happens again at N+2, which would probably double the total cost
of collision detection.
- send yes/no to master in the "positions" stage (for the moment nothing is
sent to master in that step) + complete the master sync with an additional
communication from master to workers (see the sketch after this list).
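For the second option, here is a rough sketch of what I have in mind (placeholder code, not the actual mpy.py implementation): each worker piggy-backs a boolean on the positions stage, and master answers with the global decision so that all subdomains run the collider at the same iteration.

    # Sketch of option 2: send a "collider needed?" flag to master during the positions stage,
    # and complete the master sync with one extra master->workers message (placeholder code).
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    MASTER = 0

    def exchange_collider_flag(local_needs_collide):
        flags = comm.gather(bool(local_needs_collide), root=MASTER)  # workers -> master
        decision = any(flags) if comm.rank == MASTER else None
        return comm.bcast(decision, root=MASTER)                     # master -> workers

The same exchange could also be done with a single allreduce on a logical OR, but that would introduce one more synchronization point outside the master sync, which is exactly what we want to avoid.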
Side question: what's the use of engine "waitForcesRunner"? I removed it
and it works just as well.
(*) It's only a partial barrier since some subdomains may not interact with
master, but we can change that to force all domains to send at least a
yes/no to master.
On Tue, 4 Jun 2019 at 16:41, François <francois.kneib@xxxxxxxxx> wrote:
> Concerning the non-blocking MPI_ISend: using MPI_Wait was not necessary
> with a basic global barrier. I'm afraid that looping on send requests and
> waiting for them to complete can slow down the communications, as you force
> the (send) order one more time (the receive order is already forced here
> <https://gitlab.com/yade-dev/trunk/blob/mpi/py/mpy.py#L641>).
> ... but not using a global barrier allows the first threads that finished
> their sends/recvs to start the next DEM iteration before the others, so +1
> for your fix; in the end I don't know what's better. Anyway, that's probably
> not significant compared to the interaction loop timings.
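Coming back to the send requests: one way to avoid forcing an order on the sends would be to wait on all the requests at once instead of one by one, so completion happens in whatever order the sends finish and no global barrier is involved. A simplified sketch (not the exact mpy.py code):

    # Post all non-blocking sends, then wait on the whole set at once.
    # Waitall completes the requests in whatever order they finish,
    # so the send order is not forced and no global barrier is needed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    def isend_all_and_wait(buffers_and_dests):
        """buffers_and_dests: list of (buffer, destination_rank) pairs."""
        requests = [comm.Isend(buf, dest=dest) for buf, dest in buffers_and_dests]
        MPI.Request.Waitall(requests)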
ENSE³ - Grenoble INP
38041 Grenoble cedex 9
Tel: +33 4 56 52 86 21
Email too brief?
Here's why: email charter