yade-mpi team mailing list archive

Re: deadlock fixed (?)

From: Bruno Chareyre <bruno.chareyre@xxxxxxxxxxxxxxx>
Date: Thu, 6 Jun 2019 11:54:34 +0200
Cc: yade-mpi@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CANFfKpFZfiWpW6EtdAShjE01dAK0Co=EPZjHbBOO3moTiFwidw@mail.gmail.com>

I found a new problem with "-ms" that I don't understand. :(
It doesn't occur every time; here is one way to reproduce it:
mpiexec --tag-output -n 3 ../../yade-mpi testMPI_2D_BUG_DK.py 50 50 -ms

The differences between the attached script and the one in trunk are:
loopOnSortedInteractions=True
MERGE_W_INTERACTIONS=True

With the trunk version the problem does not occur.

[1,1]<stderr>:Running script testMPI_2D_BUG_DK.py
[1,0]<stderr>:Running script testMPI_2D_BUG_DK.py
[1,2]<stderr>:Running script testMPI_2D_BUG_DK.py
[1,1]<stdout>:Worker1: triggers collider at iter 354
[1,2]<stdout>:Worker2: triggers collider at iter 354
[1,0]<stdout>:init Done in  MASTER 0
[1,2]<stdout>:Worker2: triggers collider at iter 501
[1,1]<stdout>:Worker1: triggers collider at iter 501
[1,2]<stderr>:Traceback (most recent call last):
[1,1]<stderr>:Traceback (most recent call last):
[1,2]<stderr>:  File "../../yade-mpi", line 244, in runScript
[1,2]<stderr>:    execfile(script,globals())
[1,2]<stderr>:  File "testMPI_2D_BUG_DK.py", line 114, in <module>
[1,2]<stderr>:    mp.mpirun(NSTEPS)
[1,2]<stderr>:  File "/home/yade/lib/x86_64-linux-gnu/yade-mpi/py/yade/mpy.py", line 676, in mpirun
[1,2]<stderr>:    mergeScene()
[1,1]<stderr>:  File "../../yade-mpi", line 244, in runScript
[1,2]<stderr>:  File "/home/yade/lib/x86_64-linux-gnu/yade-mpi/py/yade/mpy.py", line 423, in mergeScene
[1,2]<stderr>:    O.subD.mergeOp()
[1,2]<stderr>:RuntimeError: vector::_M_default_append
[1,1]<stderr>:    execfile(script,globals())
[1,1]<stderr>:  File "testMPI_2D_BUG_DK.py", line 114, in <module>
[1,1]<stderr>:    mp.mpirun(NSTEPS)
[1,1]<stderr>:  File "/home/yade/lib/x86_64-linux-gnu/yade-mpi/py/yade/mpy.py", line 676, in mpirun
[1,1]<stderr>:    mergeScene()
[1,1]<stderr>:  File "/home/yade/lib/x86_64-linux-gnu/yade-mpi/py/yade/mpy.py", line 423, in mergeScene
[1,1]<stderr>:    O.subD.mergeOp()
[1,1]<stderr>:RuntimeError: vector::_M_default_append
--------------------------------------------------------------------------
mpiexec has exited due to process rank 2 with PID 19994 on
node dt-medXXX exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

On Thu, 6 Jun 2019 at 11:17, Bruno Chareyre <bruno.chareyre@xxxxxxxxxxxxxxx>
wrote:

> Hi,
> @François, now I understand why there is no deadlock (point 1/), thanks.
> That was difficult for me to realize, Deepak helped. :)
>
> About checkCollider and global barriers: *we definitely want to avoid any
> barrier*.
> The reason is: there is already a kind of barrier(*) at each iteration
> since master has to receive forces (before Newton), and send back wall
> positions (after Newton) (let's call "master sync" this sequence
> forces+Newton+positions).
> Between two master syncs all workers should run at max speed without
> waiting for another global event.
> When we send positions at iteration N we know in each SD (subdomain) if
> collision detection is needed at the beginning of iteration N+1. It can be
> communicated to master. Then, at least two options:
> - master will tell everyone at the next master sync. In that case global
> collision detection would be delayed by one iteration, it will occur at
> N+2. That delay is technically perfectly fine since any SD which really
> needs immediate colliding will do it spontaneously at N+1 regardless of
> global instructions. The downside of this approach is that if only one
> subdomain is colliding at N+1, this SD will be slower and others will have
> to wait for it to finish for the next master sync. Then collision detection
> again at N+2, this would probably double the total cost of collision
> detection.
> - send yes/no to master in the "positions" stage (for the moment nothing
> is sent to master in that step) + complete master sync with an additional
> communication from master to workers.
>
> Side question: what's the use of engine "waitForcesRunner"? I removed it
> and it works just as well.
>
> (*) It's only a partial barrier since some subdomains may not interact
> with master, but we can change that to force all domains to send at least a
> yes/no to master.
>
> Bruno
>
>
>
>
> On Tue, 4 Jun 2019 at 16:41, François <francois.kneib@xxxxxxxxx> wrote:
>
>> Concerning the non blocking MPI_ISend, using MPI_Wait was not necessary
>>> with the use of a basic global barrier. I'm afraid that looping on send
>>> requests and wait for them to complete can slow down the communications, as
>>> you force (the send) order one more time (the receive order is already
>>> forced here <https://gitlab.com/yade-dev/trunk/blob/mpi/py/mpy.py#L641>
>>> ).
>>>
>> ... but not using a global barrier allows the first threads that finished
>> their sends/recvs to start the next DEM iteration before the others, +1 for
>> your fix so finally I don't know what's better. Anyway that's probably not
>> meaningful compared to the interaction loop timings.
>>
>
>
> --
> --
> _______________
> Bruno Chareyre
> Associate Professor
> ENSE³ - Grenoble INP
> Lab. 3SR
> BP 53
> 38041 Grenoble cedex 9
> Tél : +33 4 56 52 86 21
> ________________
>
> Email too brief?
> Here's why: email charter
> <https://marcuselliott.co.uk/wp-content/uploads/2017/04/emailCharter.jpg>
>
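
To make the two options in the quoted message concrete, here is a toy, pure-Python model (no MPI involved) of which workers run collision detection at which iteration. It is only a sketch under the assumptions stated in the quote: a subdomain that needs colliding at N+1 does it spontaneously, and master relays the request either at the next master sync (option 1) or within the same one (option 2). The function names are hypothetical and are not part of mpy.

```python
def schedule_delayed(n_workers, requesters, n):
    """Option 1 (assumed model): master relays the request at the next master sync.

    requesters: set of worker ranks which find, at iteration n, that they
    need collision detection at n+1.
    Returns {iteration: set of worker ranks running the collider}.
    """
    if not requesters:
        return {}
    everyone = set(range(1, n_workers + 1))
    return {
        n + 1: set(requesters),  # spontaneous, regardless of global instructions
        n + 2: everyone,         # global detection, delayed by one iteration
    }


def schedule_immediate(n_workers, requesters, n):
    """Option 2 (assumed model): yes/no is sent with the positions and master
    answers in the same master sync, so everybody collides at n+1."""
    if not requesters:
        return {}
    return {n + 1: set(range(1, n_workers + 1))}


def collider_runs(schedule):
    """Total number of collider executions over all workers."""
    return sum(len(workers) for workers in schedule.values())


def slow_iterations(schedule):
    """Iterations in which at least one worker collides; the others have to
    wait for it at the next master sync."""
    return sum(1 for workers in schedule.values() if workers)


if __name__ == "__main__":
    opt1 = schedule_delayed(2, {1}, 500)
    opt2 = schedule_immediate(2, {1}, 500)
    print("option 1: %d collider runs, %d slow iterations"
          % (collider_runs(opt1), slow_iterations(opt1)))
    print("option 2: %d collider runs, %d slow iterations"
          % (collider_runs(opt2), slow_iterations(opt2)))
```

With two workers and only Worker1 asking for collision detection, option 1 gives two slow iterations (N+1 and N+2) against one for option 2, which is the "double the total cost" effect mentioned in the quote.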


# In order for mpy module to work, don't forget to make a symlink to yade executable named "yadeimport.py":
# ln -s path/to/yade/yade-version path/to/yade/yadeimport.py
#
# Possible executions of this script
### Parallel:
# mpiexec -n 4 yade-mpi -n -x testMPIxNxM.py
# mpiexec -n 4 yade-mpi  -n -x testMPIxN.py N M # (n-1) subdomains with NxM spheres each
### Monolithic:
# yade-mpi -n -x testMPIxN.py 
# yade-mpi -n -x testMPIxN.py N M
# yade-mpi -n -x testMPIxN.py N M n
# in last line the optional argument 'n' has the same meaning as with mpiexec, i.e. total number of bodies will be (n-1)*N*M but on single core
### Openmp:
# yade-mpi -j4 -n -x testMPIxN.py N M n
### Nested MPI * OpenMP
# needs testing...
'''
This script simulates spheres falling on a plate using a distributed memory approach based on the mpy module.
The number of spheres assigned to one particular process (aka 'worker') is N*M; they form a regular pattern.
The master process (rank=0) has no spheres assigned; it is in charge of getting the total force on the plate.
The number of subdomains depends on argument 'n' of mpiexec. Since rank=0 is not assigned a regular subdomain, the total number of spheres is (n-1)*N*M.

'''

NSTEPS=1000 #turn it >0 to see time iterations, else only initialization TODO!HACK
#NSTEPS=50 #turn it >0 to see time iterations, else only initialization
N=50; M=50; #(columns, rows) per thread

if("-ms" in sys.argv):
	sys.argv.remove("-ms")
	mergeSplit=True
else: mergeSplit=False

if("-bc" in sys.argv):
	sys.argv.remove("-bc")
	bodyCopy=True
else: bodyCopy=False

#################
# Check MPI world
# This is to know if it was run with or without mpiexec (see preamble of this script)
import os
rank = os.getenv('OMPI_COMM_WORLD_RANK')
if rank is not None: #mpiexec was used
	rank=int(rank)
	numThreads=int(os.getenv('OMPI_COMM_WORLD_SIZE'))
else: #non-mpi execution, numThreads will still be used as multiplier for the problem size (2 => multiplier is 1)
	numThreads=2 if len(sys.argv)<4 else (int(sys.argv[3]))
	print "numThreads",numThreads
	
if len(sys.argv)>2: #we then assume N,M are provided as 1st and 2nd cmd line arguments (both are needed)
	N=int(sys.argv[1]); M=int(sys.argv[2])

############  Build a scene (we use Yade's pre-filled scene)  ############

# sequential grain colors
import colorsys
colorScale = (Vector3(colorsys.hsv_to_rgb(value*1.0/numThreads, 1, 1)) for value in range(0, numThreads))

#add spheres
for sd in range(0,numThreads-1):
	col = next(colorScale)
	ids=[]
	for i in range(N):#(numThreads-1) x N x M spheres, one thread is for master and will keep only the wall, others handle spheres
		for j in range(M):
			id = O.bodies.append(sphere((sd*N+i+j/30.,j,0),0.500,color=col)) #a small shift in x-positions of the rows to break symmetry
			ids.append(id)
	if rank is not None:# assigning subdomain!=0 in single thread would freeze the particles (Newton skips them)
		for id in ids: O.bodies[id].subdomain = sd+1

WALL_ID=O.bodies.append(box(center=(numThreads*N*0.5,-0.5,0),extents=(2*numThreads*N,0,2),fixed=True))
interactionLoop.loopOnSortedInteractions=True

collider.verletDist = 0.5
newton.gravity=(0,-10,0) #else nothing would move
tsIdx=O.engines.index(timeStepper) #remove the automatic timestepper. Very important: we don't want subdomains to use many different timesteps...
O.engines=O.engines[0:tsIdx]+O.engines[tsIdx+1:]
O.dt=0.001 #this very small timestep will make it possible to run 2000 iter without merging
#O.dt=0.1*PWaveTimeStep() #very important, we don't want subdomains to use many different timesteps...


#########  RUN  ##########
def collectTiming():
	created = os.path.isfile("collect.dat")
	f=open('collect.dat','a')
	if not created: f.write("numThreads mpi omp Nspheres N M runtime \n")
	from yade import timing
	f.write(str(numThreads)+" "+str(os.getenv('OMPI_COMM_WORLD_SIZE'))+" "+str(os.getenv('OMP_NUM_THREADS'))+" "+str(N*M*(numThreads-1))+" "+str(N)+" "+str(M)+" "+str(timing.runtime())+"\n") #str() around getenv: OMP_NUM_THREADS may be unset (None)
	f.close()


if rank is None: #######  Single-core  ######
	O.timingEnabled=True
	O.run(NSTEPS,True)
	#print "num bodies:",len(O.bodies)
	from yade import timing
	timing.stats()
	collectTiming()
	print "num. bodies:",len([b for b in O.bodies]),len(O.bodies)
	print "Total force on floor=",O.forces.f(WALL_ID)[1]
else: #######  MPI  ######
	#import yade's mpi module
	from yade import mpy as mp
	# customize
	mp.ACCUMULATE_FORCES=True #trigger force summation on master's body (here WALL_ID)
	mp.VERBOSE_OUTPUT=False
	mp.ERASE_REMOTE=True #erase bodies not interacting with a given subdomain?
	mp.OPTIMIZE_COM=True #L1-optimization: pass a list of double instead of a list of states
	mp.USE_CPP_MPI=True and mp.OPTIMIZE_COM #L2-optimization: workaround python by passing a vector<double> at the c++ level
	mp.MERGE_W_INTERACTIONS=True
	mp.MERGE_SPLIT=mergeSplit
	mp.COPY_MIRROR_BODIES_WHEN_COLLIDE = bodyCopy and not mergeSplit

	mp.mpirun(NSTEPS)
	print "num. bodies:",len([b for b in O.bodies]),len(O.bodies)
	if rank==0:
		mp.mprint( "Total force on floor="+str(O.forces.f(WALL_ID)[1]))
		collectTiming()
	else: mp.mprint( "Partial force on floor="+str(O.forces.f(WALL_ID)[1]))
	mp.mergeScene()
	if rank==0: O.save('mergedScene.yade')
	mp.MPI.Finalize()
exit()
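
The flag handling ("-ms", "-bc") and the N, M parsing near the top of the script above could be gathered in one helper. The following is a minimal sketch, not what the script or mpy does: `parse_args` is a hypothetical name, and unlike the script it leaves `sys.argv` untouched instead of removing the flags from it.

```python
def parse_args(argv, default_n=50, default_m=50):
    """Split argv into the two boolean flags and the positional N, M[, n].

    Returns (merge_split, body_copy, N, M, n), where n is None when the
    optional third positional value is absent.
    """
    flags = {"-ms": False, "-bc": False}
    positional = []
    for arg in argv[1:]:  # argv[0] is the script name
        if arg in flags:
            flags[arg] = True
        else:
            positional.append(arg)

    # N and M are only overridden when both are given
    if len(positional) >= 2:
        n_cols, n_rows = int(positional[0]), int(positional[1])
    else:
        n_cols, n_rows = default_n, default_m

    # optional problem-size multiplier used for non-MPI runs
    n_sub = int(positional[2]) if len(positional) >= 3 else None
    return flags["-ms"], flags["-bc"], n_cols, n_rows, n_sub


if __name__ == "__main__":
    # same arguments as in the mpiexec command line of this message
    print(parse_args(["testMPI_2D_BUG_DK.py", "50", "50", "-ms"]))
```

Since the flags are filtered out before the positional values are counted, their position on the command line does not matter.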
