yade-users team mailing list archive
Message #26953
[Question #700315]: Open MPI - spawn processes
New question #700315 on Yade:
https://answers.launchpad.net/yade/+question/700315
Hi guys, I am new to Open MPI, so I will try to be as clear as possible here.
I installed Yade 2021.01a on my cluster as a Singularity image: singularity/yade/yade_debian_bookwarm_1.0.sif.
I can run simulations using as many cores as I want. My cluster has 160 nodes, each with 80 CPUs.
So far so good.
I am now trying to run on multiple nodes. For this, I am using this example [1].
When I run it, I get the following message:
+ singularity run /beegfs/common/singularity/yade/yade_debian_bookwarm_1.0.sif yade -j5 Parallel.py
/usr/lib/x86_64-linux-gnu/yade/py/yade/__init__.py:76: RuntimeWarning: to-Python converter for boost::shared_ptr<yade::PartialSatClayEngine> already registered; second conversion method ignored.
boot.initialize(plugins,config.confDir)
TCP python prompt on localhost:9000, auth cookie `ksaeuc'
Welcome to Yade 2021.01a
Using python version: 3.9.7 (default, Sep 24 2021, 09:43:00)
[GCC 10.3.0]
Warning: no X rendering available (see https://bbs.archlinux.org/viewtopic.php?id=13189)
XMLRPC info provider on http://localhost:21000
Running script Parallel.py
Traceback (most recent call last):
File "/usr/bin/yade", line 343, in runScript
execfile(script,globals())
File "/usr/lib/python3/dist-packages/past/builtins/misc.py", line 87, in execfile
exec_(code, myglobals, mylocals)
File "Parallel.py", line 28, in <module>
mp.initialize(numMPIThreads)
File "/usr/lib/x86_64-linux-gnu/yade/py/yade/mpy.py", line 288, in initialize
comm_slave = MPI.COMM_WORLD.Spawn(yadeArgv[0], args=yadeArgv[1:],maxprocs=numThreads-process_count)
File "mpi4py/MPI/Comm.pyx", line 1534, in mpi4py.MPI.Intracomm.Spawn
mpi4py.MPI.Exception: MPI_ERR_SPAWN: could not spawn processes
Master: will spawn 9 workers running: /usr/bin/yade ['-j5', 'Parallel.py']
[[ ^L clears screen, ^U kills line. F8 plot. ]]
In [1]: Do you really want to exit ([y]/n)?
I am not sure where this error is coming from. Any ideas?
This is how I am running it in my batch script:
#!/bin/bash -x
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=80
#SBATCH --partition=compute
#SBATCH --job-name=DEM_PFV_Parallel
#SBATCH --time=10:00:00
singularity run /beegfs/common/singularity/yade/yade_debian_bookwarm_1.0.sif yade -j5 Case2_rotating_drum_mpi.py
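To see what the batch script above actually hands to the job, one option is to print the allocation from inside the Python script. This is only a minimal sketch: the helper name slurm_allocation is hypothetical, and it assumes Slurm exports SLURM_JOB_NUM_NODES, SLURM_NTASKS and SLURM_CPUS_PER_TASK into the job environment (the last two are normally set only when --ntasks and --cpus-per-task are given, as they are here).

```python
import os


def slurm_allocation(env=os.environ):
    """Return the Slurm allocation seen by this process.

    Hypothetical helper, not part of Yade. Values are ints, or None
    when the corresponding variable is not set in the environment.
    """
    names = {
        "nodes": "SLURM_JOB_NUM_NODES",       # from --nodes
        "tasks": "SLURM_NTASKS",              # from --ntasks
        "cpus_per_task": "SLURM_CPUS_PER_TASK",  # from --cpus-per-task
    }
    allocation = {}
    for key, var in names.items():
        value = env.get(var)
        allocation[key] = int(value) if value is not None else None
    return allocation


# Outside a Slurm job every entry is simply None.
print(slurm_allocation())
```

With the #SBATCH lines above, the expectation would be 2 nodes, 2 tasks and 80 CPUs per task, but that is worth confirming on the cluster itself.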
PS: I am assuming that numMPIThreads = 10 in the Python script should equal nodes * -j (2 * 5 in this case).
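The arithmetic behind this PS can be written out as a quick check. The sketch below rests on two assumptions that are not confirmed in this message: that -j sets the number of OpenMP threads per Yade process, and that numMPIThreads is the total number of MPI processes (master plus workers). The second reading at least matches the log above, where numMPIThreads = 10 leads to "will spawn 9 workers". The function names are made up for illustration.

```python
def requested_cores(num_mpi_procs, omp_threads_per_proc):
    """Cores the Yade run would try to use (assumed: MPI processes x OpenMP threads)."""
    return num_mpi_procs * omp_threads_per_proc


def allocated_cores(ntasks, cpus_per_task):
    """Cores granted by Slurm for --ntasks and --cpus-per-task."""
    return ntasks * cpus_per_task


num_mpi_threads = 10   # numMPIThreads in the Python script
omp_threads = 5        # yade -j5

# The master is one of the MPI processes, so one fewer worker is spawned.
print(num_mpi_threads - 1)                            # 9
print(requested_cores(num_mpi_threads, omp_threads))  # 50
print(allocated_cores(2, 80))                         # 160
```

Under these assumptions the run asks for 10 processes of 5 threads each, while the batch script declares only 2 tasks, so the product nodes * -j is not necessarily the quantity that numMPIThreads has to match.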
[1] https://gitlab.com/yade-dev/trunk/-/blob/master/examples/DEM2020Benchmark/Case2_rotating_drum_mpi.py