Re: [Branch ~dolfin-core/dolfin/main] Rev 5631: Run parallel tests with 3 processes.

The la unit test times out on the buildbot when running in parallel
with 3 processes. This is the error I get when I run this test
manually:

fenicsslave@386:python$ mpirun -np 3 python test.py

Testing basic PyDOLFIN linear algebra operations
------------------------------------------------

Testing basic PyDOLFIN linear algebra operations
------------------------------------------------

Running: MTL4Tester

Testing basic PyDOLFIN linear algebra operations
------------------------------------------------

Running: MTL4Tester

Running: MTL4Tester
.........
Running: PETScTester
...
Running: PETScTester

Running: PETScTester
...[1]PETSC ERROR: [0]PETSC ERROR: --------------------- Error Message
------------------------------------
[0]PETSC ERROR: --------------------- Error Message
------------------------------------
Nonconforming object sizes!
[1]PETSC ERROR: [0]PETSC ERROR: Nonconforming object sizes!
Mat mat,Vec y: local dim 5 6!
[1]PETSC ERROR: [0]PETSC ERROR: Mat mat,Vec y: local dim 6 5!
------------------------------------------------------------------------
[1]PETSC ERROR: [0]PETSC ERROR:
------------------------------------------------------------------------
Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009
[1]PETSC ERROR: [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 10,
Tue Nov 24 16:38:09 CST 2009
See docs/changes/index.html for recent updates.
[1]PETSC ERROR: [0]PETSC ERROR: See docs/changes/index.html for recent updates.
See docs/faq.html for hints about trouble shooting.
[1]PETSC ERROR: [0]PETSC ERROR: See docs/faq.html for hints about
trouble shooting.
See docs/index.html for manual pages.
[1]PETSC ERROR: [0]PETSC ERROR: See docs/index.html for manual pages.
------------------------------------------------------------------------
[1]PETSC ERROR: [0]PETSC ERROR:
------------------------------------------------------------------------
Unknown Name on a linux-gnu named 386 by fenicsslave Thu Feb  3 10:03:53 2011
[1]PETSC ERROR: [0]PETSC ERROR: Unknown Name on a linux-gnu named 386
by fenicsslave Thu Feb  3 10:03:53 2011
Libraries linked from /build/buildd/petsc-3.0.0.dfsg/linux-gnu-c-opt/lib
[1]PETSC ERROR: [0]PETSC ERROR: Libraries linked from
/build/buildd/petsc-3.0.0.dfsg/linux-gnu-c-opt/lib
Configure run at Thu Dec 31 09:53:16 2009
[1]PETSC ERROR: [0]PETSC ERROR: Configure run at Thu Dec 31 09:53:16 2009
Configure options --with-shared --with-debugging=0 --useThreads 0
--with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/openmpi
--with-mpi-shared=1 --with-blas-lib=-lblas-3gf
--with-lapack-lib=-llapackgf-3 --with-umfpack=1
--with-umfpack-include=/usr/include/suitesparse
--with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]"
--with-superlu=1 --with-superlu-include=/usr/include/superlu
--with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1
--with-spooles-include=/usr/include/spooles
--with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1
--with-hypre-dir=/usr --with-scotch=1
--with-scotch-include=/usr/include/scotch
--with-scotch-lib=/usr/lib/libscotch.so
[1]PETSC ERROR: [0]PETSC ERROR: Configure options --with-shared
--with-debugging=0 --useThreads 0 --with-fortran-interfaces=1
--with-mpi-dir=/usr/lib/openmpi --with-mpi-shared=1
--with-blas-lib=-lblas-3gf --with-lapack-lib=-llapackgf-3
--with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse
--with-umfpack-lib="[/usr/lib/libumfpack.so,/usr/lib/libamd.so]"
--with-superlu=1 --with-superlu-include=/usr/include/superlu
--with-superlu-lib=/usr/lib/libsuperlu.so --with-spooles=1
--with-spooles-include=/usr/include/spooles
--with-spooles-lib=/usr/lib/libspooles.so --with-hypre=1
--with-hypre-dir=/usr --with-scotch=1
--with-scotch-include=/usr/include/scotch
--with-scotch-lib=/usr/lib/libscotch.so
------------------------------------------------------------------------
[1]PETSC ERROR: [0]PETSC ERROR:
------------------------------------------------------------------------
MatMult() line 1774 in src/mat/interface/matrix.c
[1]PETSC ERROR: MatMult() line 1774 in src/mat/interface/matrix.c

It seems to be completely stuck at this point. Running with 4
processes results in the same error and behavior, while running with
2 processes gives this output:

fenicsslave@386:python$ mpirun -np 2 python test.py

Testing basic PyDOLFIN linear algebra operations
------------------------------------------------

Running: MTL4Tester

Testing basic PyDOLFIN linear algebra operations
------------------------------------------------

Running: MTL4Tester
.......
Running: PETScTester
.
Running: PETScTester
........
Running: uBLASDenseTester

Running: uBLASDenseTester
........
Running: uBLASSparseTester

Running: uBLASSparseTester
........

--------------------------------------------------------------------------------------------------------------------------------------------
Ran 16 tests in 0.310s
Ran 16 tests in 0.308s



OKOK

--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 15559 on
node 386 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
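
For reference, here is a minimal petsc4py sketch of the kind of
mismatch I suspect (my guess at the mechanism, not the actual DOLFIN
test): the matrix rows and the vector entries end up partitioned
differently across the processes, which only shows up when the global
size does not split evenly. The size N = 16 and the hand-rolled
distribution below are purely illustrative.

    import sys
    import petsc4py
    petsc4py.init(sys.argv)
    from petsc4py import PETSc

    comm = PETSc.COMM_WORLD
    rank, nprocs = comm.getRank(), comm.getSize()

    N = 16  # global size: splits evenly over 2 processes, but not over 3

    # Hand-rolled row/column distribution (a stand-in for a mesh-based
    # layout): put the remainder rows on the *last* process, i.e. 5/5/6
    # on 3 processes.
    nlocal = N // nprocs + (N % nprocs if rank == nprocs - 1 else 0)
    A = PETSc.Mat().createAIJ(((nlocal, N), (nlocal, N)), nnz=1, comm=comm)
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):
        A.setValue(i, i, 1.0)  # identity, just to have something to multiply
    A.assemble()

    # Vector with PETSc's default layout (PETSC_DECIDE): the remainder
    # goes to the *first* process, i.e. 6/5/5 on 3 processes.
    x = PETSc.Vec().createMPI(N, comm=comm)
    x.set(1.0)
    y = x.duplicate()

    # Fine with "mpirun -np 2", but with "-np 3" this aborts with
    # "Nonconforming object sizes ... Mat mat,Vec y: local dim 5 6".
    A.mult(x, y)

With 2 processes (or any count that divides the global size) the two
layouts happen to coincide, so a bug like this stays hidden, which is
exactly what the commit message below warns about.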

Johannes

On Wed, Feb 2, 2011 at 3:32 PM,  <noreply@xxxxxxxxxxxxx> wrote:
> ------------------------------------------------------------
> revno: 5631
> committer: Garth N. Wells <gnw20@xxxxxxxxx>
> branch nick: dolfin-dev
> timestamp: Wed 2011-02-02 14:28:53 +0000
> message:
>  Run parallel tests with 3 processes.
>
>  This may well break the buildbot, since with simple meshes that have an even number of elements (on 2 processes) issues with the proper parallel layout are hidden.
> modified:
>  test/regression/test.py
>  test/unit/test.py
>
>
> --
> lp:dolfin
> https://code.launchpad.net/~dolfin-core/dolfin/main
>
> Your team DOLFIN Core Team is subscribed to branch lp:dolfin.
> To unsubscribe from this branch go to https://code.launchpad.net/~dolfin-core/dolfin/main/+edit-subscription
>
> === modified file 'test/regression/test.py'
> --- test/regression/test.py     2011-01-27 15:08:55 +0000
> +++ test/regression/test.py     2011-02-02 14:28:53 +0000
> @@ -79,7 +79,7 @@
>  # Build prefix list
>  prefixes = [""]
>  if "RUN_UNIT_TESTS_IN_PARALLEL" in os.environ and has_mpi() and has_parmetis():
> -    prefixes.append("mpirun -n 2 ")
> +    prefixes.append("mpirun -n 3 ")
>  else:
>     print "Not running regression tests in parallel."
>
>
> === modified file 'test/unit/test.py'
> --- test/unit/test.py   2011-01-17 18:19:40 +0000
> +++ test/unit/test.py   2011-02-02 14:28:53 +0000
> @@ -28,7 +28,7 @@
>  prefixes = [""]
>  if "RUN_TESTS_IN_PARALLEL" in os.environ:
>     if "RUN_TESTS_IN_PARALLEL" in os.environ and has_mpi() and has_parmetis():
> -        prefixes.append("mpirun -np 2 ")
> +        prefixes.append("mpirun -np 3 ")
>     else:
>         print "DOLFIN has not been compiled with MPI and/or ParMETIS. Unit tests will not be run in parallel."
>  else:
>
>
>
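
For anyone not familiar with these scripts: the prefixes list changed
above is what puts the mpirun command in front of each test, so every
test is run once serially and once in parallel. Roughly, the pattern
is the following (a sketch of the idea, not the exact DOLFIN test.py;
"python test.py" stands in for whichever test module is being run):

    import os

    prefixes = [""]
    if "RUN_TESTS_IN_PARALLEL" in os.environ:
        prefixes.append("mpirun -np 3 ")

    failures = 0
    for prefix in prefixes:
        # run the test once per prefix: serially, then under MPI
        if os.system(prefix + "python test.py") != 0:
            failures += 1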


