
dolfin team mailing list archive

Re: buildbot failure in FEniCS Buildbot on dolfin-maverick-i386

 

Hi,

The buildbot seems to fail randomly when running the
parallel-assembly-solve system test:

Running tests: system
----------------------------------------------------------------------
Running system test: parallel-assembly-solve
----------------------------------------------------------------------
*** Failed
Process 0: Number of global vertices: 289
Process 0: Number of global cells: 512
Process 1: Partitioned mesh, edge cut is 30.
Process 2: Partitioned mesh, edge cut is 30.
Process 0: Partitioned mesh, edge cut is 30.
Process 2: Partitioned mesh, edge cut is 59.
Process 1: Partitioned mesh, edge cut is 59.
Process 0: Number of global vertices: 125
Process 0: Number of global cells: 384
Process 0: Partitioned mesh, edge cut is 59.
Process 0: Calling DOLFIN just-in-time (JIT) compiler, this may take some time.
Process 0: Calling DOLFIN just-in-time (JIT) compiler, this may take some time.
Degree: 1
Degree: 1
Degree: 1
Process 0: Calling FFC just-in-time (JIT) compiler, this may take some time.
Process 1: Solving linear variational problem.
Process 2: Solving linear variational problem.
Process 0: Solving linear variational problem.
[buildmaster:15343] *** An error occurred in MPI_Bcast
[buildmaster:15343] *** on communicator MPI COMMUNICATOR 5 DUP FROM 3
[buildmaster:15343] *** MPI_ERR_TRUNCATE: message truncated
[buildmaster:15343] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 15343 on
node buildmaster exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Forcing a rebuild usually helps. However, this problem does not affect
only the system test: I see this error (or a similar error about MPI
finalize) when running any test or demo in parallel. It works fine in
0.9.9, but it is broken in 0.9.10 and dolfin-dev. Is there an easy
fix?

This problem was not caught by the buildbot because the unit test
script only checks whether 'OK' appears in the output of the test; it
does not check the exit status.
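A test runner that requires both conditions would have flagged the failure above. The following is a minimal sketch, not the actual DOLFIN test script. The function name run_test and the exact pass criterion are assumptions made for illustration. It treats a test as passed only if the process exits with status 0 *and* 'OK' appears in its output.

```python
import subprocess
import sys


def run_test(cmd):
    """Hypothetical helper: run a test command and report whether it passed.

    A test passes only if the process exits with status 0 AND its combined
    stdout/stderr contains 'OK'. Checking the output alone (the behaviour
    described above) misses runs that print 'OK' and then abort, e.g. an
    MPI job killed by MPI_ERRORS_ARE_FATAL.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so MPI error messages are captured too
        text=True,
    )
    passed = result.returncode == 0 and "OK" in result.stdout
    return passed, result.stdout


if __name__ == "__main__":
    # Simulates a test that prints 'OK' but then exits with a non-zero status.
    ok, output = run_test([sys.executable, "-c", "import sys; print('OK'); sys.exit(1)"])
    print("passed" if ok else "FAILED", output.strip())
```

For a parallel test, cmd would be something like ["mpirun", "-np", "3", sys.executable, "test.py"], where test.py is a placeholder name; mpirun returns a non-zero status when a rank aborts.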

Johannes

On Mon, Mar 21, 2011 at 5:10 AM,  <buildbot@xxxxxxxxxx> wrote:
> The Buildbot has detected a new failure of dolfin-maverick-i386 on FEniCS Buildbot.
> Full details are available at:
>  http://fenicsproject.org:8080/builders/dolfin-maverick-i386/builds/146
>
> Buildbot URL: http://fenicsproject.org:8080/
>
> Buildslave for this Build: maverick-i386
>
> Build Reason:
> Build Source Stamp: HEAD
> Blamelist:
>
> BUILD FAILED: failed dolfin check
>
> sincerely,
>  -The Buildbot
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dolfin
> Post to     : dolfin@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dolfin
> More help   : https://help.launchpad.net/ListHelp
>


