dolfin team mailing list archive

Re: buildbot failure in FEniCS Buildbot on dolfin-oneiric-i386

On Mon, Oct 24, 2011 at 10:14 PM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
> On 24 October 2011 12:57, Johannes Ring <johannr@xxxxxxxxx> wrote:
>> This failure was expected and not Martin's fault. The problem is the
>> stokes-iterative C++ demo, which is problematic when run in parallel
>> (parallel testing has been turned off on this buildbot slave until
>> now).
>>
>> I have done some manual testing with two and up to five processes and
>> the demo fails only (but not always) when run with three or five
>> processes. Sometimes I get a segmentation violation:
>>
>> [1]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>> probably memory access out of range
>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [1]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>> find memory corruption errors
>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>> [1]PETSC ERROR: to get more information on the crash.
>> [1]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [1]PETSC ERROR: Signal received!
>> [1]PETSC ERROR:
>> ------------------------------------------------------------------------
>>
>> Other times I get this error:
>>
>> Warning -- row partitioning does not line up! Partitioning incomplete!
>> [2]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c
>> [2]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c
>> [2]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c
>> [2]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c
>> [2]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c
>> Process 2: Soln norm: 0
>>
>> Any ideas? Is it a bug in DOLFIN? The demo works fine in parallel when
>> using Trilinos instead of PETSc.
>>
>
> There is a very nasty bug in the Oneiric OpenMPI. I had a frustrating
> week tracking this down.

Is this a bug in OpenMPI 1.4.3 and is it reported somewhere? It would
be good to fix this in Ubuntu (see below).

> Installing OpenMPI 1.4.4 manually does the trick.

I would like to avoid that on the buildbot if possible, because it
loses the value of having an Oneiric buildbot if most of the
dependencies are built from source.

Also, this does not solve the problem for the DOLFIN packages in the
PPA, as there is no chance I can build and maintain packages for
OpenMPI 1.4.4 and all its dependencies.
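
As a possible stopgap until the OpenMPI issue is sorted out, the demo
could be pointed at the Trilinos backend, which works in parallel as
noted above. A minimal sketch, assuming a DOLFIN build with Trilinos
support and the usual global parameters API (the backend name "Epetra"
is my assumption here):

  #include <dolfin.h>

  using namespace dolfin;

  int main()
  {
    // Assumption: switch the global linear algebra backend to Epetra
    // (Trilinos) before any matrices or vectors are created, so the
    // solve avoids the PETSc/Hypre path that fails on Oneiric.
    parameters["linear_algebra_backend"] = "Epetra";

    // ... the rest of the stokes-iterative demo would follow here ...

    return 0;
  }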

Johannes

