dolfin team mailing list archive
-
dolfin team
-
Mailing list archive
-
Message #24842
Re: buildbot failure in FEniCS Buildbot on dolfin-oneiric-i386
On 25 October 2011 11:18, Johannes Ring <johannr@xxxxxxxxx> wrote:
> On Mon, Oct 24, 2011 at 10:14 PM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>> On 24 October 2011 12:57, Johannes Ring <johannr@xxxxxxxxx> wrote:
>>> This failure was expected and not Martin's fault.The problem is the
>>> stokes-iterative C++ demo, which is problematic when run in parallel
>>> (parallel testing has been turned off on this buildbot slave until
>>> now).
>>>
>>> I have done some manual testing with two and up to five processes and
>>> the demo fails only (but not always) when run with three or five
>>> processes. Sometimes I get a segmentation violation:
>>>
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>> probably memory access out of range
>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [1]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal[1]PETSC
>>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>>> find memory corruption errors
>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>> [1]PETSC ERROR: to get more information on the crash.
>>> [1]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [1]PETSC ERROR: Signal received!
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>>
>>> Other times I get this error:
>>>
>>> Warning -- row partitioning does not line up! Partitioning incomplete!
>>> [2]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c
>>> [2]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c
>>> [2]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c
>>> [2]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c
>>> [2]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c
>>> Process 2: Soln norm: 0
>>>
>>> Any ideas? Is it a bug in DOLFIN? The demo works fine in parallel when
>>> using Trilinos instead of PETSc.
>>>
>>
>> There is a very nasty bug in the Oneiric OpenMPI. I had a frustrating
>> week tracking this down.
>
> Is this a bug in OpenMPI 1.4.3 and is it reported somewhere?
No that I'm aware of. I tracked down an example of the bug in a SCOTCH
call to MPI_Allgather which randomly returned an obviously wrong
result.
> It would
> be good to fix this in Ubuntu (see below).
>
I'm not going to bother tracking it down in Ubuntu because I've
identified an MPI bug in Ubuntu in the past and it was confirmed, but
Ubuntu didn't bother releasing a fix. MPI is too specialised for them
to care.
Garth
>> Installing OpenMPI 1.4.4 manually does the trick.
>
> I would like to avoid that on the buildbot if possible, because it
> looses the value of having a Oneiric buildbot if most of the
> dependencies are built from source.
>
> Also, this does not solve the problem for the DOLFIN packages in the
> PPA as there is no chance I can build and maintain packages for
> OpenMPI 1.4.4 and all its dependencies.
>
> Johannes
>
Follow ups
References