Re: buildbot failure in FEniCS Buildbot on dolfin-oneiric-i386
On Tue, Oct 25, 2011 at 12:28 PM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
> On 25 October 2011 11:18, Johannes Ring <johannr@xxxxxxxxx> wrote:
>> On Mon, Oct 24, 2011 at 10:14 PM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>>> On 24 October 2011 12:57, Johannes Ring <johannr@xxxxxxxxx> wrote:
>>>> This failure was expected and is not Martin's fault. The problem is
>>>> the stokes-iterative C++ demo, which is problematic when run in
>>>> parallel (parallel testing had been turned off on this buildbot
>>>> slave until now).
>>>>
>>>> I have done some manual testing with two to five processes, and the
>>>> demo fails only (but not always) when run with three or five
>>>> processes. Sometimes I get a segmentation violation:
>>>>
>>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>> [1]PETSC ERROR: to get more information on the crash.
>>>> [1]PETSC ERROR: --------------------- Error Message ------------------------------------
>>>> [1]PETSC ERROR: Signal received!
>>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>>>
>>>> Other times I get this error:
>>>>
>>>> Warning -- row partitioning does not line up! Partitioning incomplete!
>>>> [2]PETSC ERROR: MatHYPRE_IJMatrixCreate() line 76 in src/dm/da/utils/mhyp.c
>>>> [2]PETSC ERROR: PCSetUp_HYPRE() line 112 in src/ksp/pc/impls/hypre/hypre.c
>>>> [2]PETSC ERROR: PCSetUp() line 795 in src/ksp/pc/interface/precon.c
>>>> [2]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c
>>>> [2]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c
>>>> Process 2: Soln norm: 0
>>>>
>>>> Any ideas? Is it a bug in DOLFIN? The demo works fine in parallel when
>>>> using Trilinos instead of PETSc.
>>>>
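The Hypre warning that the row partitioning "does not line up" points at how the matrix rows are distributed across the processes. Below is a minimal diagnostic sketch in C (hypothetical, not taken from the demo) that prints each rank's row range via PETSc's MatGetOwnershipRange; with a consistent partitioning the ranges are contiguous across ranks and cover every row:

    /* Hypothetical helper, not part of the stokes-iterative demo:
     * print the row range each rank owns for a PETSc matrix. Gaps or
     * overlaps between ranks would explain the Hypre warning above. */
    #include <petscmat.h>
    #include <stdio.h>

    void print_row_partitioning(Mat A)
    {
        PetscInt rstart, rend;
        PetscMPIInt rank;

        MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
        MatGetOwnershipRange(A, &rstart, &rend); /* this rank owns rows [rstart, rend) */
        printf("[%d] owns rows %d to %d\n", rank, (int)rstart, (int)rend);
    }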
>>>
>>> There is a very nasty bug in the Oneiric OpenMPI. I had a frustrating
>>> week tracking this down.
>>
>> Is this a bug in OpenMPI 1.4.3, and has it been reported anywhere?
>
> Not that I'm aware of. I tracked down an instance of the bug in a
> SCOTCH call to MPI_Allgather, which randomly returned an obviously
> wrong result.
Ok.
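A wrong MPI_Allgather result is easy to test for in isolation. Here is a minimal sanity check (a sketch of the kind of reproducer this suggests, not the actual SCOTCH code): each rank contributes its rank number, and every rank then verifies that the gathered array reads 0..size-1. Running it repeatedly with three or five processes should expose an intermittent wrong result:

    /* Minimal MPI_Allgather sanity check (an assumed reproducer, not
     * the SCOTCH call itself). Each rank sends its rank number; every
     * rank should end up holding 0, 1, ..., size-1 in order. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, i;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int *recv = malloc(size * sizeof(int));
        MPI_Allgather(&rank, 1, MPI_INT, recv, 1, MPI_INT, MPI_COMM_WORLD);

        for (i = 0; i < size; i++)
            if (recv[i] != i)
                fprintf(stderr, "rank %d: slot %d holds %d, expected %d\n",
                        rank, i, recv[i], i);

        free(recv);
        MPI_Finalize();
        return 0;
    }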
>> It would
>> be good to fix this in Ubuntu (see below).
>>
>
> I'm not going to bother chasing this up with Ubuntu. I've identified
> an MPI bug in Ubuntu in the past, and although it was confirmed,
> Ubuntu never released a fix. MPI is too specialised for them to care.
Ok, I see. I will turn off parallel testing on this buildbot then.
Johannes