dolfin team mailing list archive

Re: Release and buildbot status

 

On Mon, Apr 06, 2009 at 03:39:13PM +0200, Johannes Ring wrote:
> On Mon, April 6, 2009 15:14, Anders Logg wrote:
> > On Mon, Apr 06, 2009 at 01:41:59PM +0200, Johannes Ring wrote:
> >> On Mon, April 6, 2009 13:35, Anders Logg wrote:
> >> > The buildbot is still not happy on all platforms. The MPI fixes by
> >> > Johan seem to have helped some but there are still problems.
> >> >
> >> > For mac-osx, the tests seem to hang when running the submesh demo:
> >> >
> >> >   command timed out: 1200 seconds without output, killing pid 12857
> >> >   process killed by signal 9
> >> >   program finished with exit code -1
> >>
> >> I think the macbot has crashed. I will try to restart it.
> 
> A restart didn't help. It was hanging on the submesh demo again.
> The output contains a PETSc error. I guess it's related to the MPI fix:
> 
> Updating mesh coordinates using transfinite mean value interpolation
> (Hermite).
> Plotting mesh (DOLFIN mesh), press 'q' to continue...
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 10 BUS: Bus Error, possibly illegal
> memory access
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
> [0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to
> find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> [0]PETSC ERROR:       is given.
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Signal received!
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 2.3.3, Patch 15, Tue Sep 23 10:02:49
> CDT 2008 HG revision: 31306062cd1a6f6a2496fccb4878f485c9b91760
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Unknown Name on a darwin9.2 named Amira-iMac.local by
> fenicsslave Mon Apr  6 15:13:16 2009
> [0]PETSC ERROR: Libraries linked from
> /usr/local/src/petsc-2.3.3-p15/lib/darwin9.2.2-cxx-debug
> [0]PETSC ERROR: Configure run at Thu Mar  5 10:40:39 2009
> [0]PETSC ERROR: Configure options --with-clanguage=cxx --with-shared=1
> --with-x=0 --with-x11=0 --with-fortran=0
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> [Amira-iMac.local:09565] MPI_ABORT invoked on rank 0 in communicator
> MPI_COMM_WORLD with errorcode 59
> 
> ^C
> Program received signal SIGINT, Interrupt.
> 0x963b54ba in wait4 ()
> (gdb) where
> #0  0x963b54ba in wait4 ()
> #1  0x963b3007 in system$UNIX2003 ()
> #2  0x00eb3fd2 in plot_object<dolfin::Mesh> (t=@0xbfffeb7c,
> mode=@0xbfffea58) at dolfin/plot/plot.cpp:35
> #3  0x00eb2d30 in std::string::_M_rep () at basic_string.h:48
> #4  0x00eb2d30 in ~basic_string [inlined] () at dolfin/plot/plot.cpp:472
> #5  ~basic_string [inlined] () at basic_string.h:472
> #6  dolfin::plot (mesh=@0xbfffeb7c) at dolfin/plot/plot.cpp:48
> #7  0x00003323 in main () at demo/mesh/submesh/cpp/main.cpp:53
> (gdb)
> 
> 
> >> > For linux64-exp, most tests seem to fail.
> >>
> >> We thought that our fixes would make this slave green but now we got a
> >> strange error:
> >>
> >>  ./../../demo/fem/simple/python (Python)
> >>
> >> Traceback (most recent call last):
> >>   File "./demo.py", line 19, in <module>
> >>     V = FunctionSpace(mesh, "CG", 1)
> >>   File
> >> "/work/jhbuildbot/fenics/lib/python2.5/site-packages/dolfin/functionspace.py",
> >> line 184, in __init__
> >>     FunctionSpaceBase.__init__(self, mesh, element)
> >>   File
> >> "/work/jhbuildbot/fenics/lib/python2.5/site-packages/dolfin/functionspace.py",
> >> line 46, in __init__
> >>     ufc_element, ufc_dofmap = jit(self._element)
> >>   File
> >> "/work/jhbuildbot/fenics/lib/python2.5/site-packages/dolfin/jit.py",
> >> line 30, in jit
> >>     if not check_swig_version(cpp.__swigversion__,same=True):
> >>   File
> >> "/work/jhbuildbot/fenics/lib/python2.5/site-packages/instant/config.py",
> >> line 35, in check_swig_version
> >>     installed_version = map(int, get_swig_version().split('.'))
> >>   File
> >> "/work/jhbuildbot/fenics/lib/python2.5/site-packages/instant/config.py",
> >> line 14, in get_swig_version
> >>     r = re.search(pattern, output)
> >>   File "/usr/lib/python2.5/re.py", line 142, in search
> >>     return _compile(pattern, flags).search(string)
> >> TypeError: expected string or buffer
> >>
> >> Johannes
> >
> > This error appears when the second argument to re.search is wrong.
> >
> > So the output variable returned from Instant's get_status_output
> > is wrong.
> 
> Yes, the output variable was None. Strangely, it is related to today's MPI
> fixes. When I reverted the changes, the output variable contained the
> correct string.
> 
> > Why does Instant require its own get_status_output instead of using
> > commands.getstatusoutput?
> 
> Because commands.getstatusoutput is not available on Windows.

I see.
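
For reference, here is a minimal sketch of what a portable helper could look like, together with a guard against the None output that triggered the TypeError above. It is written for current Python and is only an illustration under my own assumptions: the bodies of get_status_output and parse_version below are hypothetical and are not Instant's actual code.

```python
import re
import subprocess
import sys


def get_status_output(cmd):
    """Run cmd (a list of arguments) and return (exit status, combined output).

    Hypothetical stand-in for a portable helper; not Instant's implementation.
    subprocess is available on Windows as well as on Unix-like systems.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # return text rather than bytes
    )
    output, _ = proc.communicate()
    return proc.returncode, output


def parse_version(output):
    """Return a dotted version number as a list of ints, or None if absent."""
    if not isinstance(output, str):
        # re.search(pattern, None) raises TypeError, so check the type first.
        return None
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        return None
    return [int(part) for part in match.group(1).split(".")]


if __name__ == "__main__":
    # Use the running interpreter as a stand-in for "swig -version".
    status, output = get_status_output(
        [sys.executable, "-c", "print('SWIG Version 1.3.36')"]
    )
    print(status, parse_version(output))
```

Returning None from the parser instead of raising lets the caller report a readable "could not determine version" message rather than a traceback from inside re.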

As far as I understand, the failures we see now are related to an
intricate problem with loading MPI in Python on Mac with Trilinos.

Is this combination (MPI + Mac + Trilinos) common enough that we need
to solve it right now? I only use one of the three myself (MPI). Would
it be an option to, for example, disable MPI support when building
with Trilinos on Mac?

There are quite a few items in the queue for both DOLFIN and FFC that
will break functionality and interfaces, so it would be good to
release now and then get started on them.

-- 
Anders

