
launchpad-dev team mailing list archive

ec2 hangs

 

Hi people,

Today both mwhudson and I had test runs on ec2 where the windmill tests
appeared to fail and then the run hung.  The log files didn't contain complete
stack traces.  When we shelled into the instances their load was zero, and
according to mwhudson's Python stack queryer the processes were just waiting
for connections.
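
When the log files lack complete stack traces, one alternative to attaching an external tool is to have the process dump its own thread stacks. The following is a minimal sketch written for a current Python, not the python2.5 used on these instances, and it is not mwhudson's tool. The helper name format_thread_stacks is hypothetical. It relies on sys._current_frames(), which has existed since Python 2.5, and on Thread.ident, which has not.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the Python-level stack of every live thread.

    Hypothetical helper for illustration; it only sees Python frames,
    not the C frames that a gdb-based backtrace also shows.
    """
    # Map thread identifiers to names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        chunks.append("Thread %s (%s)" % (thread_id, name))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


if __name__ == "__main__":
    print(format_thread_stacks())
```

A test runner could call such a helper from a signal handler or a watchdog timer, so that a hung run leaves the stacks in its own log.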

Apart from the obvious problems of run-away costs on EC2, we have the WTF 
moment of having no idea what is happening.

Tim

This process was running some windmill tests:

/usr/bin/python2.5 /var/launchpad/test/bin/test --resume-layer lp.code.windmill.testing.CodeWindmillLayer 12 --subunit -vv

ec2test@ip-10-195-162-31:~/pygdb$ python backtrace.py 14620
Thread 3
#0 0x00002b8523165dc2 in select () from None
#1 0x00002b8527a402c3 in select_select (self=<value optimized out>, 
args=<value optimized out>) from 
/build/buildd/python2.5-2.5.2/Modules/selectmodule.c
/usr/lib/python2.5/asyncore.py (104): poll
/usr/lib/python2.5/asyncore.py (181): loop
/var/launchpad/tmp/eggs/lazr.smtptest-1.1-py2.5.egg/lazr/smtptest/server.py (107): start
/usr/lib/python2.5/threading.py (445): run
/usr/lib/python2.5/threading.py (469): __bootstrap_inner
/usr/lib/python2.5/threading.py (461): __bootstrap

Thread 2
#0 0x00002b85227fd7fb in accept () from None
#1 0x00002b852388f947 in sock_accept (s=0x94409c0) from 
/build/buildd/python2.5-2.5.2/Modules/socketmodule.c
/usr/lib/python2.5/socket.py (167): accept
/usr/lib/python2.5/SocketServer.py (374): get_request
/usr/lib/python2.5/SocketServer.py (216): handle_request
/var/launchpad/tmp/eggs/windmill-1.3beta3_lp_r1440-py2.5.egg/windmill/server/https.py (394): start
/usr/lib/python2.5/threading.py (445): run
/usr/lib/python2.5/threading.py (469): __bootstrap_inner
/usr/lib/python2.5/threading.py (461): __bootstrap

Thread 1
#0 0x00002b85227fc991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0xc220e90, waitflag=1) 
from ../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x11de38d0, 
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/threading.py (580): join
/usr/lib/python2.5/threading.py (682): _exitfunc
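
One reading of the trace above: Thread 1 is inside threading's _exitfunc, which joins every non-daemon thread at interpreter exit, while Threads 2 and 3 are blocked in accept() and select(). If those server threads are non-daemon and nothing stops them, the process cannot exit. This is an interpretation of the trace, not a confirmed diagnosis. The sketch below, written for Python 3, reproduces that general pattern in a child interpreter. The names CHILD_TEMPLATE and exits_within are hypothetical.

```python
import subprocess
import sys

# Child script: the main thread finishes immediately while a second
# thread blocks forever in accept() on a loopback listening socket.
CHILD_TEMPLATE = """
import socket, threading
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
t = threading.Thread(target=server.accept)
t.daemon = %r
t.start()
"""


def exits_within(daemon, timeout=3.0):
    """Run the child and report whether it exits before the timeout.

    With daemon=False the interpreter joins the blocked thread at exit
    and never finishes, so the child is killed and False is returned.
    """
    proc = subprocess.Popen([sys.executable, "-c", CHILD_TEMPLATE % daemon])
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return False


if __name__ == "__main__":
    print("daemon thread, exits:", exits_within(True))
    print("non-daemon thread, exits:", exits_within(False))
```

Under this reading, the usual ways out are to shut the servers down explicitly before exit or to mark their threads as daemon threads. Which of these fits the test runner is a separate question.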


And this one was the app server:
/usr/bin/python2.5 -S /var/launchpad/test/bin/run -C configs/testrunner-appserver/launchpad.conf


ec2test@ip-10-195-162-31:~/pygdb$ python backtrace.py 14654
Thread 6
#0 0x00002b60acf52dc2 in select () from None
#1 0x00002b60ad1e8784 in time_sleep (self=<value optimized out>, args=<value 
optimized out>) from /build/buildd/python2.5-2.5.2/Modules/timemodule.c
/var/launchpad/tmp/eggs/zope.sendmail-3.7.1-py2.5.egg/zope/sendmail/queue.py (155): run
/usr/lib/python2.5/threading.py (469): __bootstrap_inner
/usr/lib/python2.5/threading.py (461): __bootstrap

Thread 5
#0 0x00002b60ac5e9991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0xab3d000, waitflag=1) 
from ../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x954ca38, 
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/Queue.py (158): get
/var/launchpad/tmp/eggs/zope.server-3.6.1-py2.5.egg/zope/server/taskthreads.py (40): handlerThread

Thread 4
#0 0x00002b60ac5e9991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0x938ffe0, waitflag=1) from 
../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x954cbd0, 
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/Queue.py (158): get
/var/launchpad/tmp/eggs/zope.server-3.6.1-py2.5.egg/zope/server/taskthreads.py (40): handlerThread

Thread 3
#0 0x00002b60ac5e9991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0x211f9d0, waitflag=1) 
from ../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x2693dc8, 
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/Queue.py (158): get
/var/launchpad/tmp/eggs/zope.server-3.6.1-py2.5.egg/zope/server/taskthreads.py (40): handlerThread

Thread 2
#0 0x00002b60ac5e9991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0xaaebf40, waitflag=1) 
from ../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x954c0d8, 
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/Queue.py (158): get
/var/launchpad/tmp/eggs/zope.server-3.6.1-py2.5.egg/zope/server/taskthreads.py (40): handlerThread

Thread 1
#0 0x00002b60acf52dc2 in select () from None
#1 0x00002b60b30942c3 in select_select (self=<value optimized out>, 
args=<value optimized out>) from 
/build/buildd/python2.5-2.5.2/Modules/selectmodule.c
/usr/lib/python2.5/asyncore.py (104): poll
/var/launchpad/tmp/eggs/zope.app.server-3.4.2-py2.5.egg/zope/app/server/main.py (80): run
/var/launchpad/tmp/eggs/zope.app.server-3.4.2-py2.5.egg/zope/app/server/main.py (53): main
/var/launchpad/test/lib/canonical/launchpad/scripts/runlaunchpad.py (264): start_launchpad
/var/launchpad/test/bin/run (3): <module>



