launchpad-dev team mailing list archive
Message #03237
ec2 hangs
Hi people,
Today both mwhudson and I had hangs on EC2: the windmill tests appeared to
fail, and then the test run hung. The log files didn't contain complete stack
traces. When we shelled into the instances their load was zero, and according
to mwhudson's Python stack queryer the processes were just waiting for
connections.
Apart from the obvious problem of runaway costs on EC2, we have the WTF
moment of having no idea what is happening.
Tim
This process was running some windmill tests:
/usr/bin/python2.5 /var/launchpad/test/bin/test --resume-layer lp.code.windmill.testing.CodeWindmillLayer 12 --subunit -vv
ec2test@ip-10-195-162-31:~/pygdb$ python backtrace.py 14620
Thread 3
#0 0x00002b8523165dc2 in select () from None
#1 0x00002b8527a402c3 in select_select (self=<value optimized out>,
args=<value optimized out>) from
/build/buildd/python2.5-2.5.2/Modules/selectmodule.c
/usr/lib/python2.5/asyncore.py (104): poll
/usr/lib/python2.5/asyncore.py (181): loop
/var/launchpad/tmp/eggs/lazr.smtptest-1.1-py2.5.egg/lazr/smtptest/server.py
(107): start
/usr/lib/python2.5/threading.py (445): run
/usr/lib/python2.5/threading.py (469): __bootstrap_inner
/usr/lib/python2.5/threading.py (461): __bootstrap
Thread 2
#0 0x00002b85227fd7fb in accept () from None
#1 0x00002b852388f947 in sock_accept (s=0x94409c0) from
/build/buildd/python2.5-2.5.2/Modules/socketmodule.c
/usr/lib/python2.5/socket.py (167): accept
/usr/lib/python2.5/SocketServer.py (374): get_request
/usr/lib/python2.5/SocketServer.py (216): handle_request
/var/launchpad/tmp/eggs/windmill-1.3beta3_lp_r1440-py2.5.egg/windmill/server/https.py (394): start
/usr/lib/python2.5/threading.py (445): run
/usr/lib/python2.5/threading.py (469): __bootstrap_inner
/usr/lib/python2.5/threading.py (461): __bootstrap
Thread 1
#0 0x00002b85227fc991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0xc220e90, waitflag=1)
from ../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x11de38d0,
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/threading.py (580): join
/usr/lib/python2.5/threading.py (682): _exitfunc
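One possible reading of the trace above: Thread 1 is the main thread sitting in threading's exit handler, joining threads that are not marked as daemons, while Thread 2 is blocked in accept() and never returns. The sketch below reproduces that general pattern in a child interpreter. It is an illustration only, not the windmill or Launchpad code; `CHILD`, `serve` and `exits_within` are made-up names, and it targets a current Python rather than 2.5.

```python
import subprocess
import sys
import textwrap

# Source for a child interpreter. Hypothetical stand-in for the test process:
# one thread blocks in accept() on a socket that nobody ever connects to.
CHILD = textwrap.dedent("""
    import socket, sys, threading

    def serve():
        s = socket.socket()
        s.bind(("127.0.0.1", 0))
        s.listen(1)
        s.accept()  # blocks forever, like Thread 2 in the trace

    t = threading.Thread(target=serve)
    t.daemon = (sys.argv[1] == "daemon")
    t.start()
    # The main thread ends here. Interpreter shutdown then waits for every
    # non-daemon thread, which is where a process like this can hang.
""")


def exits_within(mode, seconds=3.0):
    """Return True if the child interpreter exits before the timeout."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD, mode])
    try:
        proc.wait(timeout=seconds)
        return True
    except subprocess.TimeoutExpired:
        # Still alive: the main thread is stuck joining the server thread.
        proc.kill()
        proc.wait()
        return False
```

Calling `exits_within("daemon")` and `exits_within("non-daemon")` shows the difference: only the daemon variant lets the process exit while the thread is still blocked. Whether this is what actually happened on the EC2 instances is a guess from the trace, not something the logs confirm.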
And this one was the app server:
/usr/bin/python2.5 -S /var/launchpad/test/bin/run -C configs/testrunner-appserver/launchpad.conf
ec2test@ip-10-195-162-31:~/pygdb$ python backtrace.py 14654
Thread 6
#0 0x00002b60acf52dc2 in select () from None
#1 0x00002b60ad1e8784 in time_sleep (self=<value optimized out>, args=<value
optimized out>) from /build/buildd/python2.5-2.5.2/Modules/timemodule.c
/var/launchpad/tmp/eggs/zope.sendmail-3.7.1-py2.5.egg/zope/sendmail/queue.py
(155): run
/usr/lib/python2.5/threading.py (469): __bootstrap_inner
/usr/lib/python2.5/threading.py (461): __bootstrap
Thread 5
#0 0x00002b60ac5e9991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0xab3d000, waitflag=1)
from ../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x954ca38,
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/Queue.py (158): get
/var/launchpad/tmp/eggs/zope.server-3.6.1-py2.5.egg/zope/server/taskthreads.py
(40): handlerThread
Thread 4
#0 0x00002b60ac5e9991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0x938ffe0, waitflag=1) from
../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x954cbd0,
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/Queue.py (158): get
/var/launchpad/tmp/eggs/zope.server-3.6.1-py2.5.egg/zope/server/taskthreads.py
(40): handlerThread
Thread 3
#0 0x00002b60ac5e9991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0x211f9d0, waitflag=1)
from ../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x2693dc8,
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/Queue.py (158): get
/var/launchpad/tmp/eggs/zope.server-3.6.1-py2.5.egg/zope/server/taskthreads.py
(40): handlerThread
Thread 2
#0 0x00002b60ac5e9991 in sem_wait () from None
#1 0x00000000004b371d in PyThread_acquire_lock (lock=0xaaebf40, waitflag=1)
from ../Python/thread_pthread.h
#2 0x00000000004b68d0 in lock_PyThread_acquire_lock (self=0x954c0d8,
args=<value optimized out>) from ../Modules/threadmodule.c
/usr/lib/python2.5/threading.py (208): wait
/usr/lib/python2.5/Queue.py (158): get
/var/launchpad/tmp/eggs/zope.server-3.6.1-py2.5.egg/zope/server/taskthreads.py
(40): handlerThread
Thread 1
#0 0x00002b60acf52dc2 in select () from None
#1 0x00002b60b30942c3 in select_select (self=<value optimized out>,
args=<value optimized out>) from
/build/buildd/python2.5-2.5.2/Modules/selectmodule.c
/usr/lib/python2.5/asyncore.py (104): poll
/var/launchpad/tmp/eggs/zope.app.server-3.4.2-py2.5.egg/zope/app/server/main.py (80): run
/var/launchpad/tmp/eggs/zope.app.server-3.4.2-py2.5.egg/zope/app/server/main.py (53): main
/var/launchpad/test/lib/canonical/launchpad/scripts/runlaunchpad.py (264):
start_launchpad
/var/launchpad/test/bin/run (3): <module>
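The backtraces above were taken from outside the process with backtrace.py. As a rough in-process alternative, a test runner could dump the Python stack of every thread itself when it receives a signal, which might get complete traces into the logs before a hang is noticed. This is a minimal sketch under that assumption, written for a current Python; `format_all_stacks` and `install_dump_handler` are hypothetical names and not part of the Launchpad test runner.

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return a text dump of the Python stack of every live thread."""
    # Map thread ids to names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s)" % (ident, names.get(ident, "unknown")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def install_dump_handler(signum):
    """Write all thread stacks to stderr whenever signum is delivered."""
    def handler(received_signum, frame):
        sys.stderr.write(format_all_stacks())
        sys.stderr.flush()
    signal.signal(signum, handler)
```

With `install_dump_handler(signal.SIGUSR1)` called early in the process (on a platform that has SIGUSR1), `kill -USR1 <pid>` from a shell on the instance would print the stacks. Unlike the gdb-based approach, this only shows Python frames and only works if the handler was installed before the hang.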