launchpad-dev team mailing list archive

Re: ec2 hangs - hang 1

 

On 04/21/2010 05:19 AM, Max Bowsher wrote:
Thread 2
#0 0x00002b85227fd7fb in accept () from None
#1 0x00002b852388f947 in sock_accept (s=0x94409c0) from /build/buildd/python2.5-2.5.2/Modules/socketmodule.c
/usr/lib/python2.5/socket.py (167): accept
/usr/lib/python2.5/SocketServer.py (374): get_request
/usr/lib/python2.5/SocketServer.py (216): handle_request
/var/launchpad/tmp/eggs/windmill-1.3beta3_lp_r1440-py2.5.egg/windmill/server/https.py (394): start
/usr/lib/python2.5/threading.py (445): run
/usr/lib/python2.5/threading.py (469): __bootstrap_inner
/usr/lib/python2.5/threading.py (461): __bootstrap

This must be the culprit of the hang; it appears similar to one I've
been looking at for the Python 2.6 migration. Whatever was supposed to
knock this thread out of its accept loop hasn't done so.


Thanks for the great analysis, Max.

Windmill implements a custom HTTPS web server, and this thread is blocked in accept(), waiting for a connection. I would guess that something in the web browser itself hung: either loading the test harness, loading the site under test, or passing back test results. We need a log file to know for sure.

Perhaps we could put a keepalive switch and a really long timeout somewhere in the stack: either in the windmill server, the windmill client class, or in our own test harness.
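
To make the server-side option concrete, here is a minimal sketch of an accept loop that wakes up on a timeout and checks a stop flag, so another thread can always knock it out of accept(). It is written against the modern Python 3 socketserver module rather than the Python 2.5 SocketServer in the trace, and StoppableServer, EchoHandler and serve_until_stopped are hypothetical names for illustration, not Windmill code.

```python
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Stand-in request handler: echoes one message back to the client."""

    def handle(self):
        self.request.sendall(self.request.recv(1024))


class StoppableServer(socketserver.TCPServer):
    """Hypothetical server whose accept loop can be interrupted."""

    allow_reuse_address = True
    # handle_request() gives up waiting for a connection after this many
    # seconds and calls handle_timeout() instead of blocking forever.
    timeout = 0.2

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.stop_requested = threading.Event()
        self.idle_polls = 0

    def handle_timeout(self):
        # No client connected within `timeout` seconds; just count it.
        self.idle_polls += 1

    def serve_until_stopped(self):
        # Re-check the stop flag between (bounded) waits for a connection.
        while not self.stop_requested.is_set():
            self.handle_request()
        self.server_close()
```

With this shape, whoever owns the server thread calls server.stop_requested.set() and the loop exits within roughly one timeout period, instead of sitting in accept() indefinitely.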

Augmenting our own test suite's setup and teardown would probably work best. You can reasonably expect /something/ to happen in the test suite in any given 5-minute interval.
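
As a rough sketch of that idea, a watchdog timer could be armed in setUp and disarmed in tearDown, so that a test which goes silent for five minutes triggers a report. This assumes modern Python 3 and plain unittest; Watchdog, report_hang and WatchedTestCase are hypothetical names, and the real Launchpad harness may need a different hook and a stronger reaction (such as killing the browser or the process).

```python
import faulthandler
import sys
import threading
import unittest


class Watchdog:
    """Calls on_timeout unless stopped or reset within `limit` seconds."""

    def __init__(self, limit, on_timeout):
        self.limit = limit
        self.on_timeout = on_timeout
        self._timer = None

    def reset(self):
        # Cancel any pending timer and start a fresh countdown.
        self.stop()
        self._timer = threading.Timer(self.limit, self.on_timeout)
        self._timer.daemon = True  # never keep the process alive
        self._timer.start()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None


def report_hang():
    # Hypothetical reaction: dump every thread's stack so the hang
    # leaves evidence in the log.
    faulthandler.dump_traceback(file=sys.stderr, all_threads=True)


class WatchedTestCase(unittest.TestCase):
    """Base class that expects each test to finish within 5 minutes."""

    watchdog = Watchdog(limit=300, on_timeout=report_hang)

    def setUp(self):
        self.watchdog.reset()

    def tearDown(self):
        self.watchdog.stop()
```

Test classes would inherit from WatchedTestCase; the 300-second limit mirrors the 5-minute interval above and is only a starting point.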


Maris


