launchpad-dev team mailing list archive

Re: Failures in ec2

 

On Thu, Oct 14, 2010 at 12:25 PM, Steve Kowalik
<steve.kowalik@xxxxxxxxxxxxx> wrote:
> Hi guys,
>
>        I seem to constantly get thread-based failures when submitting a branch
> to ec2, or when Hudson performs a build. I got sick enough of it today
> to actually sit down and talk to Robert and Maris about it, and did a
> little bit of debugging.
>
>        It does seem like certain tests will leave a thread hanging around,
> which then zope gets caught up in.
>
> test:
> lp.codehosting.puller.tests.test_worker.TestWorkerProgressReporting.test_network
> Thread Name: MainThread
> Is Daemon?: False
> Thread target: None
>
> Thread Name: Thread-18
> Is Daemon?: True
> Thread target: <bound method HttpServer._http_start of HttpServer(127.0.0.1:37111)>
>
> Thread Name: Thread-20
> Is Daemon?: 1
> Thread target: <bound method TestingThreadingHTTPServer.process_request_thread of <bzrlib.tests.http_server.TestingThreadingHTTPServer instance at 0x6e78128>>
>
> time: 2010-10-14 10:53:44.596568Z
> successful:
> lp.codehosting.puller.tests.test_worker.TestWorkerProgressReporting.test_network
> test:
> lp.codehosting.puller.tests.test_worker.TestWorkerProgressReporting.test_network
> tags: zope:threads
> error:
> lp.codehosting.puller.tests.test_worker.TestWorkerProgressReporting.test_network
> [ multipart
> Content-Type: text/plain;charset=utf8
> garbage
> 34
> [<Thread(Thread-18, started daemon 47971215480592)>]0
> ]
> time: 2010-10-14 10:53:44.596847Z
>
> So it looks like the HttpServer instance needs to be killed in the test
> or in the teardown? I'm at a little bit of a loss, personally, so
> thought I'd throw it out there first.
>

This seems a lot like https://bugs.edge.launchpad.net/bzr/+bug/193253,
although there it's a socket leaking check rather than a thread
leaking check. I don't know what's caused it to regress.

Specifically, there's code hidden by bzrlib that isn't cleaning up
after itself. Whether it should or not is an open question. From one
point of view, our thread checker is being overzealous, catching a
leak in something that's never going to affect production. From
another point of view, HttpServer.stop_server() should darn well stop
the server.
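
To make "thread leaking check" concrete, here is a rough sketch of the
general idea: snapshot the live threads before a test, run it, and
report anything new that is still alive afterwards. This is not the
actual zope or Launchpad checker; the names leaked_threads and
run_with_thread_check are made up for illustration.

```python
import threading


def leaked_threads(before):
    """Return threads alive now that were not in the `before` snapshot."""
    return [
        t for t in threading.enumerate()
        if t not in before and t.is_alive()
    ]


def run_with_thread_check(test_func):
    """Run test_func and return any threads it left running.

    Hypothetical helper: a real test runner would hook this into
    setUp/tearDown rather than wrapping a callable.
    """
    before = set(threading.enumerate())
    test_func()
    return leaked_threads(before)
```

A daemon thread that is still blocked when the test returns shows up in
the result, which is roughly what the "garbage" section of the quoted
output is reporting for Thread-18.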

Anyway, fixes are:
  * Fix bzrlib.tests.http_server to clean up its thread in stop_server
  * Find some way of getting the thread leaking checker to ignore the thread
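
Both options can be sketched with the standard library alone. This is
an illustration under assumptions, not bzrlib's code: StoppableHttpServer
and unexpected_threads are hypothetical names, and the real
bzrlib.tests.http_server classes are structured differently. The first
part shows a stop_server that shuts the server down and joins its
thread; the second shows a checker that takes an "ignore" predicate.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StoppableHttpServer:
    """Hypothetical test server whose stop_server() joins its thread."""

    def __init__(self):
        # Port 0 asks the OS for any free port.
        self._httpd = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
        self._thread = None

    def start_server(self):
        self._thread = threading.Thread(
            target=self._httpd.serve_forever,
            name="http-server",
            daemon=True,
        )
        self._thread.start()

    def stop_server(self):
        # shutdown() blocks until serve_forever() has returned.
        self._httpd.shutdown()
        self._httpd.server_close()
        # Wait for the thread itself to finish so nothing is left behind.
        self._thread.join(timeout=5)
        self._thread = None


def unexpected_threads(before, ignore=lambda t: False):
    """Return new live threads, skipping any that `ignore` accepts."""
    return [
        t for t in threading.enumerate()
        if t not in before and t.is_alive() and not ignore(t)
    ]
```

With the first approach the checker has nothing to complain about once
stop_server() returns. With the second, the test passes something like
ignore=lambda t: t.name == "http-server", at the cost of hiding a
thread that really is still running.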

Perhaps there are more fundamental issues that could be addressed. Them,
I leave to Rob.

CCing vila because of the history.

jml


