launchpad-dev team mailing list archive
Message #04817
Attention: change in layer support code for killing stale processes
We currently have some rather broken code in the support for killing
off stale external processes.
It's broken because:
- it assumes the system hasn't been rebooted (it may kill totally
unrelated processes)
- it's failing horribly (yesterday I had to kill 10 librarian
processes) at its overall task
I think there are bugs related to the layers support in
zope.testrunner/zope.testing too; I will try to narrow those down (but
hey, the more the merrier).
Anyway, to head towards long-term sanity I'm landing a branch which
signals more clearly when something has gone wrong - when there is a
pid file but no matching process.
The concrete change you will experience is that, if a test run fails
to shut down a test helper and the pid file is still on disk, tests
will refuse to run.
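As a rough illustration of that behaviour (not the code in the branch),
a guard of this shape could run before a helper is started. The names
StalePidFileError and refuse_if_pidfile_exists are hypothetical and
exist only for this sketch:

```python
import os


class StalePidFileError(Exception):
    """Hypothetical error: a helper's pid file survived an earlier run."""


def refuse_if_pidfile_exists(pidfile_path):
    # A leftover pid file means some earlier run did not shut its helper
    # down cleanly. Stop here instead of guessing which process to kill.
    if os.path.exists(pidfile_path):
        raise StalePidFileError(
            "pid file %s still exists; check that the process is gone, "
            "then delete the file" % pidfile_path)
```

The point of the sketch is that the harness never kills anything based on
a pid it read from disk; it only reports the problem and stops.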
What we'll get for this tradeoff is clearer responsibilities and APi
calls made in bringing up/getting rid of helper instances; this will
let code for bring up instances on demand be simpler and less
convoluted.
I'd really appreciate it if two things were done should you encounter this:
- file a bug (on launchpad-foundations) describing what you had done
when the test helper wasn't shut down.
- delete the pid file after checking that the process really is gone;
this should get you going again.
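If you want to script the second step, something like the following
could work on a POSIX system. It is only a sketch: it assumes the pid
file holds a single integer pid, and process_exists and
remove_pidfile_if_process_gone are hypothetical names, not helpers in
the tree.

```python
import errno
import os


def process_exists(pid):
    """Return True if some process with this pid currently exists."""
    if pid <= 0:
        # Non-positive values address process groups in kill(2).
        raise ValueError("not a single-process pid: %r" % (pid,))
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except OSError as e:
        if e.errno == errno.ESRCH:
            return False  # no such process
        if e.errno == errno.EPERM:
            return True  # exists, but owned by another user
        raise
    return True


def remove_pidfile_if_process_gone(pidfile_path):
    """Delete the pid file only if no process has the recorded pid.

    Returns True if the file was removed, False if it was left alone.
    """
    with open(pidfile_path) as f:
        pid = int(f.read().strip())
    if process_exists(pid):
        return False
    os.remove(pidfile_path)
    return True
```

Note that this only tells you whether *some* process has that pid. After
a reboot the pid may belong to something unrelated, so if it reports a
live process, look at what it actually is (ps) before doing anything to
it.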
I'm expecting a few categories:
- bugs in the helpers (librarian, memcache etc).
We must fix these at source: they will affect production
- bugs in the test harness: using the wrong approach to shut things
down / failing to try to shut down.
We should fix this in the harness.
- machine crashes/poweroffs leaving stale stuff to clean up
We'll fix this long term by having everything in /tmp and unique
environments every time.
For now, it should be sufficiently rare that we don't need to care.
- buildbot, with all its headaches, will probably want a dedicated clean step
This would help with a bunch of problems we already see there. However,
there is a losa tool that can kill processes in chroots - combining
that with truly unique contexts should be sufficient.
-Rob