
launchpad-dev team mailing list archive

handling of stdout/err/in in subprocesses and test runner hangs / subunit corruption

 

Hi, I just realised that one of the lessons I learnt while working on
parallel testing via lxc hadn't been broadcast - so a little late I'm
doing so :)

I'm not sure where this should be permanently recorded, suggestions welcome ;).

tl;dr:
 - *never* reuse stdin/stdout/stderr from within the test suite
 - except when creating a primary process (one that is generating
subunit itself such as a re-invoked test runner)
 - use Fixtures to communicate child process output to tests.
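As a rough illustration of those three rules, here is a minimal sketch using only the Python standard library. ChildProcess is a hypothetical helper invented for this example (it is not part of Launchpad or of the fixtures library); it stands in for a fixture that gives the child its own descriptors, hands the output to the test, and kills the child on cleanup.

```python
import subprocess
import sys
import unittest


class ChildProcess:
    """Hypothetical helper: run a child that shares none of our std* fds."""

    def __init__(self, args):
        self.args = args
        self.output = None
        self._proc = None

    def start(self):
        self._proc = subprocess.Popen(
            self.args,
            stdin=subprocess.DEVNULL,   # never the test runner's stdin
            stdout=subprocess.PIPE,     # a private pipe, not our stdout
            stderr=subprocess.STDOUT,   # fold stderr into the same pipe
        )

    def wait(self, timeout=10):
        # Collect everything the child wrote and return its exit status.
        self.output, _ = self._proc.communicate(timeout=timeout)
        return self._proc.returncode

    def cleanup(self):
        # Kill the child if it is still running so it cannot go stale.
        if self._proc is None:
            return
        if self._proc.poll() is None:
            self._proc.kill()
        self._proc.wait()
        if self._proc.stdout is not None:
            self._proc.stdout.close()


class ChildOutputTest(unittest.TestCase):

    def test_child_output_is_captured(self):
        child = ChildProcess(
            [sys.executable, "-c", "print('hello from child')"])
        # Cleanup runs even if the assertions below fail.
        self.addCleanup(child.cleanup)
        child.start()
        self.assertEqual(child.wait(), 0)
        self.assertIn(b"hello from child", child.output)
```

The point of the design is that the child's output reaches the test as data (child.output) rather than being written straight onto the runner's own stdout.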


Long form...

For some test environments we are sensitive to stale processes being
left behind. This can take (at least) two forms. First, a process that
isn't cleaned up can get in the way of a newer process (e.g. by still
listening on the port the new one needs). Second, a process that isn't
cleaned up will still have its stdout and stderr open.

Now, *all* uncleaned-up processes are bugs in some form or another,
but with the zope testrunner it's pretty hard to confidently stomp all
the cases out - and bugs can still happen.

When executing tests over ssh, a stale process with any of the std*
file descriptors shared with the ssh process will cause the ssh
session to hang *even after the test runner exits*. So the existence
of a bug in cleaning up processes will result in hung tests, and that's
'Bad'.
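The same effect can be reproduced locally without ssh. The sketch below is an analogy, not the ssh code path itself: it assumes a POSIX `sh` and `sleep` are available, and seconds_until_eof is a name made up for the example. Whoever reads a pipe only sees EOF once every process holding the write end has exited, so a background process that inherited stdout keeps the reader waiting after the main command is long gone.

```python
import subprocess
import time


def seconds_until_eof(shell_command):
    """Time how long it takes to read a command's stdout through to EOF."""
    started = time.monotonic()
    subprocess.run(
        ["sh", "-c", shell_command],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,     # run() reads this pipe until EOF
        stderr=subprocess.DEVNULL,
    )
    return time.monotonic() - started


# The background sleep inherits the pipe, so EOF only arrives when it
# exits, roughly two seconds after the shell itself has finished.
leaky = seconds_until_eof("sleep 2 & echo done")

# Here the background sleep has its own descriptors, so EOF arrives as
# soon as the shell exits.
detached = seconds_until_eof(
    "sleep 2 </dev/null >/dev/null 2>&1 & echo done")

print("inherited stdout: %.1fs, detached: %.1fs" % (leaky, detached))
```

With a process that never exits in place of the two second sleep, the first call would simply never return, which is the hang described above.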

In the near future I hope we'll move buildbot to lxc-based parallel
testing, which runs local ssh sessions to control test processes, so
this sort of failure will cause buildbot hangs. That's 'Real Bad' :).

The other impact that a process with a shared stdout can have is
corrupting the subunit stream: it can write at any point, including
partway through something the test runner is writing, and thoroughly
mangle the stream.
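A small sketch of how that interleaving happens, assuming a POSIX system. The "test: ..." record is a made-up line format for illustration only, not subunit's real wire format; the pipe plays the part of the runner's stdout.

```python
import os
import subprocess
import sys

# A pipe standing in for the test runner's stdout.
read_fd, write_fd = os.pipe()

# The runner writes the first half of a record...
os.write(write_fd, b"test: exam")

# ...a child that was handed the same descriptor writes in the middle...
subprocess.run(
    [sys.executable, "-c", "import sys; sys.stdout.write('child noise\\n')"],
    stdin=subprocess.DEVNULL,
    stdout=write_fd,
    check=True,
)

# ...and the runner then finishes its record.
os.write(write_fd, b"ple_test\n")
os.close(write_fd)

with os.fdopen(read_fd, "rb") as reader:
    stream = reader.read()

# The child's output now sits inside the runner's record.
print(stream)
```

Whatever consumes the stream has no way to tell which bytes came from the runner and which came from the child.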

This sort of thing has probably caused most of us to experience screen
corruption in vim and similar effects.

Currently the test suite has no cases of it that I know of - I did a
cleanup about a month back - but we need to make sure we don't
reintroduce such bugs in future.

-Rob
 - A primary process is one that we will block

