
yellow team mailing list archive

Odd ec2 behavior

 

Suggested soundtrack: http://www.youtube.com/watch?v=ivWfRQ4V5ao

Today Gary and I both ran into the same problem running parallel tests on ec2.  The symptom is that /var/tmp/bazaar.launchpad.dev is created by the 'make build' step, owned by root, without group or other write access.  Tests that need to write to that directory or its children fail with permission denied errors.
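One way to confirm the symptom on a container is a minimal Python sketch like the one below, assuming Python 3 is available there. The `describe_dir` helper is hypothetical and not part of Launchpad. It reports the owner, the `ls`-style mode string, and whether the group or other write bits are set. `os.access` reflects the user running the script, so run it as the same user the tests run as; root can always write.

```python
import os
import pwd
import stat


def describe_dir(path):
    """Return (owner, ls-style mode string, group/other-writable flag)."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid with no passwd entry inside the container
        owner = str(st.st_uid)
    mode = stat.filemode(st.st_mode)  # e.g. 'drwxr-xr-x'
    shared_write = bool(st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owner, mode, shared_write


if __name__ == "__main__":
    path = "/var/tmp/bazaar.launchpad.dev"
    if os.path.isdir(path):
        owner, mode, shared_write = describe_dir(path)
        print(f"{path}: {mode} owner={owner} group/other-writable={shared_write}")
        print(f"writable by this user: {os.access(path, os.W_OK)}")
    else:
        print(f"{path} does not exist")
```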

When run locally, 'make clean' will remove /var/tmp/bazaar.launchpad.dev and 'make build' does not create it.  That's great because any codehosting tests that need the directory will create it on demand and we don't see any problems.
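The post does not say how the codehosting tests create the directory on demand, so the helper below is a hypothetical illustration, not Launchpad code. It shows why a pre-existing root-owned directory is a problem for a create-on-demand pattern: `os.makedirs(path, exist_ok=True)` returns silently when the directory already exists, whoever owns it, so the failure only appears later as a permission error unless writability is checked separately.

```python
import os


def ensure_writable_dir(path):
    """Create path if it is missing, then fail early if it is not writable.

    Hypothetical helper. os.makedirs(exist_ok=True) does not touch an
    existing directory, so a root-owned drwxr-xr-x directory passes this
    step unchanged; the os.access check makes that case explicit.
    """
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError(
            f"{path} exists but is not writable by uid {os.getuid()}")
    return path
```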

Running 'make clean build' inside an lxc container (from the slave) also works as expected.  Hurrah.

But the 'make build' step, when run from /usr/local/bin/lp-setup-lxc-build, results in /var/tmp/bazaar.launchpad.dev being created with the wrong ownership.  It is unclear why 'make build' creates that directory.

Findings:

1) The LP Makefile has not changed recently in any way that would cause this problem.

2) Gary and I both changed lp-setup-lxc-build to run each step individually and to run 'ls /var/tmp' between steps.  The modified script is at http://paste.ubuntu.com/1003875/ and the results are at http://pastebin.ubuntu.com/1003905/

The results are a little hard to read because the commands were not echoed.  At line 121, 'make clean' has just completed and 'ls /var/tmp' has been run in the container.  It shows there is no bazaar.launchpad.dev.  'make build' is then called.

It completes at line 177, after which 'ls /var/tmp' shows:

drwxr-xr-x  2 buildbot buildbot 4096 May 23 19:24 archive
drwxr-xr-x  2 root     root     4096 May 23 23:04 bazaar.launchpad.dev
drwxr-xr-x  5 buildbot buildbot 4096 May 23 23:06 launchpad_mailqueue
drwxrwsr-x 10 buildbot buildbot 4096 May 23 23:06 mailman

So 'make build' has created the directory, owned by root and not writable by any other user.
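The instrumentation in finding 2 can be sketched in Python so that each command is echoed before it runs, which addresses the readability problem in the pasted results, and so that each listing shows the owner of every entry in the watched directory. This is a hypothetical reimplementation, not the modified script from the paste; `run_steps` and any step list passed to it are assumptions.

```python
import os
import pwd
import shlex
import subprocess


def _owner(path):
    """Owner name of path (numeric uid if there is no passwd entry)."""
    try:
        uid = os.lstat(path).st_uid  # lstat: don't follow dangling symlinks
    except FileNotFoundError:
        return "(gone)"  # entry vanished between listdir() and lstat()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def run_steps(steps, watch_dir="/var/tmp"):
    """Run each command, echoing it first and listing watch_dir afterwards.

    Returns a list of (command, [(entry, owner), ...]) snapshots, one per
    step. A failing step raises subprocess.CalledProcessError.
    """
    snapshots = []
    for cmd in steps:
        print("+ " + shlex.join(cmd), flush=True)  # echo the command
        subprocess.run(cmd, check=True)
        entries = [(name, _owner(os.path.join(watch_dir, name)))
                   for name in sorted(os.listdir(watch_dir))]
        for name, owner in entries:
            print(f"    {owner:<10} {name}", flush=True)
        snapshots.append((cmd, entries))
    return snapshots
```

For example, calling `run_steps([["make", "clean"], ["make", "build"]])` from the Launchpad tree would print each command ahead of its output and follow it with the owner of every entry in /var/tmp, so a root-owned bazaar.launchpad.dev would show up directly after the step that created it. The real script runs more steps than these two.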

We cannot figure out why this is happening and why it just started.  Parallel tests on ec2 are stuck until we fix it.

--Brad


