yellow team mailing list archive
Message #00655
Re: Starting a working buildbot juju cluster
On 04/02/12 09:33, Gary Poster wrote:
I got it working, with some hiccups. This documents what I did in case
someone comes along to try it themselves.
1) I use an 8-core setup on EC2. In my ~/.juju/environments.yaml I have
"default: big-ec2", and big-ec2 looks something like this:
big-ec2:
  type: ec2
  control-bucket: juju-[UUID goes here]
  admin-secret: [secret goes here]
  access-key: [key goes here]
  secret-key: [secret key goes here]
  default-series: precise
  juju-origin: ppa
  default-instance-type: m2.4xlarge
  default-image-id: [64bit ebs image id from http://uec-images.ubuntu.com/releases/precise/ goes here]
I personally use "python -c 'import uuid; print uuid.uuid4()'" to
generate those UUIDs, fwiw.
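For anyone setting this up from scratch, here is a minimal Python 3 sketch that produces a fresh "juju-<UUID>" control bucket name (the Python 3 equivalent of the one-liner above) and renders a big-ec2 entry with the values from step 1. The helper names are hypothetical, and generating the admin secret with secrets.token_hex() is an assumption, not something juju prescribes. The entry is rendered on its own; it still has to be placed in environments.yaml next to the "default:" line.

```python
import secrets
import uuid


def new_control_bucket() -> str:
    """Return a bucket name in the "juju-<UUID>" form used above."""
    # uuid4() is random, so every call yields a different name.
    return "juju-" + str(uuid.uuid4())


def big_ec2_stanza(image_id: str, access_key: str, secret_key: str) -> str:
    """Render a big-ec2 entry like the one above (hypothetical helper).

    Assumption: the admin secret is generated with secrets.token_hex();
    any hard-to-guess string should do.
    """
    lines = [
        "big-ec2:",
        "  type: ec2",
        f"  control-bucket: {new_control_bucket()}",
        f"  admin-secret: {secrets.token_hex(16)}",
        f"  access-key: {access_key}",
        f"  secret-key: {secret_key}",
        "  default-series: precise",
        "  juju-origin: ppa",
        "  default-instance-type: m2.4xlarge",
        f"  default-image-id: {image_id}",
    ]
    return "\n".join(lines) + "\n"
```

Usage: print(big_ec2_stanza(image_id, access_key, secret_key)) with your own values, then paste the output into the file.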
2) The image I had, and the apt sources it was configured with, only
went up to lxc release 45. We need 47 or higher. It turns out I had used
a beta 1 image; maybe a beta 2 image
(http://uec-images.ubuntu.com/releases/precise/beta-2/) would have fixed
this. I manually changed my apt sources to the ones I use on my own
machine (the official Ubuntu sources) rather than the ec2 ones, and then
ran an update/upgrade. This gave me lxc version 48. I did this before
setuplxc had a chance to create an lxc container, so the slave then
started up fine.
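To catch this before setuplxc runs, the installed lxc version can be checked first. A rough Python 3 sketch, assuming the "release 45/47/48" numbers above are the "ubuntuN" suffix of the lxc package's Debian version string (the exact precise version strings are not given here, so the format is an assumption). The helper names are hypothetical; dpkg-query's --showformat option is standard.

```python
import re
import subprocess

MINIMUM_REVISION = 47  # "We need 47 or higher."


def lxc_ubuntu_revision(version: str) -> int:
    """Return N from a Debian version string containing "ubuntuN".

    Assumption: the lxc release numbers discussed above are this suffix.
    """
    match = re.search(r"ubuntu(\d+)", version)
    if match is None:
        raise ValueError(f"no ubuntu revision in lxc version {version!r}")
    return int(match.group(1))


def installed_lxc_version() -> str:
    """Ask dpkg for the installed lxc version (lxc must be installed)."""
    return subprocess.check_output(
        ["dpkg-query", "-W", "--showformat=${Version}", "lxc"], text=True
    ).strip()


def lxc_is_new_enough(version: str, minimum: int = MINIMUM_REVISION) -> bool:
    """True if the version's ubuntu revision is at least minimum."""
    return lxc_ubuntu_revision(version) >= minimum
```

On the slave, lxc_is_new_enough(installed_lxc_version()) returning False would mean fixing the apt sources and running an update/upgrade before setuplxc creates a container.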
3) "juju expose buildbot-master" didn't work for me this time, for some
reason, although it had worked before. It reported success, but I still
couldn't see the web page on port 8010. I ended up manually changing the
appropriate security group in the AWS console. I don't know whether this
was caused by some idiosyncrasy of what I had done. (The master had an
earlier problem in lpbuildbot, a SyntaxError in master.cfg, which I
fixed; I'm not describing it here because it shouldn't affect the next
person.) If the broken expose happens again, we should investigate.
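If expose misbehaves again, the manual console fix can be scripted and the result checked. A rough sketch, assuming boto3 (a current AWS SDK for Python, not something this setup used) and that you already know the ID of the security group covering the master's instance; the group ID and all helper names are hypothetical. The rule opens the port to every address, so narrow the CIDR if you prefer. port_is_reachable is a quick way to confirm from your own machine whether 8010 is actually open, however the rule was added.

```python
import socket


def port_ingress_rule(port: int, cidr: str = "0.0.0.0/0") -> dict:
    """One IpPermissions entry allowing TCP traffic to a single port."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


def open_port_in_group(group_id: str, port: int = 8010) -> None:
    """Add the rule to an existing security group (the console step).

    Needs boto3 and AWS credentials; boto3 is imported here so the other
    helpers work without it.
    """
    import boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=[port_ingress_rule(port)]
    )


def port_is_reachable(host: str, port: int = 8010, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timeout) and DNS failures all land here.
        return False
```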
Tests are running now (with --shuffle). I'll report back with the
results when I have them (in an hour or so, hopefully!).
Gary
I've had some very weird test runs, as I mentioned on IRC. I decided to
start again with beta 2 and see what that changed.
When doing so, I discovered another fun issue. The slave had an install
error. It turned out this was a bit beyond our control...
2012-04-02 16:10:12,181: hook.output@ERROR: Traceback (most recent call last):
  File "/var/lib/juju/units/buildbot-slave-0/charm/hooks/install", line 83, in <module>
    install_packages()
  File "/var/lib/juju/units/buildbot-slave-0/charm/hooks/install", line 79, in install_packages
    install_extra_repository('ppa:yellow/ppa')
  File "/var/lib/juju/units/buildbot-slave-0/charm/hooks/install", line 71, in install_extra_repository
2012-04-02 16:10:12,181: hook.output@ERROR: run('apt-get', 'update')
  File "/var/lib/juju/units/buildbot-slave-0/charm/hooks/install", line 31, in run
    process.returncode, repr(args), output=stdout+stderr)
subprocess.CalledProcessError: Command '['apt-get', 'update']' returned non-zero exit status 100
2012-04-02 16:10:12,189: hook.output@DEBUG: hook install exited, exit code Traceback (most recent call last):
Failure: juju.errors.CharmInvocationError: Error processing '/var/lib/juju/units/buildbot-slave-0/charm/hooks/install': exit code 1.
.
2012-04-02 16:10:12,189: hook.executor@DEBUG: Hook error: /var/lib/juju/units/buildbot-slave-0/charm/hooks/install Error processing '/var/lib/juju/units/buildbot-slave-0/charm/hooks/install': exit code 1.
2012-04-02 16:10:12,189: statemachine@DEBUG: unitworkflowstate: executing error transition error_install, Error processing '/var/lib/juju/units/buildbot-slave-0/charm/hooks/install': exit code 1.
So running apt-get update failed because the apt sources (as configured
in the image itself, AIUI) pointed to a Debian package cache in ec2 that
did not exist. Yay.
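The traceback shows the install hook wrapping subprocess in a run() helper that raises CalledProcessError with stdout and stderr joined into its output. Below is a hypothetical Python 3 reconstruction, inferred only from those frames; the charm's real code may differ. Note that the logged message carries only the exit status (100); a caller that catches the error and prints its output attribute would also show apt-get's own error text, which would have pointed straight at the bad sources.

```python
import subprocess


def run(*args: str) -> str:
    """Run a command; on failure, raise with stdout and stderr attached.

    Hypothetical reconstruction inferred from the traceback above.
    """
    cmd = list(args)
    process = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    stdout, stderr = process.communicate()
    if process.returncode:
        # Attach both streams so a caller can print e.output, e.g. the
        # actual apt-get error, not just "exit status 100".
        raise subprocess.CalledProcessError(
            process.returncode, cmd, output=stdout + stderr
        )
    return stdout
```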
To work around the problem, I manually changed the /etc/apt/sources.list
to read as follows:
deb http://security.ubuntu.com/ubuntu/ precise-security universe main
deb-src http://security.ubuntu.com/ubuntu/ precise-security universe main
deb http://archive.ubuntu.com/ubuntu precise-updates universe main
deb-src http://archive.ubuntu.com/ubuntu precise-updates universe main
deb http://archive.ubuntu.com/ubuntu precise main universe
deb-src http://archive.ubuntu.com/ubuntu precise main universe
I'm starting to think that we ought to have setuplxc do this as well:
manually overwrite sources.list with the standard values. The only
problem is that the master is affected as well, and setuplxc would not
fix it there. :-/
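A minimal sketch of that idea as a hypothetical Python helper (nothing like it exists in setuplxc today): it writes the six lines above over sources.list, keeps the image's original file as sources.list.orig, and replaces the file atomically so apt never reads a half-written copy. It has to run as root, and apt-get update still has to run afterwards. Because it is a plain function, the same helper could be called on the master as well as the slave.

```python
import os
import shutil
import tempfile

# The six lines from the message above (precise, official Ubuntu mirrors).
STANDARD_SOURCES = """\
deb http://security.ubuntu.com/ubuntu/ precise-security universe main
deb-src http://security.ubuntu.com/ubuntu/ precise-security universe main
deb http://archive.ubuntu.com/ubuntu precise-updates universe main
deb-src http://archive.ubuntu.com/ubuntu precise-updates universe main
deb http://archive.ubuntu.com/ubuntu precise main universe
deb-src http://archive.ubuntu.com/ubuntu precise main universe
"""


def overwrite_sources_list(path: str = "/etc/apt/sources.list",
                           content: str = STANDARD_SOURCES) -> None:
    """Replace path with content, keeping a .orig copy of the old file."""
    if os.path.exists(path) and not os.path.exists(path + ".orig"):
        # Save the image's original file once, for comparison or rollback.
        shutil.copy2(path, path + ".orig")
    directory = os.path.dirname(os.path.abspath(path))
    # Write a temp file in the same directory, then rename it over the
    # target so the swap is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```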
Anyway, after doing this I ran "juju resolved --retry buildbot-slave/0",
and eventually the machine was "started". (Note that it kept reporting a
broken install until the very end of initialization; I followed the juju
logs on the slave to make sure it was fine.)
As implied above, I had to do the same for the master.
Now I need to figure out why we don't appear to be running any tests. :-/
Gary