yellow team mailing list archive

Re: Starting a working buildbot juju cluster

 

On 04/02/12 09:33, Gary Poster wrote:
I got it working, with some hiccups. This documents what I did in case
someone comes along to try it themselves.

1) I use an 8-core set up on ec2. In my ~/.juju/environments.yaml I have
"default: big-ec2" and then big-ec2 looks something like this:

big-ec2:
  type: ec2
  control-bucket: juju-[UUID goes here]
  admin-secret: [secret goes here]
  access-key: [key goes here]
  secret-key: [secret key goes here]
  default-series: precise
  juju-origin: ppa
  default-instance-type: m2.4xlarge
  default-image-id: [64bit ebs image id from http://uec-images.ubuntu.com/releases/precise/ goes here]

I personally use "python -c 'import uuid; print(uuid.uuid4())'" to
generate that UUID, fwiw.
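
For anyone who would rather script this, here is a minimal sketch of building the control-bucket value. The function name make_control_bucket is made up for illustration; only the "juju-" prefix and the use of uuid.uuid4() come from the text above.

```python
import uuid


def make_control_bucket(prefix="juju-"):
    """Return a bucket name of the form 'juju-<random UUID>'.

    The helper name and the prefix parameter are illustrative only.
    """
    return prefix + str(uuid.uuid4())
```

Calling print(make_control_bucket()) gives a value that can be pasted into the control-bucket line of environments.yaml.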

2) The image I had, and the apt sources configured, only had up to lxc
release 45. We need 47 or higher. It turns out I used a beta 1 image;
maybe if I had used a beta 2 image
(http://uec-images.ubuntu.com/releases/precise/beta-2/) it would have
been fixed. I manually changed my apt sources to the sources I use on my
own machine (the official Ubuntu sources) rather than the ec2 version,
and then did an update/upgrade. This gave me lxc version 48. I did this
before setuplxc had a chance to make an lxc, so then the slave started
up fine.
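
A rough way to check the lxc version before the slave is set up might look like the sketch below. It assumes the "release" numbers above are the trailing Ubuntu revision of the package version (a string such as "0.7.5-3ubuntu47"); that format is an assumption, not something this thread confirms. The helper names are hypothetical. dpkg-query with -W and -f is the standard way to ask dpkg for an installed package's version.

```python
import re
import subprocess

MIN_LXC_REVISION = 47  # from the text above: "We need 47 or higher"


def ubuntu_revision(version):
    """Extract the trailing Ubuntu revision from a package version string.

    Assumes a format such as '0.7.5-3ubuntu47'; returns None otherwise.
    """
    match = re.search(r"ubuntu(\d+)$", version.strip())
    return int(match.group(1)) if match else None


def lxc_is_new_enough(version):
    """Return True if the version's Ubuntu revision meets the minimum."""
    revision = ubuntu_revision(version)
    return revision is not None and revision >= MIN_LXC_REVISION


def installed_lxc_version():
    """Ask dpkg for the installed lxc package version (run on the slave)."""
    output = subprocess.check_output(
        ["dpkg-query", "-W", "-f=${Version}", "lxc"])
    return output.decode()
```

On the instance itself, lxc_is_new_enough(installed_lxc_version()) would then say whether an update/upgrade is still needed.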

3) "juju expose buildbot-master" didn't work for some reason for me. It
had before. It said it performed the right thing, but then I couldn't
see the web page on 8010. I ended up manually making a change in the AWS
console to the appropriate security group. I didn't know if this was
maybe because of some idiosyncrasy of what I had done (the master had an
earlier problem in lpbuildbot--a SyntaxError in the master.cfg--that I
fixed and I'm not mentioning it here because it shouldn't affect the
next person). If the broken expose happens again, we should investigate.
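
If the expose problem does come back, a quick reachability check from outside ec2 could help narrow it down. This is only a sketch: port_is_open is a hypothetical helper, and a failed connection does not by itself distinguish a closed security group from a master that is not listening.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
    sock.close()
    return True
```

For example, port_is_open(master_hostname, 8010), where master_hostname is whatever public address "juju status" reports for buildbot-master.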

Tests are running now (with --shuffle). I'll report back the results
when I have them (in an hour or so, hopefully!)

Gary


I've had some very weird test runs, as I mentioned on IRC. I decided to start again with beta 2 and see what that changed.

When doing so, I discovered another fun issue. The slave had an install error. It turned out this was a bit beyond our control...

2012-04-02 16:10:12,181: hook.output@ERROR: Traceback (most recent call last):
  File "/var/lib/juju/units/buildbot-slave-0/charm/hooks/install", line 83, in <module>
    install_packages()
  File "/var/lib/juju/units/buildbot-slave-0/charm/hooks/install", line 79, in install_packages
    install_extra_repository('ppa:yellow/ppa')
  File "/var/lib/juju/units/buildbot-slave-0/charm/hooks/install", line 71, in install_extra_repository

2012-04-02 16:10:12,181: hook.output@ERROR:     run('apt-get', 'update')
  File "/var/lib/juju/units/buildbot-slave-0/charm/hooks/install", line 31, in run
    process.returncode, repr(args), output=stdout+stderr)
subprocess.CalledProcessError: Command '['apt-get', 'update']' returned non-zero exit status 100

2012-04-02 16:10:12,189: hook.output@DEBUG: hook install exited, exit code Traceback (most recent call last):
Failure: juju.errors.CharmInvocationError: Error processing '/var/lib/juju/units/buildbot-slave-0/charm/hooks/install': exit code 1.
.
2012-04-02 16:10:12,189: hook.executor@DEBUG: Hook error: /var/lib/juju/units/buildbot-slave-0/charm/hooks/install Error processing '/var/lib/juju/units/buildbot-slave-0/charm/hooks/install': exit code 1.
2012-04-02 16:10:12,189: statemachine@DEBUG: unitworkflowstate: executing error transition error_install, Error processing '/var/lib/juju/units/buildbot-slave-0/charm/hooks/install': exit code 1.

So, running apt-get update failed because the apt sources (as configured in the image itself, AIUI) pointed to a Debian package cache in ec2 that did not exist. Yay.

To work around the problem, I manually changed the /etc/apt/sources.list to read as follows:

deb http://security.ubuntu.com/ubuntu/ precise-security universe main
deb-src http://security.ubuntu.com/ubuntu/ precise-security universe main
deb http://archive.ubuntu.com/ubuntu precise-updates universe main
deb-src http://archive.ubuntu.com/ubuntu precise-updates universe main
deb http://archive.ubuntu.com/ubuntu precise main universe
deb-src http://archive.ubuntu.com/ubuntu precise main universe

I'm starting to think that we ought to have setuplxc do this as well--manually overwrite sources.list to the standard values. The only problem is that this affects the master as well. :-/
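
As a starting point for that idea, here is a minimal sketch of generating the same six lines for a given series. It is not how setuplxc currently works; render_sources_list and write_sources_list are hypothetical names, and the mirrors and component order are copied from the list above.

```python
def render_sources_list(series="precise"):
    """Return sources.list content pointing at the standard Ubuntu archives."""
    entries = [
        ("http://security.ubuntu.com/ubuntu/", series + "-security",
         "universe main"),
        ("http://archive.ubuntu.com/ubuntu", series + "-updates",
         "universe main"),
        ("http://archive.ubuntu.com/ubuntu", series, "main universe"),
    ]
    lines = []
    for url, suite, components in entries:
        # Emit a binary and a source line for each archive, as in the
        # hand-edited file.
        for kind in ("deb", "deb-src"):
            lines.append("%s %s %s %s" % (kind, url, suite, components))
    return "\n".join(lines) + "\n"


def write_sources_list(path, series="precise"):
    """Overwrite the file at path (needs root for /etc/apt/sources.list)."""
    with open(path, "w") as f:
        f.write(render_sources_list(series))
```

Keeping the rendering separate from the write makes it easy to inspect the output before touching /etc/apt/sources.list on either the slave or the master.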

Anyway, after doing this I ran "juju resolved --retry buildbot-slave/0" and eventually the machine was "started". (Note that it never changed from reporting a broken install until the very end of the initialization. I followed along the juju logs on the slave to make sure it was fine.)

As implied above, I had to do the same for the master.

Now I need to figure out why we don't appear to be running any tests. :-/


Gary

