launchpad-dev team mailing list archive
Message #04080
The State of the Soyuz
Build Farm Improvements
=======================
This is going well, and we have a few initiatives ongoing:
Making build uploads asynchronous.
----------------------------------
Currently, as each build is fetched from the slave, it blocks the
buildd-manager until the upload is finished. For a large upload this can take
half a minute. As more builders are added to the build farm, this quickly
becomes a scaling nightmare! Jelmer is making changes so that the uploads are
dumped in a queue which is processed outside of the manager process.
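
To illustrate the idea, here is a minimal sketch and not Jelmer's actual
branch. It assumes an in-memory queue and uses a worker thread as a stand-in
for the separate process; the names collect_build, process_uploads and
run_demo are hypothetical.

```python
import queue
import threading


def process_uploads(upload_queue, handle_upload):
    """Drain the queue until a None sentinel arrives.

    Stands in for the processor that runs outside the manager.
    """
    while True:
        build_id = upload_queue.get()
        if build_id is None:
            break
        handle_upload(build_id)  # the slow part, e.g. a 30 second upload


def collect_build(build_id, upload_queue):
    """Manager-side step: queue the finished build and return at once."""
    upload_queue.put(build_id)


def run_demo(build_ids):
    """Queue some builds and return them in the order they were processed."""
    uploaded = []
    upload_queue = queue.Queue()
    worker = threading.Thread(
        target=process_uploads, args=(upload_queue, uploaded.append))
    worker.start()
    for build_id in build_ids:
        collect_build(build_id, upload_queue)  # never waits on an upload
    upload_queue.put(None)  # tell the worker to stop
    worker.join()
    return uploaded
```

The point of the split is that collect_build costs the manager only a queue
insertion, however long the upload itself takes.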
Parallel scans of slaves.
-------------------------
Currently, the manager polls each builder in turn on each scan cycle. This is
problematic for two reasons: it limits how many network operations can run at
the same time, and it decreases robustness, since a failure while scanning one
builder causes the whole scan to be aborted. I've been working on a branch
that separates these scans so each builder is isolated, and it's currently in
testing on the dogfood server.
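
A rough sketch of what isolating the scans buys us. This is not the code in
the branch: it assumes a thread pool for concurrency, and scan_all and
scan_builder are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_all(builders, scan_builder, max_workers=8):
    """Scan every builder concurrently, isolating failures per builder.

    Returns (results, failures), both keyed by builder name.
    """
    results = {}
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(scan_builder, name) for name in builders}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One broken builder no longer aborts the whole scan.
                failures[name] = exc
    return results, failures
```

A scan that raises for one builder ends up in failures, while every other
builder still gets its result for that cycle.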
Better failure detection.
-------------------------
Sometimes we get a bad builder or a bad job. A bad job can take out the whole
build farm as it hops from builder to builder, getting automatically retried.
I've got an experimental branch that works out whether the job or the builder
is at fault by counting failures of both and seeing which one increases the
most over time.
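
The counting idea can be sketched roughly as follows. This is a simplified
illustration and not the experimental branch itself: the FailureTracker class
and its method names are hypothetical, and it compares raw failure counts
without any time window or threshold.

```python
from collections import Counter


class FailureTracker:
    """Count failures per builder and per job to guess which is at fault."""

    def __init__(self):
        self.builder_failures = Counter()
        self.job_failures = Counter()

    def record_failure(self, builder, job):
        """A build of `job` on `builder` failed; blame both for now."""
        self.builder_failures[builder] += 1
        self.job_failures[job] += 1

    def suspect(self, builder, job):
        """Return "builder", "job", or "unknown" if the counts are tied."""
        builder_count = self.builder_failures[builder]
        job_count = self.job_failures[job]
        if builder_count > job_count:
            return "builder"
        if job_count > builder_count:
            return "job"
        return "unknown"
```

A job that fails on three different builders accumulates three failures while
each builder has only one, so the job is the suspect. A builder that fails
several different jobs tips the other way.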
Software Center
===============
Michael has been working hard on this with the ISD guys and has helped to
bootstrap their Software Center Agent project. The Agent is a middle-man that
talks to the billing system, Launchpad and the end user, and sets up access to
private PPAs when the end user pays for something. Michael is pretty much
done with his work and is handing over to ISD so they can finish it off.
Derived Distributions
=====================
We've been working on the LEP
(https://dev.launchpad.net/LEP/DerivativeDistributions) for this for a while
now and recently completed the requirements specification at the sprint in
Prague. A couple of things are ongoing:
UI mockups and user testing.
----------------------------
Some initial mockups to capture the basic user interaction are in the LEP, and
these were recently presented to three potential users in some user testing
sessions done by Matthew Revell. This valuable feedback is currently being
analysed and we'll produce updated mockups for a second round of user testing
in the next week.
Moving server scripts to the job system.
----------------------------------------
Steve Kowalik is currently working on moving the initialise-from-parent.py
script into the job system so that we can initiate it from a webapp request
instead of requiring shell access to the servers. This is a prerequisite for
expanding that script so that it's more flexible and can handle the kind of
derivations that are described in the LEP.
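
As a rough picture of the pattern, with the caveat that this is not
Launchpad's job system API: the InitialiseJob class, the status values and
both functions below are hypothetical. The web request only records a job,
and a runner with the right access does the work later.

```python
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    WAITING = "waiting"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class InitialiseJob:
    parent: str
    child: str
    status: JobStatus = JobStatus.WAITING


def request_initialisation(pending_jobs, parent, child):
    """What a webapp request handler would do: record the job and return."""
    job = InitialiseJob(parent, child)
    pending_jobs.append(job)
    return job


def run_pending(pending_jobs, initialise):
    """What a job runner would do later, outside the web request."""
    for job in pending_jobs:
        if job.status is not JobStatus.WAITING:
            continue
        try:
            initialise(job.parent, job.child)
            job.status = JobStatus.COMPLETED
        except Exception:
            job.status = JobStatus.FAILED
```

Here initialise stands in for whatever the script currently does from the
shell; nobody needs server access to trigger it.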
Booby-Traps!
============
Soyuz has a lot of very old bugs that we've traditionally only looked at once
they've caused havoc in production. Francis likes to call these booby-trap
bugs and while we've not had a leg blown off yet, I've pulled shrapnel from
places I don't want to talk about. This approach is not healthy for our
production status, so we're starting to slot in fixes for these alongside our
normal feature development. Hopefully we'll see more long-term stability from
Soyuz by the end of the year.
Thanks for reading this far. If you know what "Soyuz" means in Russian I hope
you found the subject of this email entertaining.
J.