launchpad-dev team mailing list archive
Message #05984
The State of the Soyuz
======================
Progress on Soyuz can be largely categorised into three items:
1. Feature work
2. Ongoing important bugs (most are tagged 'boobytrap')
3. Firefighting
I shall report on each of these below.
Feature Work
============
We've worked on two main features in the last few months.
Buildd-manager scalability
--------------------------
This feature is largely done barring any last-moment problems. The buildd-
manager has been extensively and invasively re-written to be cleaner, clearer
and most importantly fully asynchronous, which finally allows events from all
the builders to overlap. We also moved the build upload processing to an
external queue so it's not done in a blocking fashion inside the manager
itself.
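The shape of that change can be sketched in a few lines. This is only an illustration under stated assumptions, not the buildd-manager's actual code: it uses Python's asyncio with an in-process queue as a stand-in for the external upload queue, and the names poll_builder, process_uploads and run_manager are hypothetical.

```python
import asyncio
import random


async def poll_builder(name, upload_queue):
    """Wait on one (simulated) builder, then hand its result to the queue."""
    # Simulated build/network latency; a real manager would talk to the
    # builder here.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    await upload_queue.put(f"{name}: build finished")


async def process_uploads(upload_queue, processed):
    """Consume finished builds separately so polling is never blocked."""
    while True:
        item = await upload_queue.get()
        processed.append(item)
        upload_queue.task_done()


async def run_manager(builder_names):
    """Poll all builders concurrently and drain the upload queue."""
    upload_queue = asyncio.Queue()
    processed = []
    consumer = asyncio.create_task(process_uploads(upload_queue, processed))
    # gather() runs every poll at once, so events from the builders overlap
    # instead of being handled one builder at a time.
    await asyncio.gather(*(poll_builder(n, upload_queue) for n in builder_names))
    # Wait until every queued upload has been processed, then stop the consumer.
    await upload_queue.join()
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    return processed


if __name__ == "__main__":
    for line in asyncio.run(run_manager(["builder-a", "builder-b", "builder-c"])):
        print(line)
```

The point of the separate consumer is that a slow upload never delays the next poll; in the real system that work happens outside the manager process altogether.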
The result is a lean, mean build farm that rarely sees the kind of massive
build queues we had in the past. Queue length still peaks around 23:30 UTC
each day when the daily recipe builds kick off, but these are now dealt with
very swiftly.
Derived distributions
---------------------
Derived distros are still in full swing. Approximately two thirds of the UI
is done (mostly the page that shows the differences between child and parent
series), but we keep identifying further changes that are needed. A design
decision in the LEP to open and initialise a new distro series in a single
step needs to be revisited because the Ubuntu team now wants to do these steps
separately. We also need to add UI parts that show the progress of things like
sync operations and diff requests.
The backend for asynchronously initialising a distroseries from a parent is
finished (thanks to Steve's hard work) and can be initiated from the API.
Initiating from the web UI won't be possible until the above redesign is done
and implemented.
The backend for doing sync operations is nearly finished, and Jelmer assures
me it will be done before he absconds to the Bazaar team in January!
Still in progress is the very complicated code that determines the
differences between two distroseries. This necessitated some changes to Gina
so that the changelog is available in the database, where it can be probed
for releases that were never separately imported.
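As a rough illustration of the simplest part of that comparison, the sketch below classifies source packages by whether they exist in the parent, the child, or both with different versions. It is a toy under stated assumptions: the function name diff_series and the dict-based input are hypothetical, versions are compared only for equality (real code needs Debian version ordering), and the changelog probing described above is not modelled at all.

```python
def diff_series(parent, child):
    """Classify differences between two {source_name: version} mappings.

    Returns {source_name: (kind, parent_version, child_version)} for every
    package that is not identical in both series.
    """
    differences = {}
    for name in sorted(set(parent) | set(child)):
        parent_version = parent.get(name)
        child_version = child.get(name)
        if parent_version is None:
            differences[name] = ("only in child", None, child_version)
        elif child_version is None:
            differences[name] = ("only in parent", parent_version, None)
        elif parent_version != child_version:
            # Equality check only: this does not say which version is newer.
            differences[name] = ("versions differ", parent_version, child_version)
    return differences


if __name__ == "__main__":
    # Made-up package names and versions, for illustration only.
    parent = {"alpha": "1.0-1", "beta": "2.3-1", "gamma": "0.9-2"}
    child = {"alpha": "1.0-1", "beta": "2.3-1custom1", "delta": "4.0-1"}
    for name, details in diff_series(parent, child).items():
        print(name, details)
```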
Booby Trap Bugs
===============
Any bugs that will cause us to drop everything and brandish fire extinguishers
if they go off are tagged with 'boobytrap'. We've been making fairly slow but
steady progress fixing these (being a man down on the team has not helped).
The main bugs that were fixed are to do with the publisher, which used to hate
uninitialised distroseries (fixing this has enabled Ubuntu to open future
series early), the buildd-manager (which was all part of its re-write), and
package copying. Package copying bugs are a particular annoyance since we've
had a few that made the publisher fail completely and blocked all PPAs from
getting published.
There are a few more of these in progress now, such as preventing files from
getting re-uploaded once they've been deleted (which has horrible knock-on
effects when people then copy those packages to other PPAs) and some
buildd-manager improvements to better tolerate transient builder/network
failures.
Finally, we've got several publisher performance issues caused by different
bugs that leave superseded/deleted sources in a state where they can never be
condemned for removal. We've got a good handle on those and they will be
fixed soon.
Firefighting
============
Soyuz has had an unfortunate number of production incidents over the last few
months. These were all either buildfarm issues or PPA publisher issues, both
of which are very high profile and high impact.
* 2010-06-17 - PPA publisher complete failure. This was caused by it trying
  to write an OOPS file to somewhere it didn't have permission to.
* 2010-08-12 - after the first stage of the buildd-manager re-write, the
  manager did not handle EINTR properly, which caused the running job to fail
  instantly.
* 2010-10-07 - death row processing (removing condemned files) was failing
  and causing many PPAs to go over quota with no way of fixing that. The
  cause was the Postgres 8.4 upgrade, which made a particular query an order
  of magnitude slower.
* 2010-10-28 - the build farm failed to dispatch any builds, caused in part
  by the buildd-manager re-write hitting problems that don't occur in the
  test environment.
* 2010-11-17 - Apache returning a "500" error when accessing private PPAs.
  This was caused by the .htaccess files being written with incorrect
  permissions.
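For readers unfamiliar with the EINTR problem in the 2010-08-12 incident: a blocking system call can be interrupted by a signal and return the EINTR error even though nothing is actually wrong, so the usual remedy is to retry the call rather than treat it as a failure. The sketch below shows that generic pattern only; it is not the fix that was applied to the buildd-manager, and retry_on_eintr is a hypothetical helper name. Python 3.5 and later retry most interrupted system calls automatically (PEP 475), so a wrapper like this mainly matters on older interpreters.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            # Only an interrupted system call is retried; any other error
            # is a real failure and is re-raised to the caller.
            if exc.errno != errno.EINTR:
                raise


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_read():
        """Simulate a call that is interrupted twice before succeeding."""
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise OSError(errno.EINTR, "Interrupted system call")
        return "data"

    print(retry_on_eintr(flaky_read), "after", attempts["count"], "attempts")
```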
The Future
==========
Who knows what the future holds, other than goodbye Soyuz team, hello Squad
Red!