
launchpad-dev team mailing list archive

Build farm and the slave build id menagerie

 

I've just been discussing something with wgrant that has been bothering both of us.

The build farm puts serious complexity into having unique "slave build ids" (which are basically the same as "buildfarm job names"). Theoretically arbitrary, they are produced in different ways for different job types, and then for each job type there's a method to check that the build ids cited by the slaves match what the master thought the slaves were working on.

The only thing these slave build ids have in common is that they all contain a BuildQueue id, and that alone is enough to guarantee uniqueness (though AFAIK even uniqueness isn't really needed). Each also contains a "something else" that can be cross-checked against the BuildQueue id: a Build.id, a BuildBase.id (not the same!), or a Branch.name. That's where all the complexity goes.

We can't be sure, but we think the cross-check may have started out as an extra protection against compromised slaves trying to confuse the buildd master. If so, the ids are too predictable to offer much protection (and I'm told the worst an attacker could achieve is to hold up the recovery of a hung slave). Or maybe it's just a belt-and-suspenders check against accidental matches, but in that case a simpler scheme would give much better protection.

So we propose simplifying the whole thing as follows:

1. The slave build id is concocted in a single place, and completely generic between build farm job types.

2. Likewise, we verify the slave build id in a single place and with no variations for different job types.

3. We pass the ready-made slave build id to dispatchBuildToSlave. There's no need for each implementation to repeat the code to generate it.

4. The slave build id uses the BuildQueue id for uniqueness, plus optionally a hard-to-predict cookie to thwart compromised slaves. We may even want to combine the two into a single hash; see below.

5. If we do want a cookie for security, we use generically available values that are tightly associated with the slave build but not all predictable in the same way: Job.date_created, BuildQueue.builder, Job.requester. If we hash the lot together, a compromised slave won't receive any of the component values for its own job as starting points for a guess.

6. We come up with a better or at least consistent name for these.

7. We forget about the whole thing & live merrily ever after.
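
As a rough sketch of points 1, 2, 4 and 5, here is what a single, job-type-agnostic generator and verifier might look like. The function names (make_slave_build_id, verify_slave_build_id) and the "<BuildQueue id>-<cookie>" layout are made up for illustration, the arguments are plain values standing in for BuildQueue.id, Job.date_created, BuildQueue.builder and Job.requester, and SHA-256 is just one possible choice of hash:

```python
import hashlib
import hmac
from datetime import datetime, timezone


def make_slave_build_id(build_queue_id, date_created, builder_id, requester_id):
    """Return a generic slave build id of the form '<BuildQueue id>-<cookie>'.

    Hypothetical helper: the arguments are plain values standing in for
    BuildQueue.id, Job.date_created, BuildQueue.builder and Job.requester.
    """
    # Hash all the component values together so the slave never sees
    # any of them individually.
    material = "|".join([
        str(build_queue_id),
        date_created.isoformat(),
        str(builder_id),
        str(requester_id),
    ])
    cookie = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return "%d-%s" % (build_queue_id, cookie)


def verify_slave_build_id(reported_id, build_queue_id, date_created,
                          builder_id, requester_id):
    """Check the id a slave reports against what the master expects.

    The same code path serves every build farm job type.
    """
    expected = make_slave_build_id(
        build_queue_id, date_created, builder_id, requester_id)
    # Constant-time comparison, in case the cookie is meant as protection.
    return hmac.compare_digest(
        reported_id.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    created = datetime(2010, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
    slave_build_id = make_slave_build_id(42, created, 7, 1001)
    print(slave_build_id)
    print(verify_slave_build_id(slave_build_id, 42, created, 7, 1001))
```

The ready-made id would then simply be handed to dispatchBuildToSlave (point 3), so no job type needs its own generation or checking code.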

If we ever decide that we need seriously unpredictable ids, the hash I suggested is an improvement but still not exactly safe. If desired we can throw in a new column BuildQueue.salt later, optional at first, to make the hash harder to guess without breaking compatibility with pending jobs. Who wouldn't want cookies with salt in them?
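
To illustrate how an optional salt could be phased in, here is a minimal sketch. The names new_salt and make_cookie are invented for this example, and using the salt as an HMAC key is my own choice of construction rather than anything decided here. When no salt is stored (as for jobs created before the column exists), it falls back to the plain hash, so pending jobs keep verifying:

```python
import hashlib
import hmac
import os


def new_salt(num_bytes=16):
    """Generate a random per-BuildQueue salt (hypothetical column value)."""
    return os.urandom(num_bytes)


def make_cookie(material, salt=None):
    """Hash the job's component values, keyed with the salt if there is one.

    `material` is the string of combined values for the job; `salt` is
    None for rows that predate the (hypothetical) BuildQueue.salt column.
    """
    data = material.encode("utf-8")
    if salt is None:
        # Old behaviour: plain hash, still valid for pending jobs.
        return hashlib.sha256(data).hexdigest()
    # New behaviour: the random salt keys the hash, so the cookie can't
    # be recomputed from guessable values alone.
    return hmac.new(salt, data, hashlib.sha256).hexdigest()
```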

Then again, maybe we don't need a cookie at all and that would be even easier.


Any comments?  Jeers?  Cheers?  Beers..?


Jeroen


