launchpad-dev team mailing list archive
Message #02975
Build farm and the slave build id menagerie
I've just been discussing something with wgrant that has been bothering
both of us.
The build farm puts serious complexity into having unique "slave build
ids" (which are basically the same as "buildfarm job names").
Theoretically arbitrary, they are produced in different ways for
different job types, and then for each job type there's a method to
check that the build ids cited by the slaves match what the master
thought the slaves were working on.
The only constant between these slave build ids is that they all contain
a BuildQueue id. And that's enough to guarantee uniqueness (though
AFAIK even that isn't really needed). They also all contain a
"something else" that can be cross-checked against it: a Build.id, a
BuildBase.id (not the same!), or a Branch.name. That's where all the
complexity goes.
We can't be sure, but we think the cross-check may have started out as
an extra protection against compromised slaves trying to confuse the
buildd master. If so, the ids are too predictable to offer much
protection (and I'm told the worst the attacker could achieve is to hold up
the recovery of a hung slave). Or maybe it's just a belt-and-suspenders
check against accidental matches, but then making it simpler would be
much better protection.
So we propose simplifying the whole thing as follows:
1. The slave build id is concocted in a single place, and completely
generic between build farm job types.
2. Likewise, we verify the slave build id in a single place and with no
variations for different job types.
3. We pass the ready-made slave build id to dispatchBuildToSlave.
There's no need for each implementation to repeat the code to generate it.
4. The slave build id uses the BuildQueue id for uniqueness, plus
optionally a hard-to-predict cookie to thwart compromised slaves. We
may even want to combine the two into a single hash; see below.
5. If we do want a cookie for security, we use generically available
values that are tightly associated with the slave build but not all
predictable in the same way: Job.date_created, BuildQueue.builder,
Job.requester. If we hash the lot together, a compromised slave won't
receive any of the component values for its own job as starting points
for a guess.
6. We come up with a better, or at least consistent, name for these ids.
7. We forget about the whole thing & live merrily ever after.
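To make points 1, 2, 4 and 5 concrete, here is a rough sketch of what the single generic generator and verifier might look like. None of this is existing Launchpad code: the function names, the "id-cookie" layout, the "|" separator and the choice of SHA-256 are all assumptions for illustration, and the arguments are plain values standing in for BuildQueue.id, Job.date_created, BuildQueue.builder and Job.requester.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def make_cookie(date_created, builder_id, requester_id):
    """Hash generically available job values into one cookie (hypothetical).

    The slave only ever sees the digest, never the component values.
    """
    material = "|".join(
        [date_created.isoformat(), str(builder_id), str(requester_id)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def make_slave_build_id(build_queue_id, cookie=None):
    """Build the id in one place, identically for every job type."""
    if cookie is None:
        # The BuildQueue id alone is already enough for uniqueness.
        return str(build_queue_id)
    return "%d-%s" % (build_queue_id, cookie)


def verify_slave_build_id(reported_id, build_queue_id, cookie=None):
    """Check what the slave reports against what the master expects."""
    expected = make_slave_build_id(build_queue_id, cookie)
    # Constant-time comparison; only matters if the cookie is for security.
    return hmac.compare_digest(
        reported_id.encode("utf-8"), expected.encode("utf-8"))


# Example values only; a real caller would read these from the database.
created = datetime(2010, 1, 1, 12, 0, tzinfo=timezone.utc)
cookie = make_cookie(created, builder_id=7, requester_id=42)
slave_build_id = make_slave_build_id(1234, cookie)
assert verify_slave_build_id(slave_build_id, 1234, cookie)
```

The ready-made id would then simply be handed to dispatchBuildToSlave, so no job type needs its own generation or checking code.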
If we ever decide that we need seriously unpredictable ids, the hash I
suggested is an improvement but still not exactly safe. If desired we
can throw in a new column BuildQueue.salt later, optional at first, to
get a better hash trapdoor without breaking compatibility with pending
jobs. Who wouldn't want cookies with salt in them?
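As a sketch of how an optional salt could stay compatible with pending jobs: the function below is hypothetical, the salt argument stands in for the proposed BuildQueue.salt column, and using HMAC-SHA256 keyed with the salt is my assumption rather than a settled design. Rows without a salt fall back to a plain hash, so ids already handed out would still verify.

```python
import hashlib
import hmac
import os


def make_salted_cookie(values, salt=None):
    """Hash the job values, mixing in a per-job salt when there is one.

    `values` is any sequence of job attributes; `salt` is bytes or None.
    """
    material = "|".join(str(value) for value in values).encode("utf-8")
    if salt is None:
        # Jobs queued before the salt column existed keep the old cookie.
        return hashlib.sha256(material).hexdigest()
    # Keyed hash: guessing the cookie now also requires guessing the salt.
    return hmac.new(salt, material, hashlib.sha256).hexdigest()


# Example values only.
job_values = ["2010-01-01T12:00:00", 7, 42]
old_cookie = make_salted_cookie(job_values)
new_cookie = make_salted_cookie(job_values, salt=os.urandom(16))
assert old_cookie != new_cookie
```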
Then again, maybe we don't need a cookie at all and that would be even
easier.
Any comments? Jeers? Cheers? Beers..?
Jeroen