launchpad-dev team mailing list archive

Re: Duplicate BuildQueue rows (race condition)

 

Muharem Hrnjadovic wrote:
> Hello Stuart,
> 
> Julian and I spent quite a bit of time yesterday analysing bug
> https://bugs.launchpad.net/soyuz/+bug/492632 that was only observed
> after we deployed 3.1.11 in production.
> 
> At this point we're pretty sure that a piece of re-factored build farm
> code introduced a race condition between the buildd-queue-builder.py
> script and the Buildd Manager. This results in Build rows with /two/
> BuildQueue records.
> The code under suspicion is the Build.createBuildQueueEntry() method,
> which used to be a simple affair but now inserts rows into three tables
> (please see http://pastebin.ubuntu.com/336732/).
> 
> The problem manifested itself in the buildd-retry-depwait.py script when
> Build.buildqueue_record() (http://pastebin.ubuntu.com/337067/) started
> failing in one() calls on Storm result sets, which raise an error when
> more than one row matches.
> 
> The question now is how to prevent the race condition from occurring.
> 
> What would be the best or most lightweight way of making sure that
> only the queue-builder XOR the buildd manager adds a BuildQueue row
> to a Build?

Hello again,

Stuart looked into this and made some very good suggestions:

 1 - Fix the data model to not allow the duplicates if possible
     - add unique indices on BuildPackageJob.job and
       BuildPackageJob.build (the latter will help us avoid
       duplicate rows in particular).
 2 - Coordinate the separate components so they don't conflict, or
     handle the conflict gracefully.
     - the long transactions of scripts (queue-builder in this case)
       make it difficult to handle failures gracefully
     - however, we could use PostgreSQL advisory locks (on Build IDs?) [1]
       to coordinate (inside Build.createBuildQueueEntry()?)
     - if we do use advisory locks, we need to talk to Bjorn about
       a nice interface to them: possibly a utility, maybe tied into
       the transaction machinery, plus somewhere for people to register
       the ids they use so teams don't conflict

[1]
http://www.postgresql.org/docs/8.3/interactive/explicit-locking.html#ADVISORY-LOCKS
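
For suggestion 2, a rough sketch of what a small advisory lock utility
with an id registry could look like. pg_advisory_lock(key1, key2) and
pg_advisory_unlock(key1, key2) are the session-level functions described
in [1]; everything else here (the registry, its single entry, the
advisory_lock() helper and the RecordingCursor stand-in) is hypothetical
and only meant to show the shape of the idea. It assumes a DB-API cursor
using the %s parameter style and Build IDs that fit in a 32-bit integer.

```python
from contextlib import contextmanager

# Hypothetical registry: each team registers a namespace here so that
# two teams never pick the same first lock key by accident.
ADVISORY_LOCK_NAMESPACES = {
    "soyuz.build_queue_entry": 1,
}


@contextmanager
def advisory_lock(cursor, namespace, object_id):
    """Hold a session-level advisory lock on (namespace, object_id)."""
    key1 = ADVISORY_LOCK_NAMESPACES[namespace]
    # Blocks until no other session holds the same pair of keys.
    cursor.execute("SELECT pg_advisory_lock(%s, %s)", (key1, object_id))
    try:
        yield
    finally:
        # Session-level locks are not released at commit or rollback,
        # so release explicitly. Caveat: if the transaction is already
        # in a failed state this statement will itself fail.
        cursor.execute("SELECT pg_advisory_unlock(%s, %s)", (key1, object_id))


class RecordingCursor:
    """Stand-in cursor so the sketch runs without a database."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append((sql, params))


cursor = RecordingCursor()
with advisory_lock(cursor, "soyuz.build_queue_entry", 42):
    # Inside the lock: check whether the Build already has a BuildQueue
    # row and only create one if it does not.
    pass
```

Whether the lock should be tied to the transaction machinery instead of
being released by hand is exactly the kind of interface question that
would need discussing.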

Best regards

-- 
Muharem Hrnjadovic <muharem@xxxxxxxxxx>
Public key id   : B2BBFCFC
Key fingerprint : A5A3 CC67 2B87 D641 103F  5602 219F 6B60 B2BB FCFC
