launchpad-dev team mailing list archive
Message #01993
Re: Duplicate BuildQueue rows (race condition)
Muharem Hrnjadovic wrote:
> Hello Stuart,
>
> Julian and I spent quite a bit of time yesterday analysing bug
> https://bugs.launchpad.net/soyuz/+bug/492632 that was only observed
> after we deployed 3.1.11 in production.
>
> At this point we're pretty sure that a piece of re-factored build farm
> code introduced a race condition between the buildd-queue-builder.py
> script and the Buildd Manager. This results in Build rows with /two/
> BuildQueue records.
> The code under suspicion is the Build.createBuildQueueEntry() method
> that used to be a simple affair but now inserts rows into 3 tables
> (please see http://pastebin.ubuntu.com/336732/).
>
> The problem manifested itself in the buildd-retry-depwait.py script when
> Build.buildqueue_record() (http://pastebin.ubuntu.com/337067/) started
> stumbling over one() calls (on storm result sets).
>
> The question now is how to prevent the race condition from occurring.
>
> What would the best or most lightweight way of making sure that only
> the queue-builder XOR the buildd manager adds a BuildQueue row to a
> Build?
Hello again,
Stuart looked into this and made some very good suggestions:
1 - Fix the data model to not allow the duplicates if possible
    - add unique indices on BuildPackageJob.job and
      BuildPackageJob.build (the latter will help us avoid
      duplicate rows in particular).
2 - Coordinate the separate components so they don't conflict, or
    handle the conflict gracefully.
    - the long transactions of scripts (queue-builder in this case)
      make it difficult to handle failures gracefully
    - however we could use postgres advisory locks (on Build IDs?) [1]
      to coordinate (inside Build.createBuildQueueEntry() ?)
    - in case we do use advisory locks we need to talk to Bjorn for
      a nice interface to them - possibly a utility, maybe tied into
      the transaction machinery, somewhere for people to register the
      ids they use so teams don't conflict
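
As a rough illustration of suggestion 1, here is a minimal sketch of how
unique indices turn the race into an error that the losing writer can
catch. It uses Python's sqlite3 module as a stand-in for PostgreSQL so it
runs anywhere. The cut-down table layout, the index names and the
add_build_package_job() helper are assumptions made up for the example,
not the real Launchpad schema or code.

```python
import sqlite3

# In-memory database standing in for the real PostgreSQL instance.
conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified version of the BuildPackageJob table.
conn.execute(
    "CREATE TABLE BuildPackageJob ("
    " id INTEGER PRIMARY KEY,"
    " job INTEGER NOT NULL,"
    " build INTEGER NOT NULL)"
)
# The two unique indices suggested above (index names are made up).
conn.execute(
    "CREATE UNIQUE INDEX buildpackagejob__job__key ON BuildPackageJob (job)")
conn.execute(
    "CREATE UNIQUE INDEX buildpackagejob__build__key ON BuildPackageJob (build)")


def add_build_package_job(conn, job_id, build_id):
    """Try to link a job to a build; return False if a link already exists."""
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "INSERT INTO BuildPackageJob (job, build) VALUES (?, ?)",
                (job_id, build_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The other component got there first; the duplicate is rejected.
        return False


# Two components racing to queue the same build (id 100).
first = add_build_package_job(conn, 1, 100)
second = add_build_package_job(conn, 2, 100)
row_count = conn.execute(
    "SELECT COUNT(*) FROM BuildPackageJob WHERE build = ?", (100,)
).fetchone()[0]
```

With the index on the build column in place, the second insert fails
instead of silently creating a duplicate. The caller still has to decide
what to do with the failure, which is where the long script transactions
mentioned in point 2 become awkward.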
[1] http://www.postgresql.org/docs/8.3/interactive/explicit-locking.html#ADVISORY-LOCKS
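
To make the advisory lock idea from point 2 a bit more concrete, here is
a minimal sketch of what a small utility might look like. It only relies
on pg_try_advisory_lock() and pg_advisory_unlock(), both documented in
[1]. Everything else is an assumption for illustration: the
build_advisory_lock() helper, the BUILD_QUEUE_LOCK_CLASS value, and the
use of a DB-API cursor with %s placeholders (psycopg2 style). It is not
an existing Launchpad interface.

```python
from contextlib import contextmanager

# Hypothetical "lock class" number. The two-integer form of the advisory
# lock functions lets each team reserve one such number, which is the
# kind of registry mentioned above.
BUILD_QUEUE_LOCK_CLASS = 1001


@contextmanager
def build_advisory_lock(cursor, build_id):
    """Try to take a session-level advisory lock for one Build id.

    Yields True if the lock was obtained and False if another session
    already holds it. The lock is released when the block exits.
    """
    cursor.execute(
        "SELECT pg_try_advisory_lock(%s, %s)",
        (BUILD_QUEUE_LOCK_CLASS, build_id),
    )
    acquired = bool(cursor.fetchone()[0])
    try:
        yield acquired
    finally:
        if acquired:
            # Session-level locks are not released by commit or rollback,
            # so they have to be unlocked explicitly.
            cursor.execute(
                "SELECT pg_advisory_unlock(%s, %s)",
                (BUILD_QUEUE_LOCK_CLASS, build_id),
            )
```

A caller would wrap the body of Build.createBuildQueueEntry() in
"with build_advisory_lock(cursor, build.id) as got_lock:" and skip the
inserts when got_lock is False. Two caveats apply to this sketch: the new
rows need to be committed before the lock is released, otherwise the
other process cannot see them and may still add its own; and if the
transaction is in a failed state at exit, the unlock statement itself
will fail until a rollback has been issued.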
Best regards
--
Muharem Hrnjadovic <muharem@xxxxxxxxxx>
Public key id : B2BBFCFC
Key fingerprint : A5A3 CC67 2B87 D641 103F 5602 219F 6B60 B2BB FCFC