
launchpad-dev team mailing list archive

Re: ec2 test failures

 

On 17 November 2011 14:20, Gary Poster <gary.poster@xxxxxxxxxxxxx> wrote:
>
>
> On Nov 16, 2011, at 8:58 PM, Martin Pool <mbp@xxxxxxxxxxxxxx> wrote:
>
>> I filed this yesterday: <https://bugs.launchpad.net/launchpad/+bug/891028>
>>
>> The way getUniqueInteger is implemented, relying on only per-thread
>> uniqueness and also counting on pseudorandom integers to be unique,
>> looks pretty suspicious.
>>
>> It's interesting that it would now be failing consistently, and only
>> on ec2.  bac did hit this much earlier this year.
>
> The new aspect of the error state is that this was no longer intermittent, and that it had *exactly* the same integer reliably, across ec2 and buildbot.
>
> I agree with your analysis that the current code should cause intermittent collisions. Reliable collisions on the same value across machines are more mysterious.

Not so mysterious.  The various counters are all created from the same
Python-wide prng, so if it is set to a particular state before this code
is reached, it will always return the same values and collide at the
same point.

Why would it return the same values?  Well, there are several tests or
functions reached from tests that reset the random seed, for instance
test_token_creation just sets it flat out to zero.

So, if you happen to do just the right number of calls to the prng
before reaching this code, it will always fail.
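
As a minimal sketch of that effect (this is not Launchpad code; the
first_collision helper and the 0..1000 range are made up for
illustration), resetting the global seed makes the first repeated value
land on the same draw every time:

```python
import random


def first_collision(seed, upper=1000):
    """Return (number of draws, value) for the first repeated value."""
    # Stand-in for a test that resets the Python-wide prng.
    random.seed(seed)
    seen = set()
    draws = 0
    while True:
        value = random.randint(0, upper)
        draws += 1
        if value in seen:
            return draws, value
        seen.add(value)


# Two "runs" that both start from seed 0 collide at the same point
# and on the same value.
run_a = first_collision(0)
run_b = first_collision(0)
assert run_a == run_b
```

The same holds on any machine running the same Python version, which
would fit with seeing the identical integer on both ec2 and buildbot.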

Using pseudorandom values in tests in the hope they will be unique is
not a good idea.
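
One alternative, sketched here under assumptions: a process-wide
counter guarded by a lock hands out integers that cannot repeat within
a process, whatever state the prng is in.  The name get_unique_integer
is hypothetical and is not the existing getUniqueInteger code:

```python
import itertools
import random
import threading

# One counter shared by every thread in the process (an assumption;
# the existing code is described above as only per-thread unique).
_counter = itertools.count(1)
_lock = threading.Lock()


def get_unique_integer():
    """Return an integer never handed out before in this process."""
    with _lock:
        return next(_counter)


# Reseeding the prng has no effect on the values handed out.
random.seed(0)
ids = [get_unique_integer() for _ in range(1000)]
assert len(set(ids)) == len(ids)
```

This gives up the randomness entirely, which should be fine if the
only requirement is that the values are distinct.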

-- 
Martin

