yellow team mailing list archive

Re: Help request for intermittently failing test

 

On Tue, Jun 26, 2012 at 3:11 AM, Gary Poster <gary.poster@xxxxxxxxxxxxx> wrote:
> Hi Stuart, Robert.
>
> We have two problems that seem like they ought to be solvable by increasing
> timeouts, but apparently are not.  This email is about one of them.  I might
> write you about the other one too the next time it pops up.
>
> The most recent 20.2 MB test failure message can be obtained here:
> http://ubuntuone.com/0kW9S8M5SGM5JqZ9HPBHfj

My initial thought is that if a single failure can produce a 20 MB
message, making that output sane is the first problem to fix. I think
we have the first OOPS, followed by 999 pointless ones.

> To Stuart, and maybe Robert: do you see any clues in those tracebacks and
> database messages to give us an idea about what to look for? The fact that
> things really are supposed to be failing for part of that time really
> confuses me.  Admittedly, none of us have dug in on this problem yet.
> Speaking for myself, I'm afraid a bit of irrational

I agree something isn't starting up as soon as we would like it to.
The pgbouncer test fixture at one point handled this better by waiting
until it could open a socket. This was r6 of python-pgbouncer, but
this socket check was rolled back for some unrelated reason I no
longer recall - possibly Robert had a different fix he preferred? The
current version of the fixture certainly has a race condition between
when the .pid file is written and when the socket is bound.
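
For illustration, a minimal sketch of that kind of socket check. This is
not the r6 code from python-pgbouncer; the function name wait_for_port
and its parameters are hypothetical. It polls until a TCP connection to
the given host and port succeeds, which closes the window between the
.pid file appearing and the socket being bound.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.05):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is accepted, or False if the
    timeout elapses first. Hypothetical helper, not part of any fixture.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server has bound and is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the socket is not ready yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A fixture's setUp could call something like this right after launching
the process and fail fast if it returns False, rather than letting the
test body run against a server that is not yet listening.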


> To Robert, and maybe Stuart: I've considered simply ripping this test file
> out.  I feel it would be at least a mild shame, because it really does
> verify something important.  OTOH, it's arguably doing horrible things, far
> into integration test land.  What do you think of deleting the file
> entirely?

This is testing the behaviour we rely on for fast downtime deploys,
and IIRC is actually a bug fix because we used to report 500 errors
instead of 503 SERVICE UNAVAILABLE. We should keep this, as psycopg2,
postgresql and even pgbouncer updates could break this behaviour.
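
As a rough sketch of the behaviour being protected (not the actual
Launchpad code), the idea is that a database connection failure maps to
503 while any other unhandled error remains a 500. DatabaseUnavailable,
status_for and call_view are hypothetical names; in a real stack the
exception caught would be whatever the driver raises when the
connection is refused or dropped.

```python
class DatabaseUnavailable(Exception):
    """Hypothetical stand-in for a driver-level connection error."""


def status_for(exc):
    """Choose an HTTP status line for an unhandled exception."""
    if isinstance(exc, DatabaseUnavailable):
        # The database is down or being restarted: ask clients to retry.
        return "503 Service Unavailable"
    # Anything else is treated as a genuine application error.
    return "500 Internal Server Error"


def call_view(view):
    """Run a view callable and return a (status, body) pair."""
    try:
        return "200 OK", view()
    except Exception as exc:
        return status_for(exc), ""
```

A test of this shape only needs to stop the database (or pgbouncer),
make a request, and assert on the status line.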



-- 
Stuart Bishop <stuart.bishop@xxxxxxxxxxxxx>

