launchpad-dev team mailing list archive
Message #07462
Re: reminder: test changed queries on qastaging *especially* for large tables *and* positively id as existing bugs any timeouts
Hi,
I'm the one responsible for this blunder and apologize for it. (And
thanks to William for fixing this while I was sleeping).
1) When you get an OOPS on staging, do a thorough analysis. That means
looking at _all_ the OOPSes you get, and ensuring that each problem is a
known problem and that nothing weird related to your changes shows up.
In my case, I only looked at one of the last OOPSes I got, which showed no
problem apart from the known recalculateBugHeat issue
(OOPS-1998QASTAGING104). But that was the OOPS from my third attempt.
The first one, OOPS-1998QASTAGING102 (which I didn't investigate), showed
the problem with a cold cache: the new query took 9s there. (It was
very fast, <63ms, on the second and third attempts.)
2) When "tuning" queries, please leave comments in the code! There
was no comment here, and I naively thought that I should get rid of the
extra query that fetches the archive ids and use a join instead. A bad, bad
idea, as it turned out. A comment explaining this non-intuitive query would
have saved me re-learning that already-learned lesson :-)
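To illustrate point 2, here is a minimal sketch of the kind of comment I
mean. It is not the real Launchpad code: it uses sqlite3 instead of Storm
and PostgreSQL, and the schema (archive, publication) and both function
names are hypothetical. SQLite will not reproduce the slowdown; the sketch
only shows the two-step shape (fetch the archive ids, then substitute them)
and a comment that warns the next person off the "obvious" join.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE archive (id INTEGER PRIMARY KEY, distribution INTEGER);
        CREATE TABLE publication (archive INTEGER, name TEXT);
        INSERT INTO archive VALUES (1, 10), (2, 10), (3, 20);
        INSERT INTO publication VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    """)
    return conn


def source_name_is_published(conn, distribution_id, name):
    """Return True if `name` is published in any archive of the distribution."""
    # Step 1: fetch the archive ids with their own small query.
    archive_ids = [row[0] for row in conn.execute(
        "SELECT id FROM archive WHERE distribution = ?", (distribution_id,))]
    if not archive_ids:
        return False

    # TUNING NOTE: do NOT fold step 1 into a JOIN against archive.
    # The ids are substituted on purpose: with the join, the planner
    # chose a bad plan on the large table (see bug 800485). If you
    # change this query, test it on qastaging with a cold cache and
    # read every OOPS you get.
    placeholders = ", ".join("?" for _ in archive_ids)
    query = (
        "SELECT 1 FROM publication "
        "WHERE archive IN (%s) AND name = ? LIMIT 1" % placeholders)
    return conn.execute(query, archive_ids + [name]).fetchone() is not None
```

The useful part is the comment, not the query: it says what must not be
changed, why, and how to check a change safely.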
On 11-06-21 10:22 PM, Robert Collins wrote:
> We are currently dealing with bug 800485 where validation of
> sourcepackagenames has gone from 80ms to 1800ms(hot) or minutes
> (cold).
>
> This was caused when a patch changed a non-storm query to a storm
> query *and* added a single join table in (rather than the substituted
> archive ids).
>
> Most of our queries are now tuned; postgresql consistently chooses bad
> plans on the 'obvious' way to write things for many of our very large,
> or very skewed data sets.
>
> As a result, whenever you change a query on a big table - where big
> means > 20K rows - it's important to try and exercise it on qastaging.
>
> If the thing you are testing times out, it's *vital* that the timeout
> be positively identified as a pre-existing condition before assuming
> qastaging is slow[1].
>
> In this particular case, the patch was qa'd, but an existing timeout
> bug was assumed to be the cause of qa timeouts: we should have grabbed
> the oops and positively id'd the timeout as the existing bug - that
> would have told us about the regression and let us avoid the crisis.
>
> [1] How slow is qastaging? It's not, not really. It has enough memory on
> the DB server to page into hot cache the working set for any one page
> in the system: you may need to try a lot of times to seed the cache,
> but *everything* *can* work on qastaging.
>
>
> -Rob
--
Francis J. Lacoste
francis.lacoste@xxxxxxxxxxxxx