
launchpad-dev team mailing list archive

Re: Does our DB retry code need tweaks for the PG 9.1 upgrade?

 

Hi Stuart,

Ok, thanks for the clarifications. Do you have time to handle this bug,
or should I ask the maintenance team to have a go at it?

Do we have an RT for the pgbouncer upgrade yet?

Cheers

On 12-05-31 04:29 PM, Stuart Bishop wrote:
> On Thu, May 31, 2012 at 10:04 PM, Francis J. Lacoste
> <francis.lacoste@xxxxxxxxxxxxx> wrote:
>> Hi Stuart,
>>
>> We have a cluster of recent bugs that seems to hint that the retry
>> transaction code might need some tweaking since our upgrade to PG 9.1.
>>
>> https://bugs.launchpad.net/launchpad/+bug/1000805
>>
>> That first one is a
>>
>> psycopg2.OperationalError: could not send data to server: Connection
>> timed out
>>
>> when serving private attachments from the librarian. Usually, attempting
>> again will work. Is that a new error in PG 9.1 that we should add to the
>> retry list? The retry code only re-attempts DisconnectionError,
>> IntegrityError and TransactionRollbackError.
> 
> It's not PG 9.1 - this is entirely client side. The trigger was likely
> psycopg2 2.4 or libpq5, both of which needed to be upgraded before the
> PG 9.1 upgrade. I've updated the bug report - Storm needs to catch
> these exceptions so connections get reopened, and it will reraise them
> as a DisconnectionError IIRC.
> 
> It might also be new because our sockets were not failing like this
> before. We really shouldn't be losing sockets like this - perhaps a
> pgbouncer upgrade is in order? I think the relevant connection limit
> in pgbouncer was set to 20 connections and was recently bumped to 40.
> 
> 
>> https://bugs.launchpad.net/launchpad/+bug/1006530
>> https://bugs.launchpad.net/launchpad/+bug/1006531
>>
>> These two are OOPSes triggered during fastdowntime. I was under the
>> impression that we weren't logging those during fastdowntime and thus
>> our filters might need updating. Or maybe I'm mistaken and it's just
>> that Diogo is our normal filter here, and since he's on leave, that
>> explains why Laura reported bugs about those.
>>
>> Thanks for your insights.
> 
> We log OOPSes during fastdowntime, because fastdowntime looks exactly
> like a database outage from the client side and we want to know about
> database outages. I'm not sure what filtering was being done to hide
> them from the reports. We should report these failures if they happen
> outside of the scheduled fastdowntime window.
> 
> 
> 
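For illustration, the behaviour discussed in this thread (catching the
client-side OperationalError, re-raising it as a DisconnectionError, and
letting the retry code attempt the transaction again) might look roughly
like the sketch below. This is not the actual Storm or Launchpad code.
The exception classes are local stand-ins so the sketch runs on its own,
and DISCONNECT_MESSAGES, reraise_disconnects and retry_transaction are
hypothetical names chosen for this example.

```python
import time


# Stand-in exception classes so the sketch runs without Storm or psycopg2.
class OperationalError(Exception):
    """Stand-in for psycopg2.OperationalError."""


class DisconnectionError(Exception):
    """Stand-in for Storm's DisconnectionError."""


# Hypothetical list of message fragments that indicate a lost socket.
# The only entry comes from the error quoted in bug 1000805.
DISCONNECT_MESSAGES = (
    "could not send data to server",
)


def reraise_disconnects(func):
    """Call func, translating socket-level OperationalErrors."""
    try:
        return func()
    except OperationalError as exc:
        if any(msg in str(exc) for msg in DISCONNECT_MESSAGES):
            # Signal that the connection is gone and must be reopened.
            raise DisconnectionError(str(exc)) from exc
        # Anything else is not a disconnection; let it propagate.
        raise


def retry_transaction(func, retryable=(DisconnectionError,),
                      attempts=3, delay=0.0):
    """Run func, retrying up to `attempts` times on retryable errors."""
    for attempt in range(1, attempts + 1):
        try:
            return reraise_disconnects(func)
        except retryable:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

In this sketch a request that hits "could not send data to server" is
retried like any other disconnection, while unrelated OperationalErrors
still surface immediately.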

-- 
Francis J. Lacoste
francis.lacoste@xxxxxxxxxxxxx
