
yellow team mailing list archive

Re: One more thought...

 

Thank you very much for the replies, Stuart. Comments inline.

On 07/05/2012 05:48 AM, Stuart Bishop wrote:
> On Thu, Jul 5, 2012 at 9:35 AM, Gary Poster <gary.poster@xxxxxxxxxxxxx> wrote:
> And more explicitly:
> 
>> (1) Is there a more reliable way to unbreak the seemingly persistent
>> disconnected state, or another way that you'd suggest I try?
> 
> If you look at that assert, you will see how to reset storm directly.
> Get the IZStorm utility, iterate over all stores that exist,
> store.rollback() them or even store.reset() for a bigger hammer.

Benji points out that at least one of the aborts we see happens after
all the stores have been removed. In that case, as we understand it,
iterating over the stores will not do what we want. We'll explore that
avenue.
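For the record, here is a minimal sketch of Stuart's suggestion. It assumes the
Zope-integrated Storm `ZStorm` utility and its `iterstores()` method, which
yields `(name, store)` pairs. The `get_zstorm` and `reset_all_stores` names are
hypothetical. The function returns the names of the stores it touched, which
makes Benji's case visible: an empty list means there was nothing left to reset.

```python
def get_zstorm():
    """Look up the IZStorm utility (requires zope.component and storm.zope)."""
    from zope.component import getUtility
    from storm.zope.interfaces import IZStorm
    return getUtility(IZStorm)


def reset_all_stores(zstorm, hard=False):
    """Roll back every registered store; with hard=True, also reset() it.

    Returns the names of the stores that were touched, so the caller can
    tell when no stores were registered at all.
    """
    touched = []
    for name, store in zstorm.iterstores():
        # Abandon whatever transaction the store thinks it is in.
        store.rollback()
        if hard:
            # The "bigger hammer": also discard the store's cached state.
            store.reset()
        touched.append(name)
    return touched


# Usage inside the app server, e.g. in the retry path:
#     names = reset_all_stores(get_zstorm(), hard=True)
#     if not names:
#         ...  # nothing was registered; resetting stores cannot help here
```

Calling `rollback()` before `reset()` in the hard case is our own choice, not
something Stuart specified.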

> 
>> (2) Do you see anything in the log blather after the first thing I quote
>> for you below that suggests to you that I (and the traceback) are
>> misidentifying the problem as associated with 504291?
> 
> No, but if you trip over that assert something has already failed
> badly and all bets are off. You need to work out what happened before
> that caused the bogus state. The exceptions are interesting though, as
> that assert resets things. oops-97 puts things back into the 'will
> reconnect' state and oops-98 tells us this reconnect failed because
> there is nothing listening for our connection.

Does that imply that in fact nothing is there?  This happens after we
have verified that the bouncer is accepting connections.  We could
insert some diagnostics on each retry that check some or all of the
following:
 - Is the bouncer pidfile still there?  If so, is the process still running?
 - Is the bouncer still accepting connections on the expected port?
 - Does Launchpad have the PostgreSQL port we expect for pgbouncer?
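Roughly what we have in mind for the per-retry checks, as a Python 3 sketch.
The function names, the pidfile path, and how we obtain the port Launchpad is
configured to use are all placeholders, not existing Launchpad APIs. The
process check sends signal 0, so it assumes a POSIX system.

```python
import os
import socket


def pid_from_pidfile(path):
    """Return the PID recorded in the pidfile, or None if missing/unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def process_running(pid):
    """Check whether a process exists; signal 0 sends nothing (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


def accepting_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bouncer_diagnostics(pidfile, host, port, configured_port=None):
    """Collect the checks from the list above into one dict for logging."""
    pid = pid_from_pidfile(pidfile)
    report = {
        "pidfile_exists": os.path.exists(pidfile),
        "pid": pid,
        "process_running": pid is not None and process_running(pid),
        "accepting": accepting_connections(host, port),
    }
    if configured_port is not None:
        # Hypothetical: configured_port is whatever Launchpad's config says.
        report["port_matches_config"] = configured_port == port
    return report
```

Logging the returned dict on every retry should tell us which of the three
assumptions stops holding first.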

We'll add those next (more ideas and clarifications welcome ;-).

Thanks again

Gary

