launchpad-dev team mailing list archive

Re: Does our DB retry code need tweaks for the PG 9.1 upgrade?

 

On 2012-06-04 20:17, Francis J. Lacoste wrote:
> On 12-06-01 06:16 PM, Stuart Bishop wrote:
>
>> I think there is a Storm bug, although others disagree. I'm not sure
>> why a socket going tits up is different from any other sort of
>> disconnection. At the moment, I think when the TCP connection fails
>> like this Storm doesn't reset the connection so subsequent requests
>> will also fail (it will probably get an exception it does handle
>> eventually, so the connection reopening will happen). I can fix this
>> if I can convince people it is a bug - a few of us on the team have
>> adjusted this code before as the rarer types of failed connections
>> have been discovered or changed due to updates.
>
> I'd agree with you. If the normal recovery after that kind of error is
> to reconnect and try again, Storm should do this.

It's important to tread carefully though.  Consider:

 1. Start transaction.
 2. Update data in the database.
 3. Lose connection, and implicitly, abort transaction.
 4. Restore connection and start new transaction.
 5. Update more data in the database.
 6. Commit.

In this scenario you get a paradox: there's no trace of step 2 in the
database, and yet step 5 has definitely been committed. If step 5 relied
on the update from step 2, the committed data is now inconsistent. So
recovering connections is not quite as simple as it seems!
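
To make the six steps concrete, here is a minimal sketch. It is not Storm or
Launchpad code: it assumes Python's standard sqlite3 module as a stand-in for
the real database, simulates the lost connection by closing without committing,
and uses a hypothetical "log" table and function names of my own choosing.

```python
import os
import sqlite3
import tempfile


def _fresh_db():
    """Create a throwaway SQLite file with one empty table (hypothetical schema)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE log (step INTEGER)")
    conn.commit()
    conn.close()
    return path


def _committed_steps(path):
    """Return the step numbers that actually made it into the database."""
    conn = sqlite3.connect(path)
    rows = [row[0] for row in conn.execute("SELECT step FROM log ORDER BY step")]
    conn.close()
    return rows


def naive_reconnect():
    """Replay steps 1-6 exactly as listed: reconnect and carry on mid-way."""
    path = _fresh_db()
    conn = sqlite3.connect(path)
    conn.execute("INSERT INTO log VALUES (2)")  # steps 1-2: implicit BEGIN, update
    conn.close()                                # step 3: uncommitted work is discarded
    conn = sqlite3.connect(path)                # step 4: new connection, new transaction
    conn.execute("INSERT INTO log VALUES (5)")  # step 5
    conn.commit()                               # step 6
    conn.close()
    return _committed_steps(path)


def replay_from_start():
    """Same failure, but the whole unit of work is redone after reconnecting."""
    path = _fresh_db()
    conn = sqlite3.connect(path)
    conn.execute("INSERT INTO log VALUES (2)")
    conn.close()                                # connection lost, transaction aborted
    conn = sqlite3.connect(path)
    conn.execute("INSERT INTO log VALUES (2)")  # redo step 2 as well
    conn.execute("INSERT INTO log VALUES (5)")
    conn.commit()
    conn.close()
    return _committed_steps(path)


if __name__ == "__main__":
    print(naive_reconnect())    # [5]
    print(replay_from_start())  # [2, 5]
```

The first function shows the paradox: only step 5 survives. The second shows
one common way around it, which is to treat the whole transaction as the unit
of retry rather than resuming where the connection dropped. Whether that is
safe depends on the work being repeatable, so take it as an illustration and
not as a proposal for what Storm itself should do.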


Jeroen

