← Back to team overview

launchpad-dev team mailing list archive

Connections left open after switching to read only mode

 

Before any roll out that involves changes to the database schema, we
switch Launchpad into read only mode and point it to a read-only replica
of our DB while we do the roll out.  Switching to read only should be
just a matter of placing a file named read-only.txt under the root of
our tree and waiting a couple minutes for all threads to finish
processing their requests and notice the mode switch, but that never
worked as expected[1].

This works by having the DB URI as a config variable that returns the
appropriate value according to the presence of the read-only.txt file,
together with a publication hook that forces the storm stores to
reconnect whenever there's a mode switch.

However, storm (through ZStorm) shares storm.database.Database instances
across threads (without saying so in the docs) and our custom version of
that class (LaunchpadDatabase) may change its DB URI (when there's a
config change) after instantiation[2]. This, in itself, is a problem,
but since we don't do runtime config changes except when running the
test suite (which is single-threaded), it's never affected us.

My changes for the read-only switch, though, introduced a way of
changing config values at runtime, thus exposing the problem.

This means there's a race condition when switching to read-only mode,
when a thread kicks in after another thread has noticed the config
change and reconnected its stores, thus causing the shared instance of
LaunchpadDatabase to change its DB URI.  When other threads kick in,
they'll think they're connected to the correct DB (remember we use the
DB URI stored in LaunchpadDatabase for that) and won't tell their stores
to reconnect, leaving open connections to DBs that should have no
connections.

I think we should try and find out why storm shares these instances
across threads and if it's not really intentional/needed, we should fix
it.  If it's needed, they should at least note in the class' docs that
it's shared across threads, as that class is expected to be customized
in applications.

In the meantime, we can easily fix this by not using the DB URI stored
in LaunchpadDatabase to detect mode switches, using the class of the
connection's instance instead -- when in read-only mode we use a
ReadOnlyModeConnection whereas in read-write we use the regular
PostgresConnection (provided by storm).

[1] https://bugs.edge.launchpad.net/launchpad-foundations/+bug/531834
[2] IIRC, this was done to support LaunchpadZopelessLayer.switchDBUser()

-- 
Guilherme Salgado <https://launchpad.net/~salgado>

Attachment: signature.asc
Description: This is a digitally signed message part


Follow ups