
launchpad-dev team mailing list archive

Re: Using a signal to switch to read-only mode

 

(CCing launchpad-dev as others might have ideas/suggestions)

On Tue, 2010-01-05 at 08:13 +0000, Tom Haddon wrote:
> On Mon, 2010-01-04 at 18:16 -0200, Guilherme Salgado wrote:
> > On Mon, 2009-12-21 at 09:21 +0000, Tom Haddon wrote:
> > > On Fri, 2009-12-18 at 10:37 -0500, Gary Poster wrote:
> > > > I like the suggestions I've read.  Thanks to all three of you.  I'll
> > > > summarize the proposals so far.
> > > > 
> > > > - We will switch logrotation to use SIGHUP.
> > > > 
> > > > - We will use SIGUSR2 as a flag for checking for the presence of a
> > > > "read-only.txt" at the top of the tree.
> > > > 
> > > > - At application start, or when SIGUSR2 fires, if "read-only.txt" is
> > > > found at the top of the tree, the application will switch to (or stay
> > > > in) read-only mode.  If it is not found, the application will switch
> > > > to (or stay in) normal read-write mode.
> > > > 
> > > > - We will provide a key-value page to verify the read-only status of
> > > > (each) application.
> > > > 
> > > > Here are my thoughts:
> > > > 
> > > > - I think the key-value page would be very valuable for LOSA peace of
> > > > mind, so I like the idea.  However, it is only pertinent for a given
> > > > application instance.  Going to this page through the load-balancer
> > > > would not be valuable.    LOSAs, would you immediately use this page
> > > > if we offered it, going to each instance in the cluster? 
> > > 
> > > It'd be nice, but I don't want to block on it.
> > > 
> > > >  If not, I'd like to push it out of the scope of this effort, until we
> > > > can think about offering an aggregated view of information like this
> > > > in a dashboard like the one Maris will hopefully be working on this
> > > > cycle.
> > > > 
> > > > - I think we should definitely log mode switches.  Then LOSAs can at
> > > > least tail the logs for a given instance to verify that the app
> > > > noticed the signal and the presence or absence of the file.
> > > 
> > > +1
> > > > 
> > > > - If the LOSAs don't want to rock the boat with changing logrotation
> > > > to SIGHUP, we do have a swath of signals from SIGRTMIN to SIGRTMAX
> > > > that we could use.  I'm in favor of the SIGHUP switch if the LOSAs
> > > > don't mind, though.
> > > > 
> > > This switch is okay.
> > > 
> > 
> > Today I started working on this, and following is my initial plan:
> > 
> >         Currently, the way we switch to read-only is by changing the
> >         read_only config to True *and* changing the main_master and
> >         main_slave configs to point to standalone databases. What we
> >         want is to get rid of the read_only config and collapse the
> >         extra config files we have for read-only mode (lpnet1-db-update)
> >         into the lpnet1 config.
> >         
> >         In order to do this we will use the presence of a file
> >         (read-only.txt) on the root of the tree to identify (upon
> >         startup or SIGUSR2) whether or not we're in read-only mode, and
> >         set the main_master and main_slave configs appropriately.  As
> >         we'll be overwriting these config variables, we'll need to store
> >         all different values we might use for them in new variables
> >         (e.g.  rw_main_master, rw_main_slave, ro_main_master and
> >         ro_main_slave).  (we might even get rid of the main_master and
> >         main_slave config variables as they will be computed values,
> >         which can be moved somewhere else.  although I'm not sure this
> >         is a good idea because all other db names live in config
> >         variables). 
> >         
> >         The plan:
> >         
> >         • Change all places that use config.launchpad.read_only to use
> >           another helper, which tells whether or not we're in read-only
> >           mode by looking for a read-only.txt file.
> >         • switch logrotation to use SIGHUP.
> >         • Rename main_master and main_slave to rw_main_master and
> >           rw_main_slave, adding new (and empty) main_master and main_slave
> >           config variables, which get set upon startup/SIGUSR2 (with the
> >           values of rw_*).
> >         • log read-only/read-write switches 
> > 
> > However, after I started implementing it I realized that having two
> > switches (the read-only.txt file and the SIGUSR2) to turn on read-only
> > doesn't sound like a very good idea (as we may accidentally leave an app
> > server in an inconsistent state), so we may want to use SIGUSR2 to
> > create a read-only.txt file *and* trigger the code that sets the configs
> > with the appropriate values. 
> 
> You don't need to worry about creating/deleting the read-only.txt file -
> we'll manage that through external means (initscripts or other helper
> scripts). I'd envisage you only need one signal which means "check again
> whether we're in read-only or read-write mode". 
> 

As we discussed on IRC, my concern was that having a read-only.txt file
did not by itself mean we were in read-only mode -- the SIGUSR2 is also
needed, and if it is forgotten the server would be in an inconsistent
state.  In that state, the Python code thinks we're running in read-only
mode (because it relies on read-only.txt for that) but we're still
connecting to the rw db (because we rely on SIGUSR2 to change to the ro
dbs).
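
To make the inconsistent state concrete, here is a minimal sketch.  The
AppServer class and its method names are hypothetical and exist only for
illustration; they are not Launchpad code.  It assumes the read-only
check looks at the file on every call, while the database choice is only
refreshed at startup or when the SIGUSR2 handler runs.

```python
import os
import tempfile


class AppServer:
    """Hypothetical stand-in for an app server; not Launchpad code."""

    def __init__(self, tree_root):
        self.flag_path = os.path.join(tree_root, "read-only.txt")
        # The database is only (re)selected at startup and on SIGUSR2.
        self.database = None
        self.recheck_mode()

    def is_read_only(self):
        # What the Python code would consult whenever it needs the mode.
        return os.path.isfile(self.flag_path)

    def recheck_mode(self, signum=None, frame=None):
        # Signal-handler signature, so it could be registered with
        # signal.signal(signal.SIGUSR2, server.recheck_mode).
        self.database = "ro" if self.is_read_only() else "rw"


with tempfile.TemporaryDirectory() as root:
    server = AppServer(root)

    # A script creates the file but forgets to send the signal.
    open(server.flag_path, "w").close()
    inconsistent = (server.is_read_only(), server.database)

    # Called directly to keep the demo deterministic; in a real server
    # this would run when SIGUSR2 arrives.
    server.recheck_mode()
    consistent = (server.is_read_only(), server.database)

print(inconsistent)
print(consistent)
```

Between creating the file and re-running the check, the code reports
read-only while the rw database is still selected.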

Anyway, that didn't seem to be a big deal as this is going to be handled
by scripts, so I went ahead and tried to implement that.  As usual, I've
encountered some problems, and they seem to boil down to the way our
config works -- the config variables are immutable so to make changes we
need to push/pop overlays on top of the existing config.

Since config.pop(name) removes the overlay with the given name and any
others that were on top of it, we can't rely on config.push/pop to
update the config values because we might end up inadvertently reverting
others' changes and others might do the same to ours.  I think this
push/pop mechanism was meant only for testing purposes.
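
A toy model of the problem, assuming only the pop semantics described
above (popping a named overlay also drops everything pushed after it).
OverlayConfig, the overlay names and the config values are made up for
this sketch and are not the real config API.

```python
class OverlayConfig:
    """Toy config changed by pushing named overlays; not the real API."""

    def __init__(self, base):
        self._stack = [("base", dict(base))]

    def push(self, name, values):
        self._stack.append((name, dict(values)))

    def pop(self, name):
        """Remove the named overlay and every overlay pushed after it."""
        for index, (overlay_name, _) in enumerate(self._stack):
            if index > 0 and overlay_name == name:
                removed = [n for n, _ in self._stack[index:]]
                del self._stack[index:]
                return removed
        raise KeyError(name)

    def get(self, key):
        # The most recently pushed overlay that defines the key wins.
        for _, values in reversed(self._stack):
            if key in values:
                return values[key]
        raise KeyError(key)


config = OverlayConfig({"main_master": "dbname=rw", "timeout": 30})
config.push("read-only", {"main_master": "dbname=ro"})
config.push("someone-else", {"timeout": 5})

# Switching back to read-write also reverts the unrelated overlay.
removed = config.pop("read-only")
print(removed, config.get("main_master"), config.get("timeout"))
```

Popping our own overlay silently undoes the other change too, which is
the kind of interference I'd like to avoid.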

After realizing that, I came up with another approach, which relies only
on the presence/absence of the read-only.txt file to figure out which
mode we're in.  With this approach, config.database.main_master/slave
are gone and we use dbconfig.main_master/slave instead, which are
properties of DatabaseConfig that return the appropriate value for the
mode we're in.
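
Roughly, the idea looks like the sketch below.  This is not the actual
implementation: the constructor arguments, the is_read_only() helper and
the connection strings are assumptions made for the example; only the
main_master/main_slave property names and the rw_*/ro_* naming come from
the discussion above.

```python
import os
import tempfile


class DatabaseConfig:
    """Sketch: pick the database from the presence of read-only.txt."""

    def __init__(self, tree_root, rw_main_master, rw_main_slave,
                 ro_main_master, ro_main_slave):
        self._flag_path = os.path.join(tree_root, "read-only.txt")
        self._rw = (rw_main_master, rw_main_slave)
        self._ro = (ro_main_master, ro_main_slave)

    def is_read_only(self):
        # Checked on every access, so no signal is needed to switch.
        return os.path.isfile(self._flag_path)

    @property
    def main_master(self):
        return (self._ro if self.is_read_only() else self._rw)[0]

    @property
    def main_slave(self):
        return (self._ro if self.is_read_only() else self._rw)[1]


with tempfile.TemporaryDirectory() as root:
    dbconfig = DatabaseConfig(
        root,
        rw_main_master="dbname=lp_rw_master",
        rw_main_slave="dbname=lp_rw_slave",
        ro_main_master="dbname=lp_ro_master",
        ro_main_slave="dbname=lp_ro_slave",
    )
    flag = os.path.join(root, "read-only.txt")

    before = dbconfig.main_master
    open(flag, "w").close()          # enter read-only mode
    during = (dbconfig.main_master, dbconfig.main_slave)
    os.remove(flag)                  # back to read-write mode
    after = dbconfig.main_master

print(before, during, after)
```

Because the file is the only switch, the mode and the database in use
cannot disagree, at the cost of a filesystem check on each access.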

Although that simplifies things for us and for LOSAs, it also means we
can't easily log mode switches (because we don't have the signal
anymore).  We could easily work around that by pushing config changes,
but I'd be very uncomfortable doing that, for the reasons I explained
above.

So, I'd like to know if this would be an acceptable solution, and
whether or not we can live without logs of the mode switches?

> That make sense?
> 
> >  Similarly, when starting up we'd check for
> > the presence of read-only.txt and set the config variables with the
> > appropriate values.  That means we can't use SIGUSR2 to switch back to
> > read-write mode, though.
> > 
> > An alternative that would not have any of the problems described above
> > would be to keep the existing code using config.launchpad.read_only and
> > have the helper function (which looks for read-only.txt) just update
> > that config variable upon startup/SIGUSR2.  That way it'd be much harder
> > to have an appserver in read-only mode using the wrong DB, and we'd be
> > able to use SIGUSR2 to switch back to read-write mode.
> > 
> > Any preferences/suggestions?
> > 
> 
> 


-- 
Guilherme Salgado <salgado@xxxxxxxxxxxxx>


