
launchpad-dev team mailing list archive

Re: Using a signal to switch to read-only mode

 

On Tue, 2010-01-05 at 18:54 -0200, Guilherme Salgado wrote:
> (CCing launchpad-dev as others might have ideas/suggestions)
> 
> On Tue, 2010-01-05 at 08:13 +0000, Tom Haddon wrote:
> > On Mon, 2010-01-04 at 18:16 -0200, Guilherme Salgado wrote:
> > > On Mon, 2009-12-21 at 09:21 +0000, Tom Haddon wrote:
> > > > On Fri, 2009-12-18 at 10:37 -0500, Gary Poster wrote:
> > > > > I like the suggestions I've read.  Thanks to all three of you.  I'll
> > > > > summarize the proposals so far.
> > > > > 
> > > > > - We will switch logrotation to use SIGHUP.
> > > > > 
> > > > > - We will use SIGUSR2 as a flag for checking for the presence of a
> > > > > "read-only.txt" at the top of the tree.
> > > > > 
> > > > > - At application start, or when SIGUSR2 fires, if "read-only.txt" is
> > > > > found at the top of the tree, the application will switch to (or stay
> > > > > in) read-only mode.  If it is not found, the application will switch
> > > > > to (or stay in) normal read-write mode.
> > > > > 
> > > > > - We will provide a key-value page to verify the read-only status of
> > > > > (each) application.
> > > > > 
> > > > > Here are my thoughts:
> > > > > 
> > > > > - I think the key-value page would be very valuable for LOSA peace of
> > > > > mind, so I like the idea.  However, it is only pertinent for a given
> > > > > application instance.  Going to this page through the load-balancer
> > > > > would not be valuable.    LOSAs, would you immediately use this page
> > > > > if we offered it, going to each instance in the cluster? 
> > > > 
> > > > It'd be nice, but I don't want to block on it.
> > > > 
> > > > >  If not, I'd like to push it out of the scope of this effort, until we
> > > > > can think about offering an aggregated view of information like this
> > > > > in a dashboard like the one Maris will hopefully be working on this
> > > > > cycle.
> > > > > 
> > > > > - I think we should definitely log mode switches.  Then LOSAs can at
> > > > > least tail the logs for a given instance to verify that the app
> > > > > noticed the signal and the presence or absence of the file.
> > > > 
> > > > +1
> > > > > 
> > > > > - If the LOSAs don't want to rock the boat with changing logrotation
> > > > > to SIGHUP, we do have a swath of signals from SIGRTMIN to SIGRTMAX
> > > > > that we could use.  I'm in favor of the SIGHUP switch if the LOSAs
> > > > > don't mind, though.
> > > > > 
> > > > This switch is okay.
> > > > 
> > > 
> > > Today I started working on this, and following is my initial plan:
> > > 
> > >         Currently, the way we switch to read-only is by changing the
> > >         read_only config to True *and* changing the main_master and
> > >         main_slave configs to point to standalone databases. What we
> > >         want is to get rid of the read_only config and collapse the
> > >         extra config files we have for read-only mode (lpnet1-db-update)
> > >         into the lpnet1 config.
> > >         
> > >         In order to do this we will use the presence of a file
> > >         (read-only.txt) on the root of the tree to identify (upon
> > >         startup or SIGUSR2) whether or not we're in read-only mode, and
> > >         set the main_master and main_slave configs appropriately.  As
> > >         we'll be overwriting these config variables, we'll need to store
> > >         all different values we might use for them in new variables
> > >         (e.g.  rw_main_master, rw_main_slave, ro_main_master and
> > >         ro_main_slave).  (we might even get rid of the main_master and
> > >         main_slave config variables as they will be computed values,
> > >         which can be moved somewhere else.  although I'm not sure this
> > >         is a good idea because all other db names live in config
> > >         variables). 
> > >         
> > >         The plan:
> > >         
> > >         • Change all places that use config.launchpad.read_only to use
> > >           another helper, which tells whether or not we're in read-only
> > >           mode by looking for a read-only.txt file.
> > >         • Switch logrotation to use SIGHUP.
> > >         • Rename main_master and main_slave to rw_main_master and
> > >           rw_main_slave, adding new (and empty) main_master and main_slave
> > >           config variables, which get set upon startup/SIGUSR2 (with the
> > >           values of rw_*).
> > >         • Log read-only/read-write switches.
> > > 
> > > However, after I started implementing it I realized that having two
> > > switches (the read-only.txt file and the SIGUSR2) to turn on read-only
> > > doesn't sound like a very good idea (as we may accidentally leave an app
> > > server in an inconsistent state), so we may want to use SIGUSR2 to
> > > create a read-only.txt file *and* trigger the code that sets the configs
> > > with the appropriate values. 
> > 
> > You don't need to worry about creating/deleting the read-only.txt file -
> > we'll manage that through external means (initscripts or other helper
> > scripts). I'd envisage you only need one signal which means "check again
> > whether we're in read-only or read-write mode". 
> > 
> 
> As we discussed on IRC, my concern was that having a read-only.txt file
> did not mean we were in read-only mode -- the SIGUSR2 is needed, and if
> forgotten the server would be in an inconsistent state.  In that state,
> the python code thinks we're running in read only (because it relies on
> read-only.txt for that) but we're still connecting to the rw db (because
> we rely on SIGUSR2 to change to the ro dbs).
> 
> Anyway, that didn't seem to be a big deal as this is going to be handled
> by scripts, so I went ahead and tried to implement that.  As usual, I've
> encountered some problems, and they seem to boil down to the way our
> config works -- the config variables are immutable so to make changes we
> need to push/pop overlays on top of the existing config.
> 
> Since config.pop(name) removes the overlay with the given name and any
> others that were on top of it, we can't rely on config.push/pop to
> update the config values because we might end up inadvertently reverting
> others' changes and others might do the same to ours.  I think this
> push/pop mechanism was meant only for testing purposes.
> 
> After realizing that, I came up with another approach, which relies only
> on the presence/absence of the read-only.txt file to figure out the mode
> we're in.  With this approach, config.database.main_master/slave are gone
> and we use dbconfig.main_master/slave instead, which are properties in
> DatabaseConfig that return the appropriate value according to the mode
> we're in.

Does this mean we're checking for the presence of this text file before
every database operation? That sounds quite IO intensive.
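
If that is the case, one way to keep the cost down would be to cache the
result of the check for a few seconds. The following is only a rough
sketch to illustrate the idea, not Launchpad code: the ReadOnlyFlag
class, its max_age parameter and the injectable clock are all names I
made up for this example.

```python
import os
import time


class ReadOnlyFlag:
    """Cache the result of looking for the read-only marker file.

    Hypothetical helper: the file system is consulted at most once
    every `max_age` seconds, not before every database operation.
    """

    def __init__(self, marker_path, max_age=5.0, clock=time.monotonic):
        self.marker_path = marker_path
        self.max_age = max_age
        self._clock = clock        # injectable so the cache can be tested
        self._checked_at = None    # time of the last real file system check
        self._read_only = False

    def is_read_only(self):
        now = self._clock()
        stale = (self._checked_at is None
                 or now - self._checked_at >= self.max_age)
        if stale:
            # Only here do we actually touch the disk.
            self._read_only = os.path.isfile(self.marker_path)
            self._checked_at = now
        return self._read_only
```

The trade-off is that a mode switch would take up to max_age seconds to
be noticed by each app server.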

> Although that simplifies things for us and for LOSAs, it also means we
> can't easily log mode switches (because we don't have the signal
> anymore). 

Surely the server knows its current state, so if that changes you
could log it?
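
Something along these lines is what I have in mind. It is a sketch
only: ModeTracker and the "read_only_mode" logger name are invented for
this example, and a real app server would need to guard the remembered
state with a lock if several threads can call check().

```python
import logging
import os

log = logging.getLogger("read_only_mode")  # hypothetical logger name


def _mode_name(read_only):
    return "read-only" if read_only else "read-write"


class ModeTracker:
    """Remember the last observed mode and log whenever it changes."""

    def __init__(self, marker_path):
        self.marker_path = marker_path
        self._read_only = None  # unknown until the first check

    def check(self):
        current = os.path.isfile(self.marker_path)
        if self._read_only is None:
            # First check, e.g. at application start.
            log.info("Starting in %s mode", _mode_name(current))
        elif current != self._read_only:
            log.info("Switching from %s to %s mode",
                     _mode_name(self._read_only), _mode_name(current))
        self._read_only = current
        return current
```

No signal is needed for the logging: the switch is noticed the first
time check() runs after the file appears or disappears.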

>  We could easily work around that by pushing config changes,
> but I'd be very uncomfortable doing that, for the reasons I explained
> above.
> 
> So, I'd like to know if this would be an acceptable solution, and
> whether or not we can live without logs of the mode switches?
> 
> > That make sense?
> > 
> > >  Similarly, when starting up we'd check for
> > > the presence of read-only.txt and set the config variables with the
> > > appropriate values.  That means we can't use SIGUSR2 to switch back to
> > > read-write mode, though.
> > > 
> > > An alternative that would not have any of the problems described above
> > > would be to keep the existing code using config.launchpad.read_only and
> > > have the helper function (which looks for read-only.txt) just update
> > > > that config variable upon startup/SIGUSR2.  That way it'd be much harder
> > > > to have an appserver in read-only mode using the wrong DB, and we'd be
> > > > able to use SIGUSR2 to switch back to read-write mode.
> > > 
> > > Any preferences/suggestions?
> > > 
> > 
> > 
> 
> 




