launchpad-dev team mailing list archive
Message #02149
Re: Using a signal to switch to read-only mode
On Tue, 2010-01-05 at 21:12 +0000, Tom Haddon wrote:
> On Tue, 2010-01-05 at 18:54 -0200, Guilherme Salgado wrote:
> > (CCing launchpad-dev as others might have ideas/suggestions)
> >
> > On Tue, 2010-01-05 at 08:13 +0000, Tom Haddon wrote:
> > > On Mon, 2010-01-04 at 18:16 -0200, Guilherme Salgado wrote:
> > > > On Mon, 2009-12-21 at 09:21 +0000, Tom Haddon wrote:
> > > > > On Fri, 2009-12-18 at 10:37 -0500, Gary Poster wrote:
> > > > > > I like the suggestions I've read. Thanks to all three of you. I'll
> > > > > > summarize the proposals so far.
> > > > > >
> > > > > > - We will switch logrotation to use SIGHUP.
> > > > > >
> > > > > > - We will use SIGUSR2 as a flag for checking for the presence of a
> > > > > > "read-only.txt" at the top of the tree.
> > > > > >
> > > > > > - At application start, or when SIGUSR2 fires, if "read-only.txt" is
> > > > > > found at the top of the tree, the application will switch to (or stay
> > > > > > in) read-only mode. If it is not found, the application will switch
> > > > > > to (or stay in) normal read-write mode.
> > > > > >
> > > > > > - We will provide a key-value page to verify the read-only status of
> > > > > > (each) application.
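A minimal Python sketch of the proposal as summarized above. It assumes the working directory is "the top of the tree" and uses hypothetical helper names (`check_read_only_mode`, `is_read_only`, `start_up`); it is not Launchpad's code. The mode is cached, so ordinary requests never touch the filesystem. Only startup and SIGUSR2 re-check the file.

```python
import os
import signal

# Assumption: the "top of the tree" is the process's working directory.
TREE_ROOT = os.getcwd()
READ_ONLY_FILE = "read-only.txt"

# Cached mode, refreshed only at startup and on SIGUSR2.
_read_only = False


def check_read_only_mode(tree_root=None):
    """Re-read the mode from the presence of read-only.txt."""
    global _read_only
    root = TREE_ROOT if tree_root is None else tree_root
    _read_only = os.path.exists(os.path.join(root, READ_ONLY_FILE))
    return _read_only


def is_read_only():
    """Cheap check for request code; does no filesystem I/O."""
    return _read_only


def _on_sigusr2(signum, frame):
    # SIGUSR2 only means "check again"; the file is created/removed externally.
    check_read_only_mode()


def start_up():
    # POSIX only: signal.SIGUSR2 does not exist on Windows.
    signal.signal(signal.SIGUSR2, _on_sigusr2)
    check_read_only_mode()
```

An operator would then switch an instance with something like `touch read-only.txt` followed by `kill -USR2 <pid>`.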
> > > > > >
> > > > > > Here are my thoughts:
> > > > > >
> > > > > > - I think the key-value page would be very valuable for LOSA peace of
> > > > > > mind, so I like the idea. However, it is only pertinent for a given
> > > > > > application instance. Going to this page through the load-balancer
> > > > > > would not be valuable. LOSAs, would you immediately use this page
> > > > > > if we offered it, going to each instance in the cluster?
> > > > >
> > > > > It'd be nice, but I don't want to block on it.
> > > > >
> > > > > > If not, I'd like to push it out of the scope of this effort, until we
> > > > > > can think about offering an aggregated view of information like this
> > > > > > in a dashboard like the one Maris will hopefully be working on this
> > > > > > cycle.
> > > > > >
> > > > > > - I think we should definitely log mode switches. Then LOSAs can at
> > > > > > least trail the logs for a given instance to verify that the app
> > > > > > noticed the signal and the presence or absence of the file.
> > > > >
> > > > > +1
> > > > > >
> > > > > > - If the LOSAs don't want to rock the boat with changing logrotation
> > > > > > to SIGHUP, we do have a swath of signals from SIGRTMIN to SIGRTMAX
> > > > > > that we could use. I'm in favor of the SIGHUP switch if the LOSAs
> > > > > > don't mind, though.
> > > > > >
> > > > > This switch is okay.
> > > > >
> > > >
> > > > Today I started working on this, and following is my initial plan:
> > > >
> > > > Currently, the way we switch to read-only is by changing the
> > > > read_only config to True *and* changing the main_master and
> > > > main_slave configs to point to standalone databases. What we
> > > > want is to get rid of the read_only config and collapse the
> > > > extra config files we have for read-only mode (lpnet1-db-update)
> > > > into the lpnet1 config.
> > > >
> > > > In order to do this we will use the presence of a file
> > > > (read-only.txt) at the root of the tree to identify (upon
> > > > startup or SIGUSR2) whether or not we're in read-only mode, and
> > > > set the main_master and main_slave configs appropriately. As
> > > > we'll be overwriting these config variables, we'll need to store
> > > > all different values we might use for them in new variables
> > > > (e.g. rw_main_master, rw_main_slave, ro_main_master and
> > > > ro_main_slave). (We might even get rid of the main_master and
> > > > main_slave config variables, as they will be computed values
> > > > that can be moved somewhere else, although I'm not sure this
> > > > is a good idea because all other db names live in config
> > > > variables.)
> > > >
> > > > The plan:
> > > >
> > > > • Change all places that use config.launchpad.read_only to use
> > > > another helper, which tells whether or not we're in read-only
> > > > mode by looking for a read-only.txt file.
> > > > • Switch logrotation to use SIGHUP.
> > > > • Rename main_master and main_slave to rw_main_master and
> > > > rw_main_slave, adding new (and empty) main_master and main_slave
> > > > config variables, which get set upon startup/SIGUSR2 (with the
> > > > values of rw_*).
> > > > • Log read-only/read-write switches.
> > > >
> > > > However, after I started implementing it I realized that having two
> > > > switches (the read-only.txt file and the SIGUSR2) to turn on read-only
> > > > doesn't sound like a very good idea (as we may accidentally leave an app
> > > > server in an inconsistent state), so we may want to use SIGUSR2 to
> > > > create a read-only.txt file *and* trigger the code that sets the configs
> > > > with the appropriate values.
> > >
> > > You don't need to worry about creating/deleting the read-only.txt file -
> > > we'll manage that through external means (initscripts or other helper
> > > scripts). I'd envisage you only need one signal which means "check again
> > > whether we're in read-only or read-write mode".
> > >
> >
> > As we discussed on IRC, my concern was that having a read-only.txt file
> > did not mean we were in read-only mode -- the SIGUSR2 is needed, and if
> > forgotten, the server would be in an inconsistent state. In that state,
> > the Python code thinks we're running in read-only mode (because it relies on
> > read-only.txt for that) but we're still connecting to the rw db (because
> > we rely on SIGUSR2 to change to the ro dbs).
> >
> > Anyway, that didn't seem to be a big deal as this is going to be handled
> > by scripts, so I went ahead and tried to implement that. As usual, I've
> > encountered some problems, and they seem to boil down to the way our
> > config works -- the config variables are immutable, so to make changes we
> > need to push/pop overlays on top of the existing config.
> >
> > Since config.pop(name) removes the overlay with the given name and any
> > others that were on top of it, we can't rely on config.push/pop to
> > update the config values because we might end up inadvertently reverting
> > others' changes and others might do the same to ours. I think this
> > push/pop mechanism was meant only for testing purposes.
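To make the hazard concrete, here is a toy model of the push/pop semantics described above. It is a hypothetical class written for illustration, not the real config implementation. It shows how popping one overlay by name silently reverts an unrelated overlay pushed after it.

```python
class OverlayConfig:
    """Toy model of a config whose values change only via named overlays."""

    def __init__(self, base):
        self._base = dict(base)
        self._overlays = []  # list of (name, values); the last one wins

    def push(self, name, values):
        self._overlays.append((name, dict(values)))

    def pop(self, name):
        # Remove the most recent overlay with this name *and every overlay
        # pushed after it*, mirroring the behaviour described in the thread.
        names = [n for n, _ in self._overlays]
        index = len(names) - 1 - names[::-1].index(name)
        removed = self._overlays[index:]
        del self._overlays[index:]
        return removed

    def __getitem__(self, key):
        for _, values in reversed(self._overlays):
            if key in values:
                return values[key]
        return self._base[key]


config = OverlayConfig({"main_master": "rw-db", "debug": False})
config.push("read-only", {"main_master": "ro-db"})  # our mode switch
config.push("other", {"debug": True})               # someone else's change
config.pop("read-only")                             # switch back to read-write
# The unrelated "other" overlay was reverted as well.
assert config["main_master"] == "rw-db"
assert config["debug"] is False
```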
> >
> > After realizing that I came up with another approach, which relies only
> > on the presence/absence of the read-only.txt file to figure out the mode
> > we're in. In this approach, config.database.main_master/slave are gone
> > and we use dbconfig.main_master/slave instead, which are properties in
> > DatabaseConfig that return the appropriate value according to the mode
> > we're in.
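A minimal sketch of the property-based approach, assuming the rw_/ro_ names proposed earlier in the thread hold the connection strings. The class name and values are illustrative, not Launchpad's actual `DatabaseConfig`.

```python
import os


class DatabaseConfigSketch:
    """Pick main_master/main_slave from the presence of read-only.txt."""

    def __init__(self, tree_root, rw_main_master, rw_main_slave,
                 ro_main_master, ro_main_slave):
        self._flag = os.path.join(tree_root, "read-only.txt")
        self._rw = (rw_main_master, rw_main_slave)
        self._ro = (ro_main_master, ro_main_slave)

    def _pair(self):
        # Each property access costs one stat() call on the flag file.
        return self._ro if os.path.exists(self._flag) else self._rw

    @property
    def main_master(self):
        return self._pair()[0]

    @property
    def main_slave(self):
        return self._pair()[1]
```

Because the value is computed on every access, there is no separate signal step to forget, which is the point of this approach. The trade-off is the filesystem check raised in the reply.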
>
> Does this mean we're checking for the presence of this text file before
> every database operation? That sounds quite IO intensive.
ISTM that the presence of the file would be checked only a couple of times
(once for each of the properties in DatabaseConfig that look for that
file) for each handler thread, as a consequence of storm creating the DB
connections when they're first used.
If that's correct, then we'll have to find a way to reset the stores in
all threads when we switch modes -- something I didn't realize before.
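One way to do that, sketched under assumptions: each thread keeps its connection together with the "mode generation" it was opened under. The mode switch (however it is detected) bumps a shared counter, so each thread reconnects on its next request. All names are hypothetical. The `connect` factory stands in for whatever creates the Storm store, and this is not Storm's API.

```python
import threading


class ModeGeneration:
    """Counter bumped whenever the read-only/read-write mode changes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def bump(self):
        with self._lock:
            self._value += 1

    @property
    def value(self):
        return self._value


class PerThreadConnection:
    """Lazily reopen a thread's connection after a mode switch."""

    def __init__(self, generation, connect):
        self._generation = generation
        self._connect = connect  # hypothetical factory: () -> connection
        self._local = threading.local()

    def get(self):
        current = self._generation.value
        if getattr(self._local, "generation", None) != current:
            old = getattr(self._local, "conn", None)
            if old is not None:
                old.close()  # drop the connection opened under the old mode
            self._local.conn = self._connect()
            self._local.generation = current
        return self._local.conn
```

Threads that are idle when the mode changes keep their old connection until their next request. The per-request check is an integer comparison, not a filesystem call.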
>
> > Although that simplifies things for us and for LOSAs, it also means we
> > can't easily log mode switches (because we don't have the signal
> > anymore).
>
> Surely the server knows a current state, and then if that changes you
> could log it?
Not in the current implementation, as it relies on a @property which
checks the presence of read-only.txt, but it's easy to change that. Not
sure what I had in mind when I wrote the above.
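A sketch of that change, assuming a hypothetical tracker object in place of the bare @property. It still derives the mode from read-only.txt, but remembers the last value it saw and logs whenever it differs. The logger name is also an assumption.

```python
import logging
import os

log = logging.getLogger("launchpad.readonly")  # hypothetical logger name


class ReadOnlyModeTracker:
    """Remember the last observed mode so that changes can be logged."""

    def __init__(self, tree_root):
        self._flag = os.path.join(tree_root, "read-only.txt")
        self._last = None  # unknown until the first check

    def is_read_only(self):
        current = os.path.exists(self._flag)
        if current != self._last:
            # Logs the initial mode too, then every subsequent switch.
            log.info("Now in %s mode",
                     "read-only" if current else "read-write")
            self._last = current
        return current
```

The switch is logged lazily, the first time a request notices it. With many handler threads, a lock around the compare-and-update would keep a single switch from being logged twice.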
>
> > We could easily work around that by pushing config changes,
> > but I'd be very uncomfortable doing that, for the reasons I explained
> > above.
> >
> > So, I'd like to know if this would be an acceptable solution, and
> > whether or not we can live without logs of the mode switches?
> >
> > > Does that make sense?
> > >
> > > > Similarly, when starting up we'd check for
> > > > the presence of read-only.txt and set the config variables with the
> > > > appropriate values. That means we can't use SIGUSR2 to switch back to
> > > > read-write mode, though.
> > > >
> > > > An alternative that would not have any of the problems described above
> > > > would be to keep the existing code using config.launchpad.read_only and
> > > > have the helper function (which looks for read-only.txt) just update
> > > > that config variable upon startup/SIGUSR2. That way it'd be much harder
> > > > to have an appserver in read-only mode using the wrong DB, and we'd be
> > > > able to use SIGUSR2 to switch back to read-write mode.
> > > >
> > > > Any preferences/suggestions?
> > > >
> > >
> > >
> >
> >
>
>
--
Guilherme Salgado <salgado@xxxxxxxxxxxxx>