maas-devel team mailing list archive

Re: RFC: "Serialising" power actions

 

GMB,

The moon is waning and the wind is coming from the west...

With that aside, I think your logic is coherent.  However, I have to ask:
why do we allow the user to issue power actions in quick succession at all?
I think the real issue is that we let users fire off power actions back to
back, which is what gets us into this whole mess to begin with.

Some fodder:
Just from a UI standpoint: isn't there a way we can limit this so that the
power action 'button' is disabled and doesn't become enabled again until
the blocking power action has completed?

Just some thoughts to get the conversation started :)

Newell

On Mon, Sep 15, 2014 at 2:34 PM, Graham Binns <graham.binns@xxxxxxxxxxxxx>
wrote:

> Hi all,
>
> I'm handling the work to "serialise" power actions — at least, I'm getting
> started on it right now. I've spent some time looking at the problem and I
> wanted to bounce ideas off you all — preferably whilst I sleep :)
>
> So, the problem:
>
>  When a power action is issued to a node (power on, power off, etc.), more
> than one can be in play for a node at once. We don't keep track of them
> once they've been fired, except for receiving a notification when they've
> been successful or failed.
>
> This means that it's possible to issue two conflicting commands
> (e.g. power on followed by power off) in quick succession, which can then
> leave the node in an odd state: it's theoretically possible that the node
> would stay powered on when MAAS expects it to be off, say if for some
> reason the power off command got executed first — this is even more likely
> with AMT BMCs, since there's a degree of did-I-cast-the-runes-right to get
> a command to work on those, at least when the moon is waning and the wind
> is from the east.
>
> There are, so far as I can tell, three strategies for handling this problem
> properly. All of them require keeping track of the current power action
> for a node, and all assume that only one action can run at once:
>
> 1. The current power action blocks all others until it has completed. Other
> power actions will be queued and executed in turn.
> - or -
> 2. Each power action supersedes any action that is currently executing:
> the existing action is cancelled and then the new action is run.
> - or -
> 3. We track the current ("now") and "next" actions for the node, but drop
> every action that comes in once those two slots are full.
>
> At first glance the second option is simpler — just cancel whatever's
> there and then do our thing. But I think that it's actually a bit
> deceptive. Consider:
>
>  - How do we "cancel" an action?
>  - How do we ensure that we're not going to end up in an inconsistent
> state if the node is already responding to action #1 when we cancel it?
>
> The first option isn't without its problems either — having a queue of
> actions seems kind of awkward, and could lead to flip-flopping of a node's
> power state. But *not* having a queue could still lead to situations where
> several actions get issued in quick succession.
>
> The third option seems to offer a happy medium. We can track the current
> and next power actions for a node and then ignore anything else that comes
> in whilst both of those two slots are full. Each action must succeed or
> fail before the next one can be executed. This means we won't get
> potentially ridiculous amounts of flip-flopping, and we can build this pretty
> easily. We'd have to have some kind of UI feedback for "hey, it looks like
> you're repeatedly powering this node on and off; I'm going to ignore you
> for a while," but that doesn't seem all that onerous.
>
> So as it stands I'm leaning towards option #3. Questions, thoughts
> and comments are welcome.
>
> ~gmb
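
To make option #3 a bit more concrete, here is a rough sketch of the
"now"/"next" bookkeeping for a single node. It is only an illustration under
my own assumptions: the class name PowerActionSlots, its fields and the
"started"/"queued"/"dropped" return values are all made up for this example
and are not existing MAAS code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PowerActionSlots:
    """Hypothetical per-node tracker: one running action, one waiting."""

    current: Optional[str] = None   # the "now" slot
    pending: Optional[str] = None   # the "next" slot

    def submit(self, action: str) -> str:
        """Offer a new power action and report what happened to it."""
        if self.current is None:
            self.current = action
            return "started"
        if self.pending is None:
            self.pending = action
            return "queued"
        # Both slots are full, so the action is ignored.
        return "dropped"

    def finish(self) -> Optional[str]:
        """Call once the running action has succeeded or failed.

        Promotes the waiting action (if any) and returns it.
        """
        self.current, self.pending = self.pending, None
        return self.current


slots = PowerActionSlots()
print(slots.submit("power_on"))    # started
print(slots.submit("power_off"))   # queued
print(slots.submit("power_on"))    # dropped
print(slots.finish())              # power_off
```

The "dropped" result is the point where the UI feedback mentioned in the
quoted message would hook in. How an action is actually dispatched, and how
success or failure gets reported back, is left out on purpose.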
