
nova team mailing list archive

Power states

 

We have the following VM power states defined in compute.power_state (recently
moved from compute.node):

NOSTATE = 0x00
RUNNING = 0x01
BLOCKED = 0x02
PAUSED = 0x03
SHUTDOWN = 0x04
SHUTOFF = 0x05
CRASHED = 0x06

Does anyone know what the intended semantics of these are?  They look like the
libvirt states, which are defined thus:

enum virDomainState {
VIR_DOMAIN_NOSTATE =  0 : no state 
VIR_DOMAIN_RUNNING =  1 : the domain is running 
VIR_DOMAIN_BLOCKED =  2 : the domain is blocked on resource 
VIR_DOMAIN_PAUSED =  3 : the domain is paused by user 
VIR_DOMAIN_SHUTDOWN =  4 : the domain is being shut down 
VIR_DOMAIN_SHUTOFF =  5 : the domain is shut off 
VIR_DOMAIN_CRASHED =  6 : the domain is crashed 
}
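
To make the comparison concrete, here is a minimal Python sketch that puts
the compute.power_state values next to the libvirt descriptions quoted above.
The LIBVIRT_MEANING table and the describe() helper are names made up for
illustration; I'm not claiming they exist in nova.

```python
# Values as listed for compute.power_state above.
NOSTATE = 0x00
RUNNING = 0x01
BLOCKED = 0x02
PAUSED = 0x03
SHUTDOWN = 0x04
SHUTOFF = 0x05
CRASHED = 0x06

# Hypothetical lookup table: the libvirt description for each value,
# copied from the virDomainState listing.
LIBVIRT_MEANING = {
    NOSTATE: "no state",
    RUNNING: "the domain is running",
    BLOCKED: "the domain is blocked on resource",
    PAUSED: "the domain is paused by user",
    SHUTDOWN: "the domain is being shut down",
    SHUTOFF: "the domain is shut off",
    CRASHED: "the domain is crashed",
}


def describe(state):
    """Return the libvirt description for a power state value."""
    return LIBVIRT_MEANING.get(state, "unknown state %r" % (state,))
```

The point to notice is that SHUTDOWN (4) is the transitional state and SHUTOFF
(5) is the final one.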

This is not how they're being used in the code though.  In particular, when
we get an exception during spawn we set the state to SHUTDOWN (in libvirt
semantics, "being shut down") and when destroying a VM, we wait for it to
transition to SHUTDOWN, rather than SHUTOFF.
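
As a rough sketch of what the destroy path would look like if we followed the
libvirt semantics, the loop below polls until the VM reports SHUTOFF rather
than SHUTDOWN.  This is not the current nova code: wait_for_shutoff and the
get_state callable are hypothetical, and the timeout and interval defaults
are arbitrary.

```python
import time

SHUTDOWN = 0x04  # libvirt semantics: being shut down (transitional)
SHUTOFF = 0x05   # libvirt semantics: shut off (final)


def wait_for_shutoff(get_state, timeout=10.0, interval=0.5, sleep=time.sleep):
    """Poll get_state() until it returns SHUTOFF or the timeout expires.

    get_state is a hypothetical zero-argument callable returning the
    current power state of the VM being destroyed.  Returns True if
    SHUTOFF was observed, False if we gave up.
    """
    waited = 0.0
    while True:
        if get_state() == SHUTOFF:
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

A caller would pass something that queries the hypervisor for the current
state.  Seeing SHUTDOWN here only means the shutdown is still in progress.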

We've also got some rather undefined semantics for a state_description over
and above the state itself, with the state_description taking values like
'spawning', 'rebooting', and 'shutting_down'.  The first two are disjoint from
the libvirt states, but the last overlaps directly with SHUTDOWN.
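
The overlap can be written down as a small table.  This is only an
illustration using the three state_description values mentioned above; the
OVERLAPS mapping and overlapping_state() helper are hypothetical names.

```python
SHUTDOWN = 0x04  # libvirt semantics: the domain is being shut down

# state_description values mentioned above (there may well be others).
STATE_DESCRIPTIONS = ('spawning', 'rebooting', 'shutting_down')

# Hypothetical mapping from a state_description to the power state that
# already expresses the same thing.  Descriptions with no libvirt
# equivalent are simply absent.
OVERLAPS = {'shutting_down': SHUTDOWN}


def overlapping_state(description):
    """Return the power state that duplicates description, or None."""
    return OVERLAPS.get(description)
```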

I personally don't like the libvirt states.  The name "SHUTDOWN" to mean
"graceful shutdown requested" is very confusing (no doubt the source of some
of the problems above).  Also, that state is guest-dependent (i.e. whether the
VM is shutting down depends on the guest co-operating) whereas the other
states are definitively about the virtualization layer.  This mix is awkward.
Particularly weird is that the transition state "being shut down" is present,
but the state "being rebooted" isn't.

Also, there are states, such as "spawning" which we would like to represent,
but can't.

Finally, BLOCKED is a pretty useless state -- a VM might in reality be blocked
on some VCPUs but not others, and in any case, this is always a transitory thing,
changing on millisecond scales, and doesn't have any useful meaning in a
remoted API.  (More useful is some statistical measurement of how long each
VCPU has been blocked over an aggregate timescale, and in which circumstances
they've been blocked, but that's a discussion for another time.)


Does anyone have any opinions here?  If not, I'll have a think about how to
rework them into something more suitable.

Thanks,

Ewan.


