
openstack team mailing list archive

Re: Nova and asynchronous instance launching

 

Right - examining the current state isn't a good way to determine what 
happened with one particular request.  This is exactly one of the reasons 
some providers create Jobs for all actions.  Checking the resource "later" 
to see why something bad happened is fragile, since other operations might 
have happened since then, erasing any "error message" type of state info. 
And relying on event/error logs is hard, since correlating one particular 
action with a flood of events is tricky - especially in a multi-user 
environment where several actions could be underway at once.  If each 
action resulted in a Job URI being returned, then the client could check that 
Job resource whenever it's convenient - and this could be quite useful 
in both happy and unhappy situations. 

And to be clear, a Job doesn't necessarily need to be a full new 
resource - it could (under the covers) map to a grouping of event log 
entries.  The point is that, from a client's perspective, there is an 
easy mechanism (e.g. issue a GET to a single URI) that returns all of the 
info needed to determine what happened with one particular operation.
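[Editor's sketch: the Job pattern described above can be illustrated with a small in-memory stand-in for the server side. All names here (JobStore, the /jobs/... URI shape, the status values) are illustrative, not any real OpenStack API.]

```python
import uuid

class JobStore:
    """In-memory stand-in for a server-side job registry."""
    def __init__(self):
        self._jobs = {}

    def start(self, action):
        # Server handler: record the action and hand back a Job URI
        # (in real life, alongside a 202 Accepted response).
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"action": action, "status": "RUNNING", "error": None}
        return f"/jobs/{job_id}"

    def finish(self, job_uri, error=None):
        # Server side: mark the operation's terminal state, keeping
        # the failure reason attached to this one operation.
        job = self._jobs[job_uri.rsplit("/", 1)[1]]
        job["status"] = "FAILED" if error else "SUCCEEDED"
        job["error"] = error

    def get(self, job_uri):
        # Client side: one GET against the Job URI returns everything
        # needed to determine what happened with this one operation.
        return dict(self._jobs[job_uri.rsplit("/", 1)[1]])

store = JobStore()
job_uri = store.start("server.create")        # returned to the client up front
store.finish(job_uri, error="no valid host")  # e.g. a scheduler failure
result = store.get(job_uri)
print(result["status"], "-", result["error"])  # prints: FAILED - no valid host
```

The error state survives later operations on the instance itself, because it belongs to the Job, not to the resource.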

thanks
-Doug
______________________________________________________
STSM |  Standards Architect  |  IBM Software Group
(919) 254-6905  |  IBM 444-6905  |  dug@xxxxxxxxxx
The more I'm around some people, the more I like my dog.



Eoghan Glynn <eglynn@xxxxxxxxxx> 
06/29/2012 06:00 AM

To
Doug Davis/Raleigh/IBM@IBMUS
cc
openstack@xxxxxxxxxxxxxxxxxxx, Jay Pipes <jaypipes@xxxxxxxxx>
Subject
Re: [Openstack] Nova and asynchronous instance launching

> Note that I do distinguish between a 'real' async op (where you
> really return little more than a 202) and one that returns a
> skeleton of the resource being created - like instance.create() does
> now.

So the latter approach at least provides a way to poll on the resource
status, so as to figure out if and when it becomes usable. 

In the happy-path, eventually the instance status transitions to
ACTIVE and away we go.

However, considering the unhappy-path for a second, is there a place
for surfacing some more context as to why the new instance unexpectedly
went into the ERROR state? 

For example, even just an indication that the failure occurred in the scheduler
(e.g. resource starvation) versus on the target compute node. Is the thinking
that such information may be operationally sensitive, or just TMI for a
typical cloud user?
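[Editor's sketch: the status-polling approach discussed above, with a stubbed client standing in for the Nova API. It shows the gap in the unhappy path: the poll reveals *that* the instance went to ERROR, but nothing about *why*. The function name and status values are illustrative.]

```python
def wait_for_instance(get_status, max_polls=10):
    """Poll the instance until it reaches a terminal state."""
    for _ in range(max_polls):
        status = get_status()
        if status in ("ACTIVE", "ERROR"):
            return status
    return "TIMEOUT"

# Stub client: the instance transitions BUILD -> BUILD -> ERROR.
states = iter(["BUILD", "BUILD", "ERROR"])
final = wait_for_instance(lambda: next(states))
# At this point the client knows only "ERROR" -- there is no
# failure context (scheduler starvation vs. compute-node fault)
# attached to the state itself.
print(final)  # prints: ERROR
```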

Cheers,
Eoghan


