> However, considering the unhappy-path for a second, is there a place
> for surfacing some more context as to why the new instance unexpectedly
> went into the ERROR state?
I assume the philosophy is that the API has validated the request as
far as it can, and returned any meaningful error messages, etc.
Anything that fails past that point is something going wrong from the
cloud provider and there is nothing the user could have done to avoid
the error, so any additional information won't help them.
However, on the basis that up-front validation is seldom perfect, and
things can change while a request is in flight, I think that being able
to tell a user that, for example, their request failed because the
image was deleted before it could be downloaded would be useful.
One approach might be to make the task_state more granular and use
that to qualify the error. In general our users have found that having
the state shown as "vm_state (task_state)" is useful, as it shows
progress during things like building.
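As a rough sketch of that display format (not Nova code): the helper name and the example state values below are hypothetical, chosen only to show how a more granular task_state could qualify an error.

```python
from typing import Optional


def display_state(vm_state: str, task_state: Optional[str] = None) -> str:
    """Render a combined state string as "vm_state (task_state)".

    Falls back to the bare vm_state when no task is in progress.
    """
    if task_state:
        return f"{vm_state} ({task_state})"
    return vm_state


# Hypothetical state names, for illustration only.
print(display_state("building", "downloading_image"))
print(display_state("error", "image_deleted"))
print(display_state("active"))
```

The second call shows the idea: the vm_state stays coarse, and the task_state carries the reason.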
Phil
*From:*openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx
[mailto:openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx] *On
Behalf Of *Doug Davis
*Sent:* 29 June 2012 12:45
*To:* Eoghan Glynn
*Cc:* openstack@xxxxxxxxxxxxxxxxxxx
*Subject:* Re: [Openstack] Nova and asynchronous instance launching
Right - examining the current state isn't a good way to determine what
happened with one particular request. This is exactly one of the
reasons some providers create Jobs for all actions. Checking the
resource "later" to see why something bad happened is fragile since
other operations might have happened since then, erasing any "error
message" type of state info. And relying on event/error logs is hard
since correlating one particular action with a flood of events is
tricky - especially in a multi-user environment where several actions
could be underway at once. If each action resulted in a Job URI being
returned then the client could check that Job resource when it's
convenient for them - and this could be quite useful in both happy and
unhappy situations.
And to be clear, a Job doesn't necessarily need to be a full new
resource; it could (under the covers) map to a grouping of event log
entries, but the point is that from a client's perspective they have an
easy mechanism (e.g. issue a GET to a single URI) that returns all of
the info needed to determine what happened with one particular operation.
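To make the Job idea concrete, here is a minimal in-memory sketch. It is an illustration only, not an existing OpenStack API: the JobStore class, the "/jobs/<id>" URI shape, and the status strings are all assumptions. Each action gets its own Job URI, events are filed under that Job, and a single lookup (standing in for the GET) returns everything about that one operation.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: str
    action: str
    status: str = "pending"  # hypothetical status values
    events: List[str] = field(default_factory=list)


class JobStore:
    """Groups event log entries per action so they can be fetched by URI."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def start(self, action: str) -> str:
        """Register a new action and return its (hypothetical) Job URI."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job(job_id, action)
        return f"/jobs/{job_id}"

    def _lookup(self, job_uri: str) -> Job:
        # The Job id is the last path segment of the URI.
        return self._jobs[job_uri.rsplit("/", 1)[-1]]

    def record(self, job_uri: str, event: str,
               status: Optional[str] = None) -> None:
        """Append an event to one Job, optionally updating its status."""
        job = self._lookup(job_uri)
        job.events.append(event)
        if status is not None:
            job.status = status

    def get(self, job_uri: str) -> dict:
        """Stand-in for a GET on the Job URI."""
        job = self._lookup(job_uri)
        return {"action": job.action, "status": job.status,
                "events": list(job.events)}


store = JobStore()
create_uri = store.start("create instance")
reboot_uri = store.start("reboot instance")
store.record(create_uri, "image deleted before download", status="error")
store.record(reboot_uri, "reboot completed", status="done")
print(store.get(create_uri))
```

Because the two actions have separate Jobs, the later reboot does not erase or mix with the failure details of the create.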
thanks
-Doug
______________________________________________________
STSM | Standards Architect | IBM Software Group
(919) 254-6905 | IBM 444-6905 | dug@xxxxxxxxxx <mailto:dug@xxxxxxxxxx>
The more I'm around some people, the more I like my dog.
*Eoghan Glynn <eglynn@xxxxxxxxxx>*
06/29/2012 06:00 AM
To: Doug Davis/Raleigh/IBM@IBMUS
cc: openstack@xxxxxxxxxxxxxxxxxxx, Jay Pipes <jaypipes@xxxxxxxxx>
Subject: Re: [Openstack] Nova and asynchronous instance launching
> Note that I do distinguish between a 'real' async op (where you
> really return little more than a 202) and one that returns a
> skeleton of the resource being created - like instance.create() does
> now.
So the latter approach at least provides a way to poll on the resource
status, so as to figure out if and when it becomes usable.
In the happy-path, eventually the instance status transitions to
ACTIVE and away we go.
However, considering the unhappy-path for a second, is there a place
for surfacing some more context as to why the new instance unexpectedly
went into the ERROR state?
For example even just an indication that failure occurred in the scheduler
(e.g. resource starvation) or on the target compute node. Is the thought
that such information may be operationally sensitive, or just TMI for a
typical cloud user?
Cheers,
Eoghan