openstack team mailing list archive

Re: Nova and asynchronous instance launching

To: openstack@xxxxxxxxxxxxxxxxxxx
From: David Kranz <david.kranz@xxxxxxxxxx>
Date: Fri, 29 Jun 2012 13:50:59 -0400
In-reply-to: <C49B6B58853ACD4E96677F24DD9C446C73A29BFAC8@GVW1114EXC.americas.hpqcorp.net>
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1

An assumption is being made here that the "user" and "cloud provider" are unrelated. But I think there are many projects under development where a cloud-based service is being provided on top of an OpenStack infrastructure. In that use case, the direct user of OpenStack APIs and the "cloud provider" may be the same entity. It would be really nice if, when an application fires up an instance that enters the error state, there was an API that could get the reason why it failed with as much information as the OpenStack code that set the instance state to ERROR had.

If we are concerned that such information is sensitive and a public provider might not want to give it all to users, this could be an admin-only API. There are many variations of how the information is controlled.
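
One such variation, as a minimal sketch: keep the full failure context, but only include it in the response when the caller is an admin. The Fault class, its fields, and fault_view are hypothetical names for illustration, not an existing Nova API.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    """Hypothetical record of why an instance entered ERROR."""
    code: int
    message: str  # short summary, assumed safe to show to any user
    details: str  # full context, possibly operationally sensitive


def fault_view(fault: Fault, is_admin: bool) -> dict:
    """Return only the fault fields the caller is allowed to see."""
    view = {"code": fault.code, "message": fault.message}
    if is_admin:
        # Only admins receive the sensitive details.
        view["details"] = fault.details
    return view
```

A provider that is also the direct API user would call this with is_admin=True and get everything; a public provider could keep the default view for tenants.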

 -David

If we are concerned that a public provider might not want to give some information to users, this could be an admin-only API.

On 6/29/2012 11:40 AM, Day, Phil wrote:

>However, considering the unhappy-path for a second, is there a place
for surfacing some more context as to why the new instance unexpectedly
went into the ERROR state?
I assume the philosophy is that the API has validated the request as far as it can, and returned any meaningful error messages, etc. Anything that fails past that point is something going wrong on the cloud provider's side and there is nothing the user could have done to avoid the error, so any additional information won't help them.
However, on the basis that up-front validation is seldom perfect, and things can change while a request is in flight, I think that being able to tell a user that, for example, their request failed because the image was deleted before it could be downloaded would be useful.
One approach might be to make the task_state more granular and use that to qualify the error. In general our users have found that having the state shown as "vm_state (task_state)" is useful, as it shows progress during things like building.
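
A rough sketch of that display format, assuming both states are available as plain strings. The function name and the "image_download" task_state below are hypothetical, chosen only to show how a more granular value could qualify an error.

```python
from typing import Optional


def display_state(vm_state: str, task_state: Optional[str]) -> str:
    """Format a state as "vm_state (task_state)", omitting an empty task_state."""
    if task_state:
        return f"{vm_state} ({task_state})"
    return vm_state


# Progress while building, then a hypothetical granular task_state on failure.
print(display_state("building", "spawning"))
print(display_state("error", "image_download"))
```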
Phil
*From:* openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx [mailto:openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx] *On Behalf Of* Doug Davis
*Sent:* 29 June 2012 12:45
*To:* Eoghan Glynn
*Cc:* openstack@xxxxxxxxxxxxxxxxxxx
*Subject:* Re: [Openstack] Nova and asynchronous instance launching
Right - examining the current state isn't a good way to determine what happened with one particular request. This is exactly one of the reasons some providers create Jobs for all actions. Checking the resource "later" to see why something bad happened is fragile since other operations might have happened since then, erasing any "error message" type of state info. And relying on event/error logs is hard since correlating one particular action with a flood of events is tricky - especially in a multi-user environment where several actions could be underway at once. If each action resulted in a Job URI being returned then the client can check that Job resource when it's convenient for them - and this could be quite useful in both happy and unhappy situations.
And to be clear, a Job doesn't necessarily need to be a full new resource; it could (under the covers) map to a grouping of event log entries, but the point is that from a client's perspective they have an easy mechanism (e.g. issue a GET to a single URI) that returns all of the info needed to determine what happened with one particular operation.
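
To make the "grouping of event log entries" idea concrete, here is a minimal in-memory sketch. JobLog, its methods, and the /jobs/<n> URI shape are all hypothetical; a real service would sit behind HTTP and persist the entries.

```python
import itertools
from typing import Dict, List


class JobLog:
    """Hypothetical store that groups event entries under one Job URI per action."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._events: Dict[str, List[str]] = {}

    def start(self, action: str) -> str:
        """Register a new action and return the Job URI handed back to the client."""
        uri = f"/jobs/{next(self._next_id)}"
        self._events[uri] = [f"accepted: {action}"]
        return uri

    def record(self, job_uri: str, message: str) -> None:
        """Attach an event to the job it belongs to, so no later correlation is needed."""
        self._events[job_uri].append(message)

    def get(self, job_uri: str) -> List[str]:
        """Stand-in for a GET on the Job URI: everything known about that one action."""
        return list(self._events[job_uri])
```

Because each entry is filed under the job at write time, a later operation on the same instance cannot erase what happened to an earlier request.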
thanks
-Doug
______________________________________________________
STSM |  Standards Architect  |  IBM Software Group
(919) 254-6905  |  IBM 444-6905  | dug@xxxxxxxxxx <mailto:dug@xxxxxxxxxx>
The more I'm around some people, the more I like my dog.

*Eoghan Glynn <eglynn@xxxxxxxxxx>*
06/29/2012 06:00 AM

To: Doug Davis/Raleigh/IBM@IBMUS
cc: openstack@xxxxxxxxxxxxxxxxxxx, Jay Pipes <jaypipes@xxxxxxxxx>
Subject: Re: [Openstack] Nova and asynchronous instance launching

> Note that I do distinguish between a 'real' async op (where you
> really return little more than a 202) and one that returns a
> skeleton of the resource being created - like instance.create() does
> now.

So the latter approach at least provides a way to poll on the resource
status, so as to figure out if and when it becomes usable.

In the happy-path, eventually the instance status transitions to
ACTIVE and away we go.
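
As a sketch of that polling approach: the client repeatedly fetches the status until it reaches a terminal value. The get_status callable is an assumed stand-in for whatever call fetches the instance status; the function name, interval, and timeout are illustrative.

```python
import time
from typing import Callable


def wait_for_instance(get_status: Callable[[], str],
                      interval: float = 1.0,
                      timeout: float = 60.0) -> str:
    """Poll until the instance status is ACTIVE or ERROR, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("ACTIVE", "ERROR"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"instance still in {status} after {timeout}s")
        time.sleep(interval)
```

Note that when this returns ERROR, the status alone says nothing about why the launch failed.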

However, considering the unhappy-path for a second, is there a place
for surfacing some more context as to why the new instance unexpectedly
went into the ERROR state?

For example even just an indication that failure occurred in the scheduler
(e.g. resource starvation) or on the target compute node. Is the thought
that such information may be operationally sensitive, or just TMI for a
typical cloud user?

Cheers,
Eoghan



_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

Follow ups

Re: Nova and asynchronous instance launching
From: Jay Pipes, 2012-07-01

References

Re: Nova and asynchronous instance launching
From: Eoghan Glynn, 2012-06-29
Re: Nova and asynchronous instance launching
From: Doug Davis, 2012-06-29
Re: Nova and asynchronous instance launching
From: Day, Phil, 2012-06-29