openstack team mailing list archive

Re: Nova and asynchronous instance launching

 

> Right - examining the current state isn't a good way to determine
> what happened with one particular request. This is exactly one of
> the reasons some providers create Jobs for all actions. Checking the
> resource "later" to see why something bad happened is fragile since
> other operations might have happened since then, erasing any "error
> message" type of state info. And relying on event/error logs is hard
> since correlating one particular action with a flood of events is
> tricky - especially in a multi-user environment where several
> actions could be underway at once. If each action resulted in a Job
> URI being returned then the client can check that Job resource when
> it's convenient for them - and this could be quite useful in both
> happy and unhappy situations.
> 
> And to be clear, a Job doesn't necessarily need to be a full new
> resource, it could (under the covers) map to a grouping of event
> log entries but the point is that from a client's perspective they
> have an easy mechanism (e.g. issue a GET to a single URI) that
> returns all of the info needed to determine what happened with one
> particular operation.

Agreed on all points.

I wonder whether we could simply leverage the existing
X-Compute-Request-Id header to provide the context on the over-arching
operation that the client wishes to be informed about.

For example, we could provide an administrative API extension allowing
queries on the async "Job" status, identified via the req-<UUID> string
returned from the initial call invoking the operation.
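
As a rough sketch of what such a query might look like (nothing here is
existing nova code: parse_request_id, job_status_response and the
dict-like "store" are names I've made up for illustration, and the
HTTP-style status codes are just one possible convention):

```python
import uuid


def parse_request_id(value):
    """Return the UUID part of a 'req-<UUID>' string, or None if malformed."""
    prefix = "req-"
    if not value.startswith(prefix):
        return None
    try:
        return uuid.UUID(value[len(prefix):])
    except ValueError:
        return None


def job_status_response(store, request_id):
    """Build a (status_code, body) pair for a hypothetical job-status query.

    `store` is assumed to be any mapping from request ID to a status string.
    """
    if parse_request_id(request_id) is None:
        return 400, {"error": "malformed request id"}
    status = store.get(request_id)
    if status is None:
        return 404, {"error": "unknown request id"}
    return 200, {"request_id": request_id, "status": status}
```

The client would simply hand back the request ID it got from the
original call; validating the format up front keeps arbitrary strings
out of the status lookup.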

Since the components serving such an operation are generally distributed
(e.g. nova-api, nova-scheduler, nova-compute etc.) and tied together via
async messaging, I don't think simple log scraping would be sufficient.

But if each component were to follow logic such as:

1. when a context is received, check status in the nova DB for that
   request ID - if absent, mark as in-progress

2. when an operation hits an unrecoverable error condition, the exception-
   handling path should mark the request as failed in the nova DB

3. when an operation reaches a definitive endpoint, e.g. the instance
   is successfully launched, then the request status is marked as complete
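
To make the three steps above a bit more concrete, here is a minimal
sketch using an in-memory sqlite3 table as a stand-in for the nova DB.
The table name, the helper names and the is_endpoint flag are all
hypothetical, not anything that exists in nova today:

```python
import sqlite3
import time


def init_db(conn):
    # Hypothetical request status table, keyed by the req-<UUID> string.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_status ("
        " request_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " detail TEXT,"
        " updated_at REAL NOT NULL)"
    )


def mark_in_progress(conn, request_id, now):
    # Step 1: only insert if the request ID is absent; existing rows are kept.
    conn.execute(
        "INSERT OR IGNORE INTO request_status VALUES (?, 'in-progress', NULL, ?)",
        (request_id, now),
    )
    conn.commit()


def set_status(conn, request_id, status, detail, now):
    # Steps 2 and 3: overwrite whatever status was recorded before.
    conn.execute(
        "INSERT OR REPLACE INTO request_status VALUES (?, ?, ?, ?)",
        (request_id, status, detail, now),
    )
    conn.commit()


def get_status(conn, request_id):
    # Returns a (status, detail) tuple, or None for an unknown request ID.
    return conn.execute(
        "SELECT status, detail FROM request_status WHERE request_id = ?",
        (request_id,),
    ).fetchone()


def run_tracked(conn, request_id, operation, is_endpoint=False, clock=time.time):
    """Run one component's share of the work under a request ID."""
    mark_in_progress(conn, request_id, clock())
    try:
        result = operation()
    except Exception as exc:
        # Step 2: unrecoverable error, record the failure and re-raise.
        set_status(conn, request_id, "failed", str(exc), clock())
        raise
    if is_endpoint:
        # Step 3: only the component that knows it is the logical
        # endpoint marks the request as complete.
        set_status(conn, request_id, "complete", None, clock())
    return result
```

Each of nova-api, nova-scheduler and nova-compute would wrap its own
handling in something like run_tracked, with only the last hop passing
is_endpoint=True.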

Step #3 would probably be most problematic, in the sense of identifying
what constitutes the logical endpoint for every operation (e.g. a volume
might be created from a snapshot in order to be attached somewhere in a
subsequent operation, or as part of a boot-from-volume operation). 

There would be some extra DB manipulation to consider, adding overhead &
latency.

There would also be wrinkles around the lifecycle of entries in the request
status table, when to reap old entries etc. 
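
One possible reaping policy, purely as a sketch: periodically delete
rows that have reached a terminal state and not been touched for some
retention period, leaving in-progress rows alone. The table layout,
the status values and reap_finished are assumptions of mine, with
sqlite3 again standing in for the nova DB:

```python
import sqlite3


def create_table(conn):
    # Hypothetical request status table; updated_at is seconds since epoch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_status ("
        " request_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " updated_at REAL NOT NULL)"
    )


def reap_finished(conn, now, max_age_seconds):
    """Delete complete/failed rows older than the retention period.

    Returns the number of rows removed. In-progress rows are never
    reaped here, however old they are.
    """
    cutoff = now - max_age_seconds
    cur = conn.execute(
        "DELETE FROM request_status"
        " WHERE status IN ('complete', 'failed') AND updated_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Requests stuck in-progress forever (e.g. a message lost in transit)
would need a separate timeout rule, which this deliberately leaves out.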

Just a thought in any case ...

Cheers,
Eoghan

