yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #80931
[Bug 1855752] Re: Inappropriate HTTP error status from os-server-external-events
Reviewed: https://review.opendev.org/698037
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e6f742544432d6066f1fba4666580919eb7859bd
Submitter: Zuul
Branch: master
commit e6f742544432d6066f1fba4666580919eb7859bd
Author: Eric Fried <openstack@xxxxxxxx>
Date: Mon Dec 9 09:58:53 2019 -0600
Nix os-server-external-events 404 condition
The POST /os-server-external-events API had the following confusing
behavior:
With multiple events in the payload, if *some* (but not all) were
dropped, the HTTP response was 207, with per-event 4xx error codes in
the payload. But if *all* of the events were dropped, the overall HTTP
response was 404 with no payload. Thus, especially for consumers sending
only one event at a time, it was impossible to distinguish e.g. "you
tried to send an event for a nonexistent instance" from "the instance
you specified hasn't landed on a host yet".
This fix gets rid of that sweeping 404 condition, so if *any* subset of
the events are dropped (including *all* of them), the HTTP response will
always be 207, and the payload will always contain granular per-event
error codes.
This effectively means the API can no longer return 404, ever.
Closes-Bug: #1855752
Change-Id: Ibad1b51e2cf50d00102295039b6e82bc00bec058
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1855752
Title:
Inappropriate HTTP error status from os-server-external-events
Status in OpenStack Compute (nova):
Fix Released
Bug description:
The handling of os-server-external-events API [1] has a bug. It is designed to handle multiple events, with the following expected behavior:
* If all events are successfully handled, it should return HTTP 200.
* If no event is successfully handled, it should return HTTP 404.
* If some are handled successfully but not all, it should return HTTP 207, with per-event status codes.
However, when Cyborg sends a single event for a single instance, and
that instance is not yet associated with a host [*], the 'else' clause
in Line 137 [1] will set HTTP 207 as return code; but, since
accepted_events is [] in Line 146, that will throw an exception and
return 404. IOW, the expected return is 207 but the actual return is
404.
This has been discussed in IRC [2]. A patch has been proposed [3] to
address this.
[*] This happens because Nova calls into Cyborg from the conductor to
initiate binding of accelerator requests (ARQs), lets it proceed
asynchronously, and waits for the binding notification event in the
compute manager. The notification event could come before the compute
manager has called self._rt.instance_claim(), which would associate
the instance with a host and a node. That race condition triggers the
behavior above.
[1]
https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/api/openstack/compute/server_external_events.py
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova
/%23openstack-nova.2019-12-09.log.html#t2019-12-09T15:45:18
[3] https://review.opendev.org/#/c/698037/
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1855752/+subscriptions
References