yahoo-eng-team team mailing list archive

[Bug 1855752] [NEW] Inappropriate HTTP error status from os-server-external-events

 

Public bug reported:

The handler for the os-server-external-events API [1] has a bug. It is designed to handle multiple events in one request, with the following expected behavior:
* If all events are successfully handled, it should return HTTP 200.
* If no event is successfully handled, it should return HTTP 404.
* If some are handled successfully but not all, it should return HTTP 207, with per-event status codes.

However, when Cyborg sends a single event for a single instance, and
that instance is not yet associated with a host [*], the 'else' clause
at Line 137 [1] sets HTTP 207 as the return code. Because
accepted_events is then [] at Line 146, the handler raises an exception
and returns 404 instead. In other words, the expected return is 207 but
the actual return is 404.
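
The control flow described above can be sketched roughly as follows.
This is a simplified model written for illustration, not the code in
[1]: the function name model_response_code and its boolean input are
hypothetical, and the exception raised by the real handler is
represented here by simply returning 404.

```python
def model_response_code(instances_have_host):
    """Simplified model of the status logic described in this report.

    instances_have_host: one boolean per event, True if the event's
    instance is already associated with a host (hypothetical input).
    """
    result = 200
    accepted_events = []
    for index, has_host in enumerate(instances_have_host):
        if has_host:
            accepted_events.append(index)
        else:
            # Models the 'else' clause: the response becomes partial.
            result = 207
    if not accepted_events:
        # Models the exception that surfaces as HTTP 404, which
        # discards the 207 chosen in the loop.
        return 404
    return result


single_event_no_host = model_response_code([False])
mixed_outcome = model_response_code([True, False])
print(single_event_no_host, mixed_outcome)
```

In this model a single event for an instance without a host yields 404,
while a request that mixes an accepted event with a rejected one yields
207.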

This has been discussed in IRC [2]. A patch has been proposed [3] to
address this.

[*] This happens because Nova calls into Cyborg from the conductor to
initiate binding of accelerator requests (ARQs), lets it proceed
asynchronously, and waits for the binding notification event in the
compute manager. The notification event could come before the compute
manager has called self._rt.instance_claim(), which would associate the
instance with a host and a node. That race condition triggers the
behavior above.
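
A minimal sketch of why the ordering matters, assuming the only
condition checked is whether the instance has a host. The Instance
class, can_dispatch() and claim() below are hypothetical stand-ins for
illustration; they are not Nova or Cyborg APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    uuid: str
    host: Optional[str] = None  # unset until the claim has happened


def can_dispatch(instance):
    # An event can only be dispatched once the instance has a host.
    return instance.host is not None


def claim(instance, host):
    # Stand-in for the step that associates the instance with a host.
    instance.host = host


inst = Instance(uuid="hypothetical-uuid")
event_before_claim = can_dispatch(inst)  # notification arrives early
claim(inst, "compute-1")
event_after_claim = can_dispatch(inst)   # notification arrives late
print(event_before_claim, event_after_claim)
```

The same event is rejected or accepted depending only on whether it
arrives before or after the claim.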

[1]
https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/api/openstack/compute/server_external_events.py

[2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-12-09.log.html#t2019-12-09T15:45:18

[3] https://review.opendev.org/#/c/698037/

** Affects: nova
     Importance: Undecided
     Assignee: Eric Fried (efried)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1855752

Title:
  Inappropriate HTTP error status from os-server-external-events

Status in OpenStack Compute (nova):
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1855752/+subscriptions

