yahoo-eng-team team mailing list archive

[Bug 1917945] [NEW] ConstraintNotMet raised from DB API layer when instance is not found

 

Public bug reported:

While I was working on another race condition bug around a failure to
delete an instance while it was booting [1], I noticed that the DB API
layer assumes that a failure to soft delete an instance record means
that a query constraint was not met.

This was misleading while I was debugging [1] because the traceback
indicated that a constraint on the 'host' column was not met:

nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi instance.destroy()
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi return fn(self, *args, **kwargs)
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi File "/usr/lib/python3.6/site-packages/nova/objects/instance.py", line 659, in destroy
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi reason='host changed')
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi nova.exception.ObjectActionError: Object action destroy failed because: host changed

which would mean that instance.host changed while nova-api was
attempting to destroy the instance record.

This was not possible in this case, however, because the instance had
not yet landed on a compute host (nova-compute sets instance.host). What
had actually happened was that nova-conductor had already deleted the
instance record after finding that nova-api had deleted the build
request, as part of its logic to halt the build of an instance that is
being deleted while it is booting. So when nova-api tried to delete the
instance record, the soft delete failed (0 rows were soft deleted).

Because the DB API layer assumes that a failure to soft delete means a
constraint was not met, it raised ConstraintNotMet. instance.destroy
interprets that as "host changed", which leads nova-api to expect that
the instance record still exists. So the error was handled as a "host
changed" scenario when in reality it was an "instance not found"
scenario.
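
To illustrate why the two scenarios are indistinguishable from the row
count alone, here is a minimal sketch using Python's sqlite3 module. It
is not nova's DB API code: the table layout, the soft_delete() helper
and the 0/1 'deleted' flag are assumptions made for the example. It only
shows that a constrained soft delete updates 0 rows both when the host
constraint fails and when the record is already gone.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified stand-in for the instances table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "uuid TEXT PRIMARY KEY, host TEXT, "
        "deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def soft_delete(conn, uuid, expected_host):
    # Soft delete only a live record whose host matches the constraint.
    # "IS" is used so that an expected host of None matches NULL.
    cur = conn.execute(
        "UPDATE instances SET deleted = 1 "
        "WHERE uuid = ? AND deleted = 0 AND host IS ?",
        (uuid, expected_host),
    )
    return cur.rowcount


conn = make_db()

# Case A: the record exists but its host differs from the constraint.
conn.execute("INSERT INTO instances VALUES ('a', 'compute-2', 0)")
rows_host_changed = soft_delete(conn, "a", "compute-1")

# Case B: another service already soft deleted the record and it never
# had a host (the situation described in this bug).
conn.execute("INSERT INTO instances VALUES ('b', NULL, 1)")
rows_already_deleted = soft_delete(conn, "b", None)

# Both cases report 0 rows, so the count alone cannot tell them apart.
print(rows_host_changed, rows_already_deleted)
```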

We can avoid incorrect exception handling and future confusion while
debugging by raising InstanceNotFound instead of ConstraintNotMet when
the instance record is missing during a soft delete.
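
One possible shape for such a change, as a rough sketch and not the
actual nova patch: when the soft delete affects 0 rows, check whether a
live record exists before deciding which exception to raise. The
exception classes below are local stand-ins for the nova exceptions, and
the table layout, soft_delete_instance() and classify() are hypothetical
names used only for this example.

```python
import sqlite3


class InstanceNotFound(Exception):
    """Local stand-in for nova's InstanceNotFound (assumption)."""


class ConstraintNotMet(Exception):
    """Local stand-in for nova's ConstraintNotMet (assumption)."""


def soft_delete_instance(conn, uuid, expected_host):
    # Constrained soft delete: only a live record with a matching host.
    cur = conn.execute(
        "UPDATE instances SET deleted = 1 "
        "WHERE uuid = ? AND deleted = 0 AND host IS ?",
        (uuid, expected_host),
    )
    if cur.rowcount:
        return
    # 0 rows were updated: find out why before choosing an exception.
    live = conn.execute(
        "SELECT 1 FROM instances WHERE uuid = ? AND deleted = 0",
        (uuid,),
    ).fetchone()
    if live is None:
        # No live record at all, so this is not a constraint failure.
        raise InstanceNotFound(uuid)
    raise ConstraintNotMet(uuid)


def classify(conn, uuid, expected_host):
    # Helper for the demo: report which outcome a delete attempt had.
    try:
        soft_delete_instance(conn, uuid, expected_host)
    except (InstanceNotFound, ConstraintNotMet) as exc:
        return type(exc).__name__
    return "deleted"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    "uuid TEXT PRIMARY KEY, host TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO instances VALUES ('gone', NULL, 1)")
conn.execute("INSERT INTO instances VALUES ('moved', 'compute-2', 0)")
conn.execute("INSERT INTO instances VALUES ('ok', NULL, 0)")

results = {
    "gone": classify(conn, "gone", None),
    "moved": classify(conn, "moved", "compute-1"),
    "ok": classify(conn, "ok", None),
}
print(results)
```

With this split, a caller such as instance.destroy could keep treating
ConstraintNotMet as "host changed" and handle the missing record
separately. Note that the follow-up SELECT is only used to classify the
failure, and the record could still change between the two statements.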

[1] https://bugs.launchpad.net/nova/+bug/1914777

** Affects: nova
     Importance: Undecided
     Assignee: melanie witt (melwitt)
         Status: New


** Tags: db

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1917945

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1917945/+subscriptions

