yahoo-eng-team team mailing list archive
Message #85328
[Bug 1917945] [NEW] ConstraintNotMet raised from DB API layer when instance is not found
Public bug reported:
While I was working on another race condition bug around a failure to
delete an instance while it was booting [1], I noticed that we have an
assumption in the DB API layer that if we fail to soft delete an
instance record, it means that a query constraint was not met.
This was misleading while I was debugging [1], because the traceback
indicated that a constraint on the 'host' column was not met:
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi instance.destroy()
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi return fn(self, *args, **kwargs)
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi File "/usr/lib/python3.6/site-packages/nova/objects/instance.py", line 659, in destroy
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi reason='host changed')
nova-api.log.1:2021-02-02 08:51:20.093 19 ERROR nova.api.openstack.wsgi nova.exception.ObjectActionError: Object action destroy failed because: host changed
which means that instance.host changed during the attempt to destroy
the instance record.
This was, however, not possible in this case, because the instance had
not yet landed on a compute host (nova-compute sets instance.host). What
had actually happened was that nova-conductor had deleted the instance
record after finding that nova-api had deleted the build request, as
part of its logic to halt the build of an instance that is being deleted
while it is booting. So when nova-api tried to soft delete the instance
record, the attempt failed (0 rows were soft deleted).
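The ambiguity can be reproduced outside nova. The following is a minimal sketch, assuming a simplified `instances` table in an in-memory SQLite database; the table layout, the `make_db` and `soft_delete` helpers, and the instance names are all hypothetical and are not nova's schema or DB API. It shows that a soft delete filtered on `host` reports 0 rows both when the record is already gone and when the host really differs.

```python
import sqlite3


def make_db():
    """Create an in-memory table that loosely mimics a soft-deletable record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "uuid TEXT PRIMARY KEY, host TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def soft_delete(conn, uuid, expected_host):
    """Soft delete a live row whose host matches; return the rows changed."""
    cur = conn.execute(
        "UPDATE instances SET deleted = 1 "
        "WHERE uuid = ? AND deleted = 0 AND host IS ?",  # IS also matches NULL
        (uuid, expected_host),
    )
    return cur.rowcount


# Case 1: the race from the report. The instance has no host yet, and two
# services each try to soft delete it.
conn = make_db()
conn.execute("INSERT INTO instances (uuid, host) VALUES ('inst-1', NULL)")
first = soft_delete(conn, "inst-1", None)   # stands in for nova-conductor
second = soft_delete(conn, "inst-1", None)  # stands in for nova-api

# Case 2: a genuine constraint failure. The row is live but its host differs.
conn.execute("INSERT INTO instances (uuid, host) VALUES ('inst-2', 'compute-1')")
mismatch = soft_delete(conn, "inst-2", None)

print(first, second, mismatch)  # prints: 1 0 0
```

Both `second` and `mismatch` are 0, so the row count alone cannot tell the two situations apart.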
Because of the assumption in the DB API layer that a failure to soft
delete means a constraint was not met, it raised ConstraintNotMet, which
instance.destroy interprets as "host changed", which makes nova-api
expect the instance record to exist. So the handling was for a "host
changed" scenario when in reality it was an "instance not found"
scenario.
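The chain of interpretation described above can be sketched as follows. This is an illustration only: the exception classes are local stand-ins named after the ones in the traceback, and `db_instance_destroy` and `destroy` are hypothetical helpers, not nova's code. The message format is taken from the log excerpt.

```python
class ConstraintNotMet(Exception):
    """Local stand-in for the exception named in the report."""


class ObjectActionError(Exception):
    """Local stand-in that reproduces the message seen in the traceback."""

    def __init__(self, action, reason):
        super().__init__(
            "Object action %s failed because: %s" % (action, reason)
        )


def db_instance_destroy(live_uuids, uuid):
    """Hypothetical DB call: any failed soft delete raises ConstraintNotMet."""
    if uuid not in live_uuids:
        raise ConstraintNotMet()
    live_uuids.remove(uuid)


def destroy(live_uuids, uuid):
    """Hypothetical object-layer wrapper that assumes the host changed."""
    try:
        db_instance_destroy(live_uuids, uuid)
    except ConstraintNotMet:
        raise ObjectActionError(action="destroy", reason="host changed")


live = {"inst-1"}
destroy(live, "inst-1")        # first delete succeeds
try:
    destroy(live, "inst-1")    # record is already gone
except ObjectActionError as exc:
    message = str(exc)

print(message)  # prints: Object action destroy failed because: host changed
```

The second call fails because the record no longer exists, yet the caller is told that the host changed.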
We can avoid incorrect exception handling and future confusion while
debugging if we make a change to raise InstanceNotFound instead of
ConstraintNotMet when the instance record is missing during a soft
delete.
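One possible shape for such a change is sketched below, again against a simplified in-memory SQLite table rather than nova's real schema. The `instance_destroy` helper and both exception classes are hypothetical stand-ins, and the follow-up lookup is an assumption about how the two cases could be told apart, not a description of the actual patch.

```python
import sqlite3


class InstanceNotFound(Exception):
    """Local stand-in; not imported from nova."""


class ConstraintNotMet(Exception):
    """Local stand-in; not imported from nova."""


def instance_destroy(conn, uuid, expected_host):
    """Soft delete a row, telling a missing record from a host mismatch."""
    with conn:  # commit or roll back both statements together
        cur = conn.execute(
            "UPDATE instances SET deleted = 1 "
            "WHERE uuid = ? AND deleted = 0 AND host IS ?",
            (uuid, expected_host),
        )
        if cur.rowcount:
            return
        # Nothing was soft deleted: check whether a live record exists at all.
        live = conn.execute(
            "SELECT 1 FROM instances WHERE uuid = ? AND deleted = 0", (uuid,)
        ).fetchone()
    if live is None:
        raise InstanceNotFound(uuid)
    raise ConstraintNotMet()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    "uuid TEXT PRIMARY KEY, host TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO instances (uuid, host) VALUES ('inst-1', NULL)")
conn.execute("INSERT INTO instances (uuid, host) VALUES ('inst-2', 'compute-1')")

instance_destroy(conn, "inst-1", None)  # first delete succeeds

outcomes = []
for uuid in ("inst-1", "inst-2"):
    try:
        instance_destroy(conn, uuid, None)
    except (InstanceNotFound, ConstraintNotMet) as exc:
        outcomes.append(type(exc).__name__)

print(outcomes)  # prints: ['InstanceNotFound', 'ConstraintNotMet']
```

With this split, the already-deleted record surfaces as InstanceNotFound and only the live record with a different host surfaces as ConstraintNotMet. In a real multi-writer database the lookup would need to run with suitable isolation so that it sees a state consistent with the failed update.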
[1] https://bugs.launchpad.net/nova/+bug/1914777
** Affects: nova
Importance: Undecided
Assignee: melanie witt (melwitt)
Status: New
** Tags: db
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1917945