yahoo-eng-team team mailing list archive
Message #79462
[Bug 1838541] Re: Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}. Skipping heal of allocation because we do not know what to do.
Technically this goes back to Pike, but I'm not sure we care about fixing
it there at this point since Pike is in Extended Maintenance mode
upstream. Someone can backport it to stable/pike if they care to.
** Also affects: nova/stein
Importance: Undecided
Status: New
** Also affects: nova/queens
Importance: Undecided
Status: New
** Also affects: nova/rocky
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838541
Title:
Spurious warnings in compute logs while building/unshelving an
instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being
actively managed by this compute host but has allocations referencing
this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}.
Skipping heal of allocation because we do not know what to do.
Status in OpenStack Compute (nova):
In Progress
Status in OpenStack Compute (nova) queens series:
Confirmed
Status in OpenStack Compute (nova) rocky series:
Confirmed
Status in OpenStack Compute (nova) stein series:
Confirmed
Bug description:
This warning from the ResourceTracker is logged quite a bit in CI
runs:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d
2601 hits in 7 days.
Looking at one of these, the warning shows up while spawning the
instance during an unshelve operation. There is a possible race in the
rt.instance_claim call: the instance.host/node are set here:
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208
before the instance is added to the rt.tracked_instances dict here:
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217
If the update_available_resource periodic task runs between those two
points, we'll call _remove_deleted_instances_allocations with the
instance; the instance will have allocations on the node, created by
the scheduler, but may not yet be in tracked_instances, so we don't
short-circuit here:
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339
and instead hit the warning here:
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397
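As a minimal sketch of that window (the names below are illustrative, not nova's actual code): the claim publishes instance.host/node before registering the instance in tracked_instances, so a periodic task that runs in between sees an instance that "belongs" to this host, with scheduler-created allocations, but is not yet tracked:

```python
# Hypothetical, simplified model of the race described above.
tracked_instances = {}

def instance_claim(instance):
    # Step 1: the instance is pointed at this host/node; from this moment
    # it is visible to the periodic task as "belonging" here.
    instance['host'] = 'compute-1'
    instance['node'] = 'compute-1'
    # <-- if update_available_resource runs here, the instance already has
    #     allocations (created by the scheduler) but is not yet in
    #     tracked_instances, so the warning fires.
    # Step 2: only now does the instance become tracked.
    tracked_instances[instance['uuid']] = instance

def remove_deleted_instances_allocations(instance, allocations):
    # Simplified version of the check at resource_tracker.py#L1339/#L1397.
    if instance['uuid'] in tracked_instances:
        return 'short-circuit'   # normal case: instance is tracked
    if allocations:
        return 'warn'            # the spurious warning in this bug
    return 'noop'
```

Running remove_deleted_instances_allocations between step 1 and step 2 returns 'warn' even though the instance is mid-claim; after step 2 it short-circuits as expected.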
We should probably downgrade that warning to DEBUG if the instance's
task_state is set, since the instance is clearly undergoing some state
transition. We should log the task_state, and only log the message as a
warning if the instance has no task_state set but is also not tracked
on the host.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838541/+subscriptions