yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #79793
[Bug 1838541] Re: Spurious warnings in compute logs while building/unshelving an instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}. Skipping heal of allocation because we do not know what to do.
Reviewed: https://review.opendev.org/673873
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad4ef5af8bf7dc34caef21b0062ef9cc504bc216
Submitter: Zuul
Branch: master
commit ad4ef5af8bf7dc34caef21b0062ef9cc504bc216
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Wed Jul 31 12:41:00 2019 -0400
rt: soften warning case in _remove_deleted_instances_allocations
During an instance_claim during initial server create or unshelve,
the instance.host/node values will be set before the resource
tracker has the instance in the tracked_instances dict. If
_remove_deleted_instances_allocations is called with the instance
before it's being tracked, a warning like this is logged:
Jul 31 13:12:57.455904 ubuntu-bionic-rax-iad-0009534722
nova-compute[31951]: WARNING nova.compute.resource_tracker
[None req-d6f2ae97-d8f7-46f6-8974-b42aeb58302d None None]
Instance 227c23cd-aeb2-4b5a-b001-21bd920a5e77 is not being actively
managed by this compute host but has allocations referencing this
compute host: {u'resources': {u'MEMORY_MB': 64, u'VCPU': 1,
u'DISK_GB': 1}}. Skipping heal of allocation because we do not know
what to do.
This shows up quite frequently in CI runs (see the bug report for
a logstash query) which means it should not be a warning.
This change checks the instance task_state and if set then we only
log a debug message rather than the warning since we can assume we
are racing and the task will correct itself upon completion.
Change-Id: I6db8bea6761b68c39e6332d4698d1f8312863758
Closes-Bug: #1838541
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838541
Title:
Spurious warnings in compute logs while building/unshelving an
instance: Instance cf1dc8a6-48fe-42fd-90a7-d352c58e1454 is not being
actively managed by this compute host but has allocations referencing
this compute host: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}.
Skipping heal of allocation because we do not know what to do.
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
Confirmed
Status in OpenStack Compute (nova) rocky series:
Confirmed
Status in OpenStack Compute (nova) stein series:
Confirmed
Bug description:
This warning log from the ResourceTracker is logged quite a bit in CI
runs:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20being%20actively%20managed%20by%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d
2601 hits in 7 days.
Looking at one of these the warning shows up while spawning the
instance during an unshelve operation. This is a possible race for the
rt.instance_claim call because the instance.host/node are set here:
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L208
before the instance would be added to the rt.tracked_instances dict
started here:
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L217
If the update_available_resource periodic task runs between those
times, we'll call _remove_deleted_instances_allocations with the
instance and it will have allocations on the node, created by the
scheduler, but may not be in tracked_instances yet so we don't short-
circuit here:
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1339
And hit the log condition here:
https://github.com/openstack/nova/blob/619c0c676aae5359225c54bc27ce349e138e420e/nova/compute/resource_tracker.py#L1397
We should probably downgrade that warning to DEBUG if the instance
task_state is set since clearly the instance is undergoing some state
transition. We should log the task_state and only log the message as a
warning if the instance does not have a task_state set but is also not
tracked on the host.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838541/+subscriptions
References