yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #79460
[Bug 1834349] Re: Error updating resources for node ubuntu-bionic-ovh-bhs1-0008373888.: AttributeError: 'NoneType' object has no attribute 'flavorid' - race with resize confirm
Reviewed: https://review.opendev.org/667687
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=818419c9d313bd6151d1a05b3a087a15116f61b8
Submitter: Zuul
Branch: master
commit 818419c9d313bd6151d1a05b3a087a15116f61b8
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Wed Jun 26 13:26:44 2019 -0400
Fix AttributeError in RT._update_usage_from_migration
Change Ieb539c9a0cfbac743c579a1633234537a8e3e3ee in Stein
added some logging in _update_usage_from_migration to log
the flavor for an inbound and outbound migration.
If an instance is resized and then the resize is immediately
confirmed, it's possible to race with ComputeManager._confirm_resize
setting the instance.old_flavor to None before the migration
status is changed to "confirmed" while the update_available_resource
periodic is running which will result in _update_usage_from_migration
hitting an AttributeError when trying to log instance.old_flavor.flavorid
since instance.old_flavor is None.
There are a few key points there:
- We get into _update_usage_from_migration because the
_update_available_resource method gets in-progress migrations
related to the host (in this case the source compute) and the
migration is consider in-progress until its status is "confirmed".
- The instance is not in the tracked_instances dict when
_update_usage_from_migration runs because RT only tracks instances
where the instance.host matches the RT.host and in this case the
instance has been resized to another compute and the instance.host
is pointing at the dest compute.
The fix here is to simply check if we got the instance.old_flavor and
not log the message if we do not have it, which gets us back to the old
behavior.
This bug was found by noticing it in CI job logs - there is a link to
hits in logstash in the bug report.
As for the "incoming and not tracked" case in _update_usage_from_migration
I have not modified that since I am not sure we have the same race nor
have I seen it in CI logs.
Change-Id: I43e34b3ff1424d42632a3e8f842c93508905aa1a
Closes-Bug: #1834349
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1834349
Title:
Error updating resources for node ubuntu-bionic-ovh-bhs1-0008373888.:
AttributeError: 'NoneType' object has no attribute 'flavorid' - race
with resize confirm
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) stein series:
Confirmed
Bug description:
Seeing this in CI jobs:
http://logs.openstack.org/45/666845/1/check/neutron-tempest-dvr-ha-
multinode-
full/5b09053/controller/logs/screen-n-cpu.txt#_Jun_26_13_51_48_973568
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] Error updating resources for node ubuntu-bionic-ovh-bhs1-0008373888.: AttributeError: 'NoneType' object has no attribute 'flavorid'
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager Traceback (most recent call last):
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 8101, in _update_available_resource_for_node
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager startup=startup)
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 735, in update_available_resource
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager File "/usr/local/lib/python3.6/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager return f(*args, **kwargs)
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 783, in _update_available_resource
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager self._update_usage_from_migrations(context, migrations, nodename)
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1192, in _update_usage_from_migrations
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager nodename)
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1118, in _update_usage_from_migration
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager migration.uuid, itype.flavorid, instance=instance)
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager AttributeError: 'NoneType' object has no attribute 'flavorid'
Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager
I'm not sure if this is a race, but it's this code:
https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L1118
which was introduced with this change (of mine) in Stein:
https://github.com/openstack/nova/commit/20a46ece5701c9798a5e0df12c944237cb1ece3e
According to logstash this is only showing up in multinode jobs:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'NoneType'%20object%20has%20no%20attribute%20'flavorid'%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d
Given this code to get the flavor:
https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L1439
def _get_instance_type(self, instance, prefix, migration):
"""Get the instance type from instance."""
stashed_flavors = migration.migration_type in ('resize',)
if stashed_flavors:
return getattr(instance, '%sflavor' % prefix)
else:
# NOTE(ndipanov): Certain migration types (all but resize)
# do not change flavors so there is no need to stash
# them. In that case - just get the instance flavor.
return instance.flavor
And this in the logs to load the instance.old_flavor:
Jun 26 13:51:48.911052 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: INFO nova.compute.resource_tracker [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] [instance: 4342d535-d00f-4ebc-930a-8e634f343eef] Updating resource usage from migration 87280d26-3511-47cb-9a69-6d510fd92718
Jun 26 13:51:48.911397 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: DEBUG nova.objects.instance [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] Lazy-loading 'old_flavor' on Instance uuid 4342d535-d00f-4ebc-930a-8e634f343eef {{(pid=27912) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1053}}
This is coming from a resize operation. I also see this:
Jun 26 13:51:48.844575 ubuntu-bionic-ovh-bhs1-0008373888 nova-
compute[27912]: DEBUG nova.compute.resource_tracker [None req-
903b3e73-3ce3-4f5c-9a30-811383077679 None None] Migration for instance
4342d535-d00f-4ebc-930a-8e634f343eef refers to another host's
instance! {{(pid=27912) _pair_instances_to_migrations
/opt/stack/nova/nova/compute/resource_tracker.py:751}}
Which was added because of https://review.opendev.org/#/c/382195/ so
we might just be racing with the confirm/revert during the
update_available_resource task and the resize was confirmed/reverted.
Further up in the logs I also see this, confirming the instance is
being resized:
Jun 26 13:51:48.559335 ubuntu-bionic-ovh-bhs1-0008373888 nova-
compute[27912]: INFO nova.virt.libvirt.driver [None req-903b3e73-3ce3
-4f5c-9a30-811383077679 None None] [instance: 4342d535-d00f-4ebc-930a-
8e634f343eef] Periodic task is updating the host stats; it is trying
to get disk info for instance-0000006c, but the backing disk was
removed by a concurrent operation (task_state=None) and
(vm_state=resized)
When confirming a resize (which tempest will do as soon as the server
status goes to VERIFY_RESIZE in most tests), the instance.old_flavor
is nulled out here:
https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/manager.py#L4031
And then the move claim is dropped on the source compute:
https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/manager.py#L4057
and that's where we remove it from tracked_migrations in the RT:
https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L464
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1834349/+subscriptions
References