yahoo-eng-team team mailing list archive

[Bug 1834349] Re: Error updating resources for node ubuntu-bionic-ovh-bhs1-0008373888.: AttributeError: 'NoneType' object has no attribute 'flavorid' - race with resize confirm

 

Reviewed:  https://review.opendev.org/667687
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=818419c9d313bd6151d1a05b3a087a15116f61b8
Submitter: Zuul
Branch:    master

commit 818419c9d313bd6151d1a05b3a087a15116f61b8
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Wed Jun 26 13:26:44 2019 -0400

    Fix AttributeError in RT._update_usage_from_migration
    
    Change Ieb539c9a0cfbac743c579a1633234537a8e3e3ee in Stein
    added some logging in _update_usage_from_migration to log
    the flavor for inbound and outbound migrations.
    
    If an instance is resized and the resize is immediately confirmed,
    the update_available_resource periodic can race with
    ComputeManager._confirm_resize, which sets instance.old_flavor to
    None before the migration status is changed to "confirmed". When
    that happens, _update_usage_from_migration hits an AttributeError
    while trying to log instance.old_flavor.flavorid, since
    instance.old_flavor is None.
    
    There are a few key points there:
    
    - We get into _update_usage_from_migration because the
      _update_available_resource method gets in-progress migrations
      related to the host (in this case the source compute), and the
      migration is considered in-progress until its status is "confirmed".
    
    - The instance is not in the tracked_instances dict when
      _update_usage_from_migration runs because the RT only tracks
      instances whose instance.host matches the RT.host; in this case
      the instance has been resized to another compute, so instance.host
      points at the dest compute.
    
    The fix here is simply to check whether we have instance.old_flavor
    and skip the log message if we do not, which restores the old
    behavior.
    
    This bug was found by noticing it in CI job logs - there is a link to
    hits in logstash in the bug report.
    
    As for the "incoming and not tracked" case in
    _update_usage_from_migration, I have not modified it since I am not
    sure we have the same race, nor have I seen it in CI logs.
    
    Change-Id: I43e34b3ff1424d42632a3e8f842c93508905aa1a
    Closes-Bug: #1834349
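
A minimal sketch of the guarded logging described in the fix above. It
reuses names that already appear in this report (_get_instance_type,
LOG, migration, instance) and is illustrative only, not the exact
committed diff:

    # Hedged sketch, not the exact nova change: only log the stashed
    # flavor's flavorid when it is actually present, since old_flavor
    # can already be None if the resize was confirmed while the
    # update_available_resource periodic was running.
    itype = self._get_instance_type(instance, 'old_', migration)
    if itype is not None:
        LOG.info('Updating resource usage from migration %s for old '
                 'flavor %s', migration.uuid, itype.flavorid,
                 instance=instance)
    else:
        LOG.info('Updating resource usage from migration %s; stashed '
                 'old flavor is not set, the resize may have just been '
                 'confirmed', migration.uuid, instance=instance)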


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1834349

Title:
  Error updating resources for node ubuntu-bionic-ovh-bhs1-0008373888.:
  AttributeError: 'NoneType' object has no attribute 'flavorid' - race
  with resize confirm

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Seeing this in CI jobs:

  http://logs.openstack.org/45/666845/1/check/neutron-tempest-dvr-ha-multinode-full/5b09053/controller/logs/screen-n-cpu.txt#_Jun_26_13_51_48_973568

  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] Error updating resources for node ubuntu-bionic-ovh-bhs1-0008373888.: AttributeError: 'NoneType' object has no attribute 'flavorid'
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager Traceback (most recent call last):
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 8101, in _update_available_resource_for_node
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     startup=startup)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 735, in update_available_resource
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     self._update_available_resource(context, resources, startup=startup)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/usr/local/lib/python3.6/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     return f(*args, **kwargs)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 783, in _update_available_resource
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     self._update_usage_from_migrations(context, migrations, nodename)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1192, in _update_usage_from_migrations
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     nodename)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1118, in _update_usage_from_migration
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     migration.uuid, itype.flavorid, instance=instance)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager AttributeError: 'NoneType' object has no attribute 'flavorid'
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager 

  I'm not sure if this is a race, but it's this code:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L1118

  which was introduced with this change (of mine) in Stein:

  https://github.com/openstack/nova/commit/20a46ece5701c9798a5e0df12c944237cb1ece3e

  According to logstash this is only showing up in multinode jobs:

  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'NoneType'%20object%20has%20no%20attribute%20'flavorid'%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

  Given this code to get the flavor:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L1439

      def _get_instance_type(self, instance, prefix, migration):
          """Get the instance type from instance."""
          stashed_flavors = migration.migration_type in ('resize',)
          if stashed_flavors:
              return getattr(instance, '%sflavor' % prefix)
          else:
              # NOTE(ndipanov): Certain migration types (all but resize)
              # do not change flavors so there is no need to stash
              # them. In that case - just get the instance flavor.
              return instance.flavor

  And this in the logs to load the instance.old_flavor:

  Jun 26 13:51:48.911052 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: INFO nova.compute.resource_tracker [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] [instance: 4342d535-d00f-4ebc-930a-8e634f343eef] Updating resource usage from migration 87280d26-3511-47cb-9a69-6d510fd92718
  Jun 26 13:51:48.911397 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: DEBUG nova.objects.instance [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] Lazy-loading 'old_flavor' on Instance uuid 4342d535-d00f-4ebc-930a-8e634f343eef {{(pid=27912) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1053}}
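
  A simplified illustration of how that lazy-load turns into the
  AttributeError from the traceback above (paraphrased, not the actual
  resource tracker code):

      # Paraphrased, not actual nova code. The instance is not in
      # tracked_instances (its host already points at the dest compute),
      # so the source RT lazy-loads old_flavor from the DB; if
      # _confirm_resize already ran, that value is None.
      itype = getattr(instance, 'old_flavor')   # lazy-load -> None
      LOG.info('Updating resource usage from migration %s, flavor %s',
               migration.uuid, itype.flavorid)  # AttributeError: 'NoneType'
                                                # object has no attribute
                                                # 'flavorid'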

  This is coming from a resize operation. I also see this:

  Jun 26 13:51:48.844575 ubuntu-bionic-ovh-bhs1-0008373888 nova-
  compute[27912]: DEBUG nova.compute.resource_tracker [None req-
  903b3e73-3ce3-4f5c-9a30-811383077679 None None] Migration for instance
  4342d535-d00f-4ebc-930a-8e634f343eef refers to another host's
  instance! {{(pid=27912) _pair_instances_to_migrations
  /opt/stack/nova/nova/compute/resource_tracker.py:751}}

  That debug message was added because of
  https://review.opendev.org/#/c/382195/ so we might just be racing
  with the confirm/revert during the update_available_resource periodic
  task, i.e. the resize was already confirmed/reverted.

  Further up in the logs I also see this, confirming the instance is
  being resized:

  Jun 26 13:51:48.559335 ubuntu-bionic-ovh-bhs1-0008373888 nova-
  compute[27912]: INFO nova.virt.libvirt.driver [None req-903b3e73-3ce3
  -4f5c-9a30-811383077679 None None] [instance: 4342d535-d00f-4ebc-930a-
  8e634f343eef] Periodic task is updating the host stats; it is trying
  to get disk info for instance-0000006c, but the backing disk was
  removed by a concurrent operation (task_state=None) and
  (vm_state=resized)

  When confirming a resize (which tempest will do as soon as the server
  status goes to VERIFY_RESIZE in most tests), the instance.old_flavor
  is nulled out here:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/manager.py#L4031

  And then the move claim is dropped on the source compute:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/manager.py#L4057

  and that's where we remove it from tracked_migrations in the RT:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L464
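
  Putting those three links together, the ordering on the source compute
  during confirm is roughly the following (paraphrased pseudocode with
  simplified names, not the actual manager code):

      # Paraphrased flow with simplified names; not actual nova code.
      def _confirm_resize(context, instance, migration):
          instance.old_flavor = None      # stashed flavor dropped first
          instance.save()
          # ... network teardown / driver confirm happen here ...
          migration.status = 'confirmed'  # only now does the periodic stop
          migration.save()                # treating it as in-progress
          rt.drop_move_claim(context, instance, migration.source_node)
          # source RT forgets the migration (tracked_migrations)

  Anywhere between nulling instance.old_flavor and flipping the
  migration status to "confirmed", the update_available_resource
  periodic still treats the migration as in-progress while
  instance.old_flavor is already None, which is exactly the window hit
  here.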

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1834349/+subscriptions

