[Bug 2076228] [NEW] nova-scheduler fails to acquire lock on hosts on live migration

 

Public bug reported:

Description
=============

I am running OpenStack Antelope, deployed with Juju charms, on Ceph-backed
storage. The cloud was upgraded from Zed to Antelope by following the
official charmed OpenStack upgrade guide.

All hosts run identical hardware: Dell PowerEdge R610 servers with 24 cores
and 48 GB of RAM.

I have tried live migration (--live-migration) with volume-backed VMs and
with image-backed VMs (the latter also with --block-migration). All hosts
share /var/lib/nova/instances via NFS for local storage.
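
As a sanity check of that setup, the shared instances directory can be
confirmed on each compute host with something like the following (the NFS
server and export path shown are placeholders, not the real values from this
deployment):

  # on each compute host; should show an NFS mount backing the instances path
  mount | grep /var/lib/nova/instances
  # e.g. <nfs-server>:/srv/nova-instances on /var/lib/nova/instances type nfs4 (rw,relatime,...)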

The VMs that should be live-migrated have no extra configuration properties
tying them to availability zones or anything similar; they are plain VMs
created from the Horizon dashboard.

Steps to reproduce
==================

Upgrade from Zed to Antelope, then try to live-migrate VMs.
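
The migrations were issued with the standard client flags mentioned above; a
minimal sketch of the commands (the server ID is a placeholder):

  # volume-backed or shared-storage instance
  openstack server migrate --live-migration <server-uuid>
  # image-backed instance, copying local disks as well
  openstack server migrate --live-migration --block-migration <server-uuid>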


Logs & Configs
=================

The environment uses libvirt/KVM with neutron-api and OVN as the SDN.

Nova version 27.1.0
ii  nova-api-os-compute                  3:27.1.0-0ubuntu1.2~cloud0                           all          OpenStack Compute - OpenStack Compute API frontend
ii  nova-common                          3:27.1.0-0ubuntu1.2~cloud0                           all          OpenStack Compute - common files
ii  nova-conductor                       3:27.1.0-0ubuntu1.2~cloud0                           all          OpenStack Compute - conductor service
ii  nova-scheduler                       3:27.1.0-0ubuntu1.2~cloud0                           all          OpenStack Compute - virtual machine scheduler
ii  nova-spiceproxy                      3:27.1.0-0ubuntu1.2~cloud0                           all          OpenStack Compute - spice html5 proxy
ii  python3-nova                         3:27.1.0-0ubuntu1.2~cloud0                           all          OpenStack Compute Python 3 libraries
ii  python3-novaclient                   2:18.3.0-0ubuntu1~cloud0                             all          client library for OpenStack Compute API - 3.x


Filters enabled: AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter,DifferentHostFilter,SameHostFilter
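
In nova.conf terms this corresponds to a scheduler section along these lines
(paraphrased from the filter list above, not copied from the rendered config
file):

  [filter_scheduler]
  enabled_filters = AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter,DifferentHostFilter,SameHostFilter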

The charm configs are defaults with no changes. Live migration worked in Zed
but now fails; after turning on debug logging on the nova-cloud-controller,
the scheduler behaves as shown below.
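
Debug logging on the controller is enabled through the charm; assuming the
usual charm option name (an assumption, not verified against this
deployment), something along the lines of:

  juju config nova-cloud-controller debug=true

should set debug=True for the nova services on that unit.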

FULL LOG:  https://pastebin.com/NvMazzkC

In short, nova-scheduler iterates through the hosts, and the following
happens for each available host until the list is exhausted:

2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None
req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb
91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - -
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock
"('os-host-10.maas', 'os-host-10.maas')" "released" by
"nova.scheduler.host_manager.HostState.update.<locals>._locked_update"
:: held 0.003s inner /usr/lib/python3/dist-
packages/oslo_concurrency/lockutils.py:423

2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - 
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Acquiring lock "('os-host-11.maas', 'os-host-11.maas')" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:404

2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None
req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb
91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - -
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock
"('os-host-11.maas', 'os-host-11.maas')" acquired by
"nova.scheduler.host_manager.HostState.update.<locals>._locked_update"
:: waited 0.000s inner /usr/lib/python3/dist-
packages/oslo_concurrency/lockutils.py:409

2024-08-07 10:15:36.664 1307737 DEBUG nova.scheduler.host_manager [None
req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb
91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - -
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7]
Update host state from compute node (all of the properties here are pulled
from that compute node)

Update host state with aggregates:
[Aggregate(created_at=2023-11-01T17:48:42Z,deleted=False,deleted_at=None,hosts=['os-
host-4-shelf.maas','os-host-1.maas','os-host-2.maas','os-
host-9.maas','os-host-11.maas','os-host-10.maas','os-host-6.maas','os-
host-8.maas','os-host-7.maas','os-host-5.maas','os-
host-3.maas'],id=1,metadata={availability_zone='nova'},name='nova_az',updated_at=None,uuid=9e0b10a6-8030-4bbf-92a7-724d4cb3a0d0)]
_locked_update /usr/lib/python3/dist-
packages/nova/scheduler/host_manager.py:172

 Update host state with service dict: {'id': 52, 'uuid':
'c6778fc7-5575-4859-b6ad-cdca697cebac', 'host': 'os-host-11.maas',
'binary': 'nova-compute', 'topic': 'compute', 'report_count': 14216,
'disabled': False, 'disabled_reason': None, 'last_seen_up':
datetime.datetime(2024, 8, 7, 10, 15, 36, tzinfo=datetime.timezone.utc),
'forced_down': False, 'version': 66, 'created_at':
datetime.datetime(2024, 8, 5, 18, 44, 9, tzinfo=datetime.timezone.utc),
'updated_at': datetime.datetime(2024, 8, 7, 10, 15, 36,
tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': False}
_locked_update /usr/lib/python3/dist-
packages/nova/scheduler/host_manager.py:175

2024-08-07 10:15:36.666 1307737 DEBUG nova.scheduler.host_manager [None
req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb
91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - -
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7]
Update host state with instances:
['16a8944d-2ce0-4e3d-88d2-69c3752f3a63',
'3d9ff4c9-4056-4bab-968e-22d4cb286113',
'9a03c8e5-fd84-4802-a9bb-a9a93975775d',
'fffbea8e-3b01-4ede-8b47-f3d000975fd5'] _locked_update
/usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:178

2024-08-07 10:15:36.666 1307737 DEBUG oslo_concurrency.lockutils [None
req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb
91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - -
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock
"('os-host-11.maas', 'os-host-11.maas')" "released" by
"nova.scheduler.host_manager.HostState.update.<locals>._locked_update"
:: held 0.003s inner /usr/lib/python3/dist-
packages/oslo_concurrency/lockutils.py:423

2024-08-07 10:15:36.667 1307737 INFO nova.scheduler.host_manager [None
req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb
91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - -
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Host
filter ignoring hosts: os-host-6.maas, os-host-3.maas, os-host-7.maas,
os-host-9.maas, os-host-11.maas, os-host-5.maas, os-host-10.maas, os-
host-8.maas

2024-08-07 10:15:36.667 1307737 DEBUG nova.scheduler.manager [None
req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb
91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - -
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7]
Filtered [] _get_sorted_hosts /usr/lib/python3/dist-
packages/nova/scheduler/manager.py:675

2024-08-07 10:15:36.667 1307737 DEBUG nova.scheduler.manager [None
req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb
91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - -
da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] There
are 0 hosts available but 1 instances requested to build. _ensure_
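
For what it is worth, one way to see exactly which hosts the scheduler was
asked to ignore for this request (the "Host filter ignoring hosts" line
above) is to look at the stored RequestSpec for the instance in the nova_api
database. A rough sketch, with a placeholder instance UUID and credentials
omitted:

  mysql nova_api -e "SELECT spec FROM request_specs WHERE instance_uuid='<instance-uuid>'\G"

The serialized spec should show the ignore_hosts and scheduler_hints values
that were in effect for the migration request.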

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2076228

Title:
  nova-scheduler fails to acquire lock on hosts on live migration

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2076228/+subscriptions