[Bug 2076228] [NEW] nova-scheduler fails to acquire lock on hosts on live migration
Public bug reported:
Description
===========
I am running OpenStack Antelope, deployed with Juju charms (Charmed OpenStack) on Ceph-backed storage. Antelope was upgraded from Zed, which had originally been deployed following the official Charmed OpenStack upgrade guide.
All hosts run identical hardware: Dell PowerEdge R610 servers with 24 cores and 48 GB of RAM.
I tried --live-migration with volume-backed VMs and with image-backed VMs (adding --block-migration). All hosts share /var/lib/nova/instances via NFS for local storage.
The VMs to be live migrated have no extra configuration properties tying them to availability zones or similar; they are plain VMs created from the Horizon dashboard.
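That can be double-checked with something like the following (the server ID is a placeholder):

  openstack server show <server-id> -c properties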
Steps to reproduce
==================
Upgrade from Zed to Antelope, then try to live-migrate VMs.
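The migration attempts described above map to OpenStack client commands roughly like these (the server ID is a placeholder; the --block-migration variant is the one tried with image-backed VMs):

  openstack server migrate --live-migration <server-id>
  openstack server migrate --live-migration --block-migration <server-id>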
Logs & Configs
==============
The environment uses libvirt/KVM with neutron-api and OVN as the SDN.
Nova version is 27.1.0:
ii nova-api-os-compute 3:27.1.0-0ubuntu1.2~cloud0 all OpenStack Compute - OpenStack Compute API frontend
ii nova-common 3:27.1.0-0ubuntu1.2~cloud0 all OpenStack Compute - common files
ii nova-conductor 3:27.1.0-0ubuntu1.2~cloud0 all OpenStack Compute - conductor service
ii nova-scheduler 3:27.1.0-0ubuntu1.2~cloud0 all OpenStack Compute - virtual machine scheduler
ii nova-spiceproxy 3:27.1.0-0ubuntu1.2~cloud0 all OpenStack Compute - spice html5 proxy
ii python3-nova 3:27.1.0-0ubuntu1.2~cloud0 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:18.3.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - 3.x
Filters enabled: AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter,DifferentHostFilter,SameHostFilter
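For reference, that filter list corresponds to roughly the following in nova.conf on the scheduler node (a sketch assuming the standard [filter_scheduler] section; the charm's exact rendering may differ):

  [filter_scheduler]
  enabled_filters = AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter,DifferentHostFilter,SameHostFilter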
Charm configs are defaults with no changes. Live migration worked in Zed but now fails; after enabling debug logging on the nova-cloud-controller, the following shows up:
FULL LOG: https://pastebin.com/NvMazzkC
In short, nova-scheduler iterates through the hosts, and the following happens for each available host until the list is exhausted:
2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock "('os-host-10.maas', 'os-host-10.maas')" "released" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" :: held 0.003s inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:423
2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Acquiring lock "('os-host-11.maas', 'os-host-11.maas')" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:404
2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock "('os-host-11.maas', 'os-host-11.maas')" acquired by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" :: waited 0.000s inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:409
2024-08-07 10:15:36.664 1307737 DEBUG nova.scheduler.host_manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Update host state from compute node ( all properties here pulled from that compute node)
Update host state with aggregates: [Aggregate(created_at=2023-11-01T17:48:42Z,deleted=False,deleted_at=None,hosts=['os-host-4-shelf.maas','os-host-1.maas','os-host-2.maas','os-host-9.maas','os-host-11.maas','os-host-10.maas','os-host-6.maas','os-host-8.maas','os-host-7.maas','os-host-5.maas','os-host-3.maas'],id=1,metadata={availability_zone='nova'},name='nova_az',updated_at=None,uuid=9e0b10a6-8030-4bbf-92a7-724d4cb3a0d0)] _locked_update /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:172
Update host state with service dict: {'id': 52, 'uuid': 'c6778fc7-5575-4859-b6ad-cdca697cebac', 'host': 'os-host-11.maas', 'binary': 'nova-compute', 'topic': 'compute', 'report_count': 14216, 'disabled': False, 'disabled_reason': None, 'last_seen_up': datetime.datetime(2024, 8, 7, 10, 15, 36, tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 66, 'created_at': datetime.datetime(2024, 8, 5, 18, 44, 9, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2024, 8, 7, 10, 15, 36, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': False} _locked_update /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:175
2024-08-07 10:15:36.666 1307737 DEBUG nova.scheduler.host_manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Update host state with instances: ['16a8944d-2ce0-4e3d-88d2-69c3752f3a63', '3d9ff4c9-4056-4bab-968e-22d4cb286113', '9a03c8e5-fd84-4802-a9bb-a9a93975775d', 'fffbea8e-3b01-4ede-8b47-f3d000975fd5'] _locked_update /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:178
2024-08-07 10:15:36.666 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock "('os-host-11.maas', 'os-host-11.maas')" "released" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" :: held 0.003s inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:423
2024-08-07 10:15:36.667 1307737 INFO nova.scheduler.host_manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Host filter ignoring hosts: os-host-6.maas, os-host-3.maas, os-host-7.maas, os-host-9.maas, os-host-11.maas, os-host-5.maas, os-host-10.maas, os-host-8.maas
2024-08-07 10:15:36.667 1307737 DEBUG nova.scheduler.manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Filtered [] _get_sorted_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:675
2024-08-07 10:15:36.667 1307737 DEBUG nova.scheduler.manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] There are 0 hosts available but 1 instances requested to build. _ensure_
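For context, the "Acquiring lock" / "acquired" / "released" lines above come from oslo.concurrency's synchronized decorator, which (per the paths in the log) wraps the _locked_update helper inside HostState.update. A minimal sketch of that pattern, with a hypothetical lock name and update body rather than nova's actual code:

  from oslo_concurrency import lockutils

  # Hypothetical lock name; nova uses a (host, nodename) tuple such as
  # ('os-host-11.maas', 'os-host-11.maas').
  @lockutils.synchronized('os-host-11.maas')
  def _locked_update(host_state, updates):
      # Apply the collected aggregate/service/instance data while no other
      # thread can touch this host's entry.
      host_state.update(updates)

Per the log, each of these locks is acquired after 0.000s of waiting and held for about 0.003s.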
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2076228