yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #86725
[Bug 1937292] Re: All overcloud VM's powered off on hypervisor when nova_libvirt is restarted
This is an issue related to the runtime for podman. In my case crun caused the issue, runc works fine:
https://paste.opendev.org/show/807670/
For whatever reason I can't post that comment directly here.
** Changed in: tripleo
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1937292
Title:
All overcloud VM's powered off on hypervisor when nova_libvirt is
restarted
Status in OpenStack Compute (nova):
Incomplete
Status in tripleo:
Invalid
Bug description:
Description:
Using TripleO. Noted that all VM's on a Hypervisor are powered off
during the overcloud deployment. (I only have one Hypervisor sorry, I
can't tell you if it would happen to more than one hypervisor).
Seems to happen when the nova_libvirt container is restarted.
Environment:
TripleO - Master
# podman exec -it nova_libvirt rpm -qa | grep nova
python3-nova-23.1.0-0.20210625160814.1f6c351.el8.noarch
openstack-nova-compute-23.1.0-0.20210625160814.1f6c351.el8.noarch
openstack-nova-common-23.1.0-0.20210625160814.1f6c351.el8.noarch
openstack-nova-migration-23.1.0-0.20210625160814.1f6c351.el8.noarch
python3-novaclient-17.5.0-0.20210601131008.f431295.el8.noarch
Reproducer:
At least for me:
1. Start a VM
2. Restart tripleo_nova_libvirt.service:
systemctl restart tripleo_nova_libvirt.service
3. All VM's are stopped
Relevant logs:
2021-07-22 16:31:05.532 3 DEBUG nova.compute.manager [req-19a38d0b-e019-472b-95c4-03c796040767 d2ab1d5792604ba094af82d7447e88cf c4740b2aba4147adb7f101a2782003c3 - default default] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] No wait
ing events found dispatching network-vif-plugged-d9b29fef-cd87-41db-ba79-8b8c65b74efb pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:319
2021-07-22 16:31:05.532 3 WARNING nova.compute.manager [req-19a38d0b-e019-472b-95c4-03c796040767 d2ab1d5792604ba094af82d7447e88cf c4740b2aba4147adb7f101a2782003c3 - default default] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Recei
ved unexpected event network-vif-plugged-d9b29fef-cd87-41db-ba79-8b8c65b74efb for instance with vm_state active and task_state None.
2021-07-22 16:31:30.583 3 DEBUG nova.compute.manager [req-7be814ae-0e3d-4631-8a4c-348ead46c213 - - - - -] Triggering sync for uuid b28cc3ae-6442-40cf-9d66-9d4938a567c7 _sync_power_states /usr/lib/python3.6/site-packages/nova/compute/man
ager.py:9695
2021-07-22 16:31:30.589 3 DEBUG oslo_concurrency.lockutils [-] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" ::
waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2021-07-22 16:31:30.746 3 INFO nova.compute.manager [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_
state in the DB to match the hypervisor.
2021-07-22 16:31:30.930 3 WARNING nova.compute.manager [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state:
1, current VM power_state: 4
2021-07-22 16:31:30.931 3 DEBUG nova.compute.api [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Going to try to stop instance force_stop /usr/lib/python3.6/site-packages/nova/compute/api.py:2584
2021-07-22 16:31:31.135 3 DEBUG oslo_concurrency.lockutils [-] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" released by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" ::
held 0.547s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:371
2021-07-22 16:31:31.161 3 DEBUG oslo_concurrency.lockutils [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" acquired by "nova.compute.manager.ComputeManager.stop_instance.<locals>.do_stop_
instance" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2021-07-22 16:31:31.162 3 DEBUG nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Checking state _get_power_state /usr/lib/python3.6/site-packages/nova/compute/man
ager.py:1561
2021-07-22 16:31:31.165 3 DEBUG nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Stopping instance; current vm_state: active, current task_state: powering-off, cu
rrent DB power_state: 4, current VM power_state: 4 do_stop_instance /usr/lib/python3.6/site-packages/nova/compute/manager.py:3095
2021-07-22 16:31:31.166 3 INFO nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance is already powered off in the hypervisor when stop is called.
2021-07-22 16:31:31.168 3 DEBUG nova.objects.instance [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] Lazy-loading 'flavor' on Instance uuid b28cc3ae-6442-40cf-9d66-9d4938a567c7 obj_load_attr /usr/lib/python3.6/site-packages/nova/$
bjects/instance.py:1103
2021-07-22 16:31:31.236 3 DEBUG nova.objects.instance [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] Lazy-loading 'metadata' on Instance uuid b28cc3ae-6442-40cf-9d66-9d4938a567c7 obj_load_attr /usr/lib/python3.6/site-packages/nov$
/objects/instance.py:1103
2021-07-22 16:31:31.302 3 INFO nova.virt.libvirt.driver [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance already shutdown.
2021-07-22 16:31:31.308 3 INFO nova.virt.libvirt.driver [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance destroyed successfully.
I'll attach the full debug logs as well.
I couldn't see anything on the Controller node relevant to it. Just
seems to be the compute node, but I can capture whatever we need.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1937292/+subscriptions
References