
yahoo-eng-team team mailing list archive

[Bug 1937292] [NEW] All overcloud VM's powered off on hypervisor when nova_libvirt is restarted

 

Public bug reported:

Description:

Using TripleO. I noticed that all VMs on a hypervisor are powered off
during the overcloud deployment. (Sorry, I only have one hypervisor, so
I can't say whether it would happen on more than one.)

It seems to happen when the nova_libvirt container is restarted.

Environment:
TripleO - Master
# podman exec -it nova_libvirt rpm -qa | grep nova
python3-nova-23.1.0-0.20210625160814.1f6c351.el8.noarch
openstack-nova-compute-23.1.0-0.20210625160814.1f6c351.el8.noarch
openstack-nova-common-23.1.0-0.20210625160814.1f6c351.el8.noarch
openstack-nova-migration-23.1.0-0.20210625160814.1f6c351.el8.noarch
python3-novaclient-17.5.0-0.20210601131008.f431295.el8.noarch

Reproducer:
At least for me:
1. Start a VM
2. Restart tripleo_nova_libvirt.service:
systemctl restart tripleo_nova_libvirt.service
3. All VMs are stopped (a way to confirm this from the hypervisor side is sketched below)
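
For reference, the power-off can be confirmed from the hypervisor side by listing the libvirt domains before and after the restart. This is just a sketch, assuming virsh is available inside the nova_libvirt container on a TripleO compute node:

# Before the restart: the guest domain should show as "running"
podman exec -it nova_libvirt virsh list --all

# Step 2 from above
systemctl restart tripleo_nova_libvirt.service

# After the restart: the guest domain shows as "shut off"; once nova's
# periodic power-state sync runs, `openstack server list` (run from a node
# with overcloud credentials) also reports the instance as SHUTOFF
podman exec -it nova_libvirt virsh list --all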

Relevant logs:
2021-07-22 16:31:05.532 3 DEBUG nova.compute.manager [req-19a38d0b-e019-472b-95c4-03c796040767 d2ab1d5792604ba094af82d7447e88cf c4740b2aba4147adb7f101a2782003c3 - default default] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] No waiting events found dispatching network-vif-plugged-d9b29fef-cd87-41db-ba79-8b8c65b74efb pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:319
2021-07-22 16:31:05.532 3 WARNING nova.compute.manager [req-19a38d0b-e019-472b-95c4-03c796040767 d2ab1d5792604ba094af82d7447e88cf c4740b2aba4147adb7f101a2782003c3 - default default] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Received unexpected event network-vif-plugged-d9b29fef-cd87-41db-ba79-8b8c65b74efb for instance with vm_state active and task_state None.
2021-07-22 16:31:30.583 3 DEBUG nova.compute.manager [req-7be814ae-0e3d-4631-8a4c-348ead46c213 - - - - -] Triggering sync for uuid b28cc3ae-6442-40cf-9d66-9d4938a567c7 _sync_power_states /usr/lib/python3.6/site-packages/nova/compute/manager.py:9695
2021-07-22 16:31:30.589 3 DEBUG oslo_concurrency.lockutils [-] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2021-07-22 16:31:30.746 3 INFO nova.compute.manager [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
2021-07-22 16:31:30.930 3 WARNING nova.compute.manager [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
2021-07-22 16:31:30.931 3 DEBUG nova.compute.api [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Going to try to stop instance force_stop /usr/lib/python3.6/site-packages/nova/compute/api.py:2584
2021-07-22 16:31:31.135 3 DEBUG oslo_concurrency.lockutils [-] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" released by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: held 0.547s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:371
2021-07-22 16:31:31.161 3 DEBUG oslo_concurrency.lockutils [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" acquired by "nova.compute.manager.ComputeManager.stop_instance.<locals>.do_stop_instance" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2021-07-22 16:31:31.162 3 DEBUG nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Checking state _get_power_state /usr/lib/python3.6/site-packages/nova/compute/manager.py:1561
2021-07-22 16:31:31.165 3 DEBUG nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Stopping instance; current vm_state: active, current task_state: powering-off, current DB power_state: 4, current VM power_state: 4 do_stop_instance /usr/lib/python3.6/site-packages/nova/compute/manager.py:3095
2021-07-22 16:31:31.166 3 INFO nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance is already powered off in the hypervisor when stop is called.
2021-07-22 16:31:31.168 3 DEBUG nova.objects.instance [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] Lazy-loading 'flavor' on Instance uuid b28cc3ae-6442-40cf-9d66-9d4938a567c7 obj_load_attr /usr/lib/python3.6/site-packages/nova/objects/instance.py:1103
2021-07-22 16:31:31.236 3 DEBUG nova.objects.instance [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] Lazy-loading 'metadata' on Instance uuid b28cc3ae-6442-40cf-9d66-9d4938a567c7 obj_load_attr /usr/lib/python3.6/site-packages/nova/objects/instance.py:1103
2021-07-22 16:31:31.302 3 INFO nova.virt.libvirt.driver [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance already shutdown.
2021-07-22 16:31:31.308 3 INFO nova.virt.libvirt.driver [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance destroyed successfully.
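
My reading of the log sequence above: after the restart, _sync_power_states sees libvirt reporting the instance as SHUTDOWN (power state 4) while the database still says RUNNING (1); because the instance's vm_state is still active, nova assumes the guest shut itself down and calls the stop API, so the instance stays powered off instead of being brought back up. A minimal, self-contained sketch of that decision, as I understand it (my paraphrase of the behaviour in nova/compute/manager.py, not the actual nova code):

# Power state constants as in nova.compute.power_state
RUNNING = 1
SHUTDOWN = 4

def sync_power_state(db_power_state, vm_power_state, vm_state):
    """Return what nova does when the DB and hypervisor power states disagree."""
    if db_power_state != vm_power_state:
        # The hypervisor is treated as authoritative, so the DB is updated first
        # ("Updating power_state in the DB to match the hypervisor").
        db_power_state = vm_power_state
    if vm_state == 'active' and vm_power_state == SHUTDOWN:
        # An ACTIVE instance that libvirt reports as shut down is assumed to
        # have powered itself off, so nova calls the stop API ("Instance
        # shutdown by itself. Calling the stop API.").
        return 'call stop API'
    return 'no action'

# The situation in the log: DB says RUNNING (1), libvirt says SHUTDOWN (4).
print(sync_power_state(db_power_state=RUNNING, vm_power_state=SHUTDOWN, vm_state='active'))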


I'll attach the full debug logs as well.

I couldn't see anything relevant in the controller node logs; it only
seems to affect the compute node, but I can capture whatever else is
needed.

** Affects: nova
     Importance: Undecided
         Status: New

** Attachment added: "Relevant Nova and libvirt logs"
   https://bugs.launchpad.net/bugs/1937292/+attachment/5512951/+files/nova_vm_destroyed_logs.tar

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1937292

Title:
  All overcloud VM's powered off on hypervisor when nova_libvirt is
  restarted

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1937292/+subscriptions


