← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 2046332] [NEW] When the host is restarted, nova-compute startup gets stuck.

 

Public bug reported:

openstack version: Wallaby
libvirt: 6.0.0
qemu: 4.2.0
backend storage: iscsi

The instance uses iscsi commercial storage

nova-compute.log
2023-12-04 14:13:43.582 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.582449438+08:00 stdout F 2023-12-04 14:13:43.582 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Scanning host 9 c: -, t: -, l: 10)^[[00m
2023-12-04 14:13:43.664 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.66419754+08:00 stdout F 2023-12-04 14:13:43.663 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 2 and hctl ('3', '0', '0', 10) yield: sdn^[[00m
2023-12-04 14:13:43.668 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.668415615+08:00 stdout F 2023-12-04 14:13:43.668 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 3 and hctl ['4', '*', '*', 10] yield: sdo^[[00m
2023-12-04 14:13:43.741 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.74138962+08:00 stdout F 2023-12-04 14:13:43.740 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 1 and hctl ('2', '0', '0', 10) yield: sdp^[[00m
2023-12-04 14:13:43.743 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.743081142+08:00 stdout F 2023-12-04 14:13:43.742 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 6 and hctl ['7', '*', '*', 10] yield: sdq^[[00m
2023-12-04 14:13:43.744 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.744370001+08:00 stdout F 2023-12-04 14:13:43.744 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 8 and hctl ['9', '*', '*', 10] yield: sdr^[[00m
2023-12-04 14:13:43.744 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.744783448+08:00 stdout F 2023-12-04 14:13:43.744 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 5 and hctl ('6', '0', '0', 10) yield: sdt^[[00m
2023-12-04 14:13:43.745 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.745437018+08:00 stdout F 2023-12-04 14:13:43.745 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 7 and hctl ['8', '*', '*', 10] yield: sds^[[00m
2023-12-04 14:13:44.226 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:44.226460766+08:00 stdout F 2023-12-04 14:13:44.225 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 4 and hctl ['5', '*', '*', 10] yield: sdu^[[00m
2023-12-04 14:13:44.482 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:44.482754517+08:00 stdout F 2023-12-04 14:13:44.482 32029 INFO os_brick.initiator.connectors.iscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] All connection threads finished, giving 10 seconds for dm to appear.^[[00m
2023-12-04 14:13:45.484 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:45.484445914+08:00 stdout F 2023-12-04 14:13:45.483 32029 INFO os_brick.initiator.connectors.iscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Result for connection is {'type': 'block', 'scsi_wwn': '36001405acbd43e79b732d343028329ed', 'path': '/dev/dm-3', 'multipath_id': '36001405acbd43e79b732d343028329ed'}.^[[00m
2023-12-04 14:13:45.514 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:45.514589463+08:00 stdout F 2023-12-04 14:13:45.514 32029 INFO os_vif [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Successfully plugged vif VIFOpenVSwitch(active=True,address=fa:16:3e:e2:3c:03,bridge_name='br-int',has_traffic_filtering=True,id=ff32db4b-0a4a-4f89-bc17-eb261d55ef67,network=Network(d8955418-a63d-4f22-86cc-037e0cd7a291),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapff32db4b-0a')^[[00m
2023-12-04 15:09:28.512 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.51286769+08:00 stderr F ++ hostname -s
2023-12-04 15:09:28.513 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.5139163+08:00 stderr F + host_k8s=node-12
2023-12-04 15:09:28.514 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.514304455+08:00 stderr F + grep hostha=enabled

libvirt.log
2023-12-04 14:13:35.707 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.707262365+08:00 stderr F 2023-12-04 06:13:35.707+0000: 14541: info : hostname: node-12.domain.tld
2023-12-04 14:13:35.707 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.707265536+08:00 stderr F 2023-12-04 06:13:35.707+0000: 14541: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future
2023-12-04 14:13:35.709 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.709217034+08:00 stderr F 2023-12-04 06:13:35.709+0000: 14540: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future
2023-12-04 14:13:45.528 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:45.528828631+08:00 stderr F 2023-12-04 06:13:45.528+0000: 14543: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future
2023-12-04 14:13:45.531 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:45.531482963+08:00 stderr F 2023-12-04 06:13:45.531+0000: 14544: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future
2023-12-04 14:18:44.472 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:18:44.472148283+08:00 stderr F 2023-12-04 06:18:44.472+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (298s, 0s, 298s)
2023-12-04 14:18:44.472 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:18:44.472176398+08:00 stderr F 2023-12-04 06:18:44.472+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
2023-12-04 14:19:14.490 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:14.490144202+08:00 stderr F 2023-12-04 06:19:14.490+0000: 14540: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (328s, 0s, 328s)
2023-12-04 14:19:14.490 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:14.490163367+08:00 stderr F 2023-12-04 06:19:14.490+0000: 14540: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
2023-12-04 14:19:44.542 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:44.542135682+08:00 stderr F 2023-12-04 06:19:44.542+0000: 14542: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (358s, 0s, 359s)
2023-12-04 14:19:44.542 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:44.542158518+08:00 stderr F 2023-12-04 06:19:44.542+0000: 14542: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
2023-12-04 14:20:14.590 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:14.590167336+08:00 stderr F 2023-12-04 06:20:14.590+0000: 14541: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (388s, 0s, 389s)
2023-12-04 14:20:14.590 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:14.590188481+08:00 stderr F 2023-12-04 06:20:14.590+0000: 14541: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
2023-12-04 14:20:44.628 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:44.62812621+08:00 stderr F 2023-12-04 06:20:44.628+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (419s, 0s, 419s)
2023-12-04 14:20:44.628 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:44.628149564+08:00 stderr F 2023-12-04 06:20:44.628+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
2023-12-04 15:09:01.199 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T15:09:01.199162919+08:00 stderr F 2023-12-04 07:09:01.199+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (3315s, 0s, 3315s)
......
2023-12-04 15:09:01.199 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T15:09:01.199191359+08:00 stderr F 2023-12-04 07:09:01.199+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)


The nova-compute service stays down because the service did not finish starting up

When nova-compute starts up, it will reboot the active VM, and call defineXML -> createWithFlags ->lookupByUUIDString in order, until the VM is active, then it will log "Instance rebooted successfully".
Because the createWithFlags call keeps blocking, nova-compute is stuck.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2046332

Title:
  When the host is restarted, nova-compute startup gets stuck.

Status in OpenStack Compute (nova):
  New

Bug description:
  openstack version: Wallaby
  libvirt: 6.0.0
  qemu: 4.2.0
  backend storage: iscsi

  The instance uses iscsi commercial storage

  nova-compute.log
  2023-12-04 14:13:43.582 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.582449438+08:00 stdout F 2023-12-04 14:13:43.582 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Scanning host 9 c: -, t: -, l: 10)^[[00m
  2023-12-04 14:13:43.664 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.66419754+08:00 stdout F 2023-12-04 14:13:43.663 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 2 and hctl ('3', '0', '0', 10) yield: sdn^[[00m
  2023-12-04 14:13:43.668 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.668415615+08:00 stdout F 2023-12-04 14:13:43.668 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 3 and hctl ['4', '*', '*', 10] yield: sdo^[[00m
  2023-12-04 14:13:43.741 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.74138962+08:00 stdout F 2023-12-04 14:13:43.740 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 1 and hctl ('2', '0', '0', 10) yield: sdp^[[00m
  2023-12-04 14:13:43.743 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.743081142+08:00 stdout F 2023-12-04 14:13:43.742 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 6 and hctl ['7', '*', '*', 10] yield: sdq^[[00m
  2023-12-04 14:13:43.744 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.744370001+08:00 stdout F 2023-12-04 14:13:43.744 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 8 and hctl ['9', '*', '*', 10] yield: sdr^[[00m
  2023-12-04 14:13:43.744 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.744783448+08:00 stdout F 2023-12-04 14:13:43.744 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 5 and hctl ('6', '0', '0', 10) yield: sdt^[[00m
  2023-12-04 14:13:43.745 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.745437018+08:00 stdout F 2023-12-04 14:13:43.745 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 7 and hctl ['8', '*', '*', 10] yield: sds^[[00m
  2023-12-04 14:13:44.226 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:44.226460766+08:00 stdout F 2023-12-04 14:13:44.225 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 4 and hctl ['5', '*', '*', 10] yield: sdu^[[00m
  2023-12-04 14:13:44.482 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:44.482754517+08:00 stdout F 2023-12-04 14:13:44.482 32029 INFO os_brick.initiator.connectors.iscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] All connection threads finished, giving 10 seconds for dm to appear.^[[00m
  2023-12-04 14:13:45.484 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:45.484445914+08:00 stdout F 2023-12-04 14:13:45.483 32029 INFO os_brick.initiator.connectors.iscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Result for connection is {'type': 'block', 'scsi_wwn': '36001405acbd43e79b732d343028329ed', 'path': '/dev/dm-3', 'multipath_id': '36001405acbd43e79b732d343028329ed'}.^[[00m
  2023-12-04 14:13:45.514 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:45.514589463+08:00 stdout F 2023-12-04 14:13:45.514 32029 INFO os_vif [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Successfully plugged vif VIFOpenVSwitch(active=True,address=fa:16:3e:e2:3c:03,bridge_name='br-int',has_traffic_filtering=True,id=ff32db4b-0a4a-4f89-bc17-eb261d55ef67,network=Network(d8955418-a63d-4f22-86cc-037e0cd7a291),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapff32db4b-0a')^[[00m
  2023-12-04 15:09:28.512 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.51286769+08:00 stderr F ++ hostname -s
  2023-12-04 15:09:28.513 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.5139163+08:00 stderr F + host_k8s=node-12
  2023-12-04 15:09:28.514 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.514304455+08:00 stderr F + grep hostha=enabled

  libvirt.log
  2023-12-04 14:13:35.707 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.707262365+08:00 stderr F 2023-12-04 06:13:35.707+0000: 14541: info : hostname: node-12.domain.tld
  2023-12-04 14:13:35.707 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.707265536+08:00 stderr F 2023-12-04 06:13:35.707+0000: 14541: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future
  2023-12-04 14:13:35.709 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.709217034+08:00 stderr F 2023-12-04 06:13:35.709+0000: 14540: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future
  2023-12-04 14:13:45.528 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:45.528828631+08:00 stderr F 2023-12-04 06:13:45.528+0000: 14543: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future
  2023-12-04 14:13:45.531 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:45.531482963+08:00 stderr F 2023-12-04 06:13:45.531+0000: 14544: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future
  2023-12-04 14:18:44.472 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:18:44.472148283+08:00 stderr F 2023-12-04 06:18:44.472+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (298s, 0s, 298s)
  2023-12-04 14:18:44.472 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:18:44.472176398+08:00 stderr F 2023-12-04 06:18:44.472+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 14:19:14.490 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:14.490144202+08:00 stderr F 2023-12-04 06:19:14.490+0000: 14540: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (328s, 0s, 328s)
  2023-12-04 14:19:14.490 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:14.490163367+08:00 stderr F 2023-12-04 06:19:14.490+0000: 14540: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 14:19:44.542 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:44.542135682+08:00 stderr F 2023-12-04 06:19:44.542+0000: 14542: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (358s, 0s, 359s)
  2023-12-04 14:19:44.542 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:44.542158518+08:00 stderr F 2023-12-04 06:19:44.542+0000: 14542: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 14:20:14.590 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:14.590167336+08:00 stderr F 2023-12-04 06:20:14.590+0000: 14541: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (388s, 0s, 389s)
  2023-12-04 14:20:14.590 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:14.590188481+08:00 stderr F 2023-12-04 06:20:14.590+0000: 14541: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 14:20:44.628 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:44.62812621+08:00 stderr F 2023-12-04 06:20:44.628+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (419s, 0s, 419s)
  2023-12-04 14:20:44.628 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:44.628149564+08:00 stderr F 2023-12-04 06:20:44.628+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 15:09:01.199 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T15:09:01.199162919+08:00 stderr F 2023-12-04 07:09:01.199+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (3315s, 0s, 3315s)
  ......
  2023-12-04 15:09:01.199 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T15:09:01.199191359+08:00 stderr F 2023-12-04 07:09:01.199+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags)

  
  The nova-compute service stays down because the service did not finish starting up

  When nova-compute starts up, it will reboot the active VM, and call defineXML -> createWithFlags ->lookupByUUIDString in order, until the VM is active, then it will log "Instance rebooted successfully".
  Because the createWithFlags call keeps blocking, nova-compute is stuck.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2046332/+subscriptions



Follow ups