yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #86157
[Bug 1929710] [NEW] virDomainGetBlockJobInfo fails during swap_volume as disk '$disk' not found in domain
Public bug reported:
Description
===========
The error handling around swap_volume is missing the following failure
when calling virDomainGetBlockJobInfo() after the entire device is
detached by QEMU (?) after it encounters a job during the block copy job
that at first pauses and then somehow resumes the job:
https://8a5fc27780098c5ee1bc-
3ac81d180a9c011938b2cbb0293272f3.ssl.cf5.rackcdn.com/790660/5/gate/nova-
next/e915ed4/controller/logs/screen-n-cpu.txt
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver [None req-7cfcd661-29d4-4cc3-bc54-db0e7fed1a6e tempest-TestVolumeSwap-1841575704 tempest-TestVolumeSwap-1841575704-project-admin] Failure rebasing volume /dev/sdb on vdb.: libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver Traceback (most recent call last):
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2107, in _swap_volume
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver while not dev.is_job_complete():
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 800, in is_job_complete
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self.get_job_info()
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 707, in get_job_info
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self._guest._domain.blockJobInfo(self._disk, flags=0)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver result = proxy_call(self._autowrap, f, *args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = execute(f, *args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver six.reraise(c, e, tb)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise value
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = meth(*args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 985, in blockJobInfo
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise libvirtError('virDomainGetBlockJobInfo() failed')
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver
https://zuul.opendev.org/t/openstack/build/e915ed4aeb9346bba83910bd79e9502b/log/controller/logs/libvirt/libvirtd_log.txt
2021-05-26 09:49:40.189+0000: 79419: info : qemuMonitorSend:993 :
QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-
add","arguments":{"node-name":"libvirt-4-format","read-
only":false,"cache":{"direct":true,"no-
flush":false},"driver":"raw","file":"libvirt-4-storage"},"id":"libvirt-375"}^M
2021-05-26 09:49:46.154+0000: 79422: info : qemuMonitorSend:993 :
QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-
add","arguments":{"node-name":"libvirt-5-format","read-
only":false,"cache":{"direct":true,"no-
flush":false},"driver":"raw","file":"libvirt-5-storage"},"id":"libvirt-379"}^M
2021-05-26 09:49:46.165+0000: 79422: debug :
qemuMonitorBlockdevMirror:3112 : jobname=copy-vdb-libvirt-4-format,
persistjob=1, device=libvirt-4-format, target=libvirt-5-format,
bandwidth=0, granularity=0, buf_size=0, shallow=0
2021-05-26 09:49:46.167+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'created'(1)
2021-05-26 09:49:46.167+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'running'(2)
2021-05-26 09:49:46.763+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'paused'(3)
2021-05-26 09:49:46.763+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'running'(2)
2021-05-26 09:49:46.841+0000: 79417: debug :
qemuProcessHandleDeviceDeleted:1362 : Device virtio-disk1 removed from
domain 0x7fc4b416b0e0 instance-0000000b
2021-05-26 09:49:47.457+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'aborting'(8)
2021-05-26 09:49:47.458+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'concluded'(9)
2021-05-26 09:49:47.459+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'null'(11)
Steps to reproduce
==================
$ cat queries/virDomainGetBlockJobInfo.yaml
query: >
message:"virDomainGetBlockJobInfo() failed" AND
tags:"screen-n-cpu.txt"
$ elastic-recheck-query queries/virDomainGetBlockJobInfo.yaml
total hits: 6
build_branch
100% master
build_change
50% 786588
50% 792322
build_hostids
50% 1b47a855be51bba01ac6d5e6fdc4859bc17ebe2c8faaeb83392f8ff3 79fb0487675c0137b7ac30f24b5de71c70afb836e46746de770fa0c0
50% 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54
build_name
100% nova-next
build_node
100% ubuntu-focal
build_queue
100% check
build_status
100% FAILURE
build_zuul_url
100% N/A
filename
100% controller/logs/screen-n-cpu.txt
log_url
50% https://89bc735e8a094e3d60b7-4f6db7cd5400cfa66e1c80fde6bd4076.ssl.cf1.rackcdn.com/792322/1/check/nova-next/de697b4/controller/logs/screen-n-cpu.txt
50% https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_035/786588/6/check/nova-next/0357703/controller/logs/screen-n-cpu.txt
loglevel
100% ERROR
module
33% nova.compute.manager
33% nova.virt.libvirt.driver
33% oslo_messaging.rpc.server
node_provider
50% ovh-bhs1
50% rax-iad
port
50% 48014
50% 58238
project
100% openstack/nova
syslog_pid
50% 107528
50% 108261
syslog_program
50% ubuntu-focal-ovh-bhs1-0024748800 nova-compute
50% ubuntu-focal-rax-iad-0024745546 nova-compute
tags
100% screen-n-cpu.txt screen oslofmt
voting
100% 1
zuul_attempts
100% 1
zuul_executor
50% ze01.opendev.org
50% ze07.opendev.org
Expected result
===============
swap_volume at least fails correctly leaving the original device attached.
Actual result
=============
swap_volume fails and the original device appears detached from the device.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
master
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
libvirt + QEMU (no KVM in the gate)
2. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
images_type=default=qcow2
3. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
** Affects: nova
Importance: Undecided
Status: New
** Tags: gate-failure
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1929710
Title:
virDomainGetBlockJobInfo fails during swap_volume as disk '$disk' not
found in domain
Status in OpenStack Compute (nova):
New
Bug description:
Description
===========
The error handling around swap_volume is missing the following failure
when calling virDomainGetBlockJobInfo() after the entire device is
detached by QEMU (?) after it encounters a job during the block copy
job that at first pauses and then somehow resumes the job:
https://8a5fc27780098c5ee1bc-
3ac81d180a9c011938b2cbb0293272f3.ssl.cf5.rackcdn.com/790660/5/gate
/nova-next/e915ed4/controller/logs/screen-n-cpu.txt
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver [None req-7cfcd661-29d4-4cc3-bc54-db0e7fed1a6e tempest-TestVolumeSwap-1841575704 tempest-TestVolumeSwap-1841575704-project-admin] Failure rebasing volume /dev/sdb on vdb.: libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver Traceback (most recent call last):
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2107, in _swap_volume
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver while not dev.is_job_complete():
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 800, in is_job_complete
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self.get_job_info()
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 707, in get_job_info
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self._guest._domain.blockJobInfo(self._disk, flags=0)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver result = proxy_call(self._autowrap, f, *args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = execute(f, *args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver six.reraise(c, e, tb)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise value
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = meth(*args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 985, in blockJobInfo
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise libvirtError('virDomainGetBlockJobInfo() failed')
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver
https://zuul.opendev.org/t/openstack/build/e915ed4aeb9346bba83910bd79e9502b/log/controller/logs/libvirt/libvirtd_log.txt
2021-05-26 09:49:40.189+0000: 79419: info : qemuMonitorSend:993 :
QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-
add","arguments":{"node-name":"libvirt-4-format","read-
only":false,"cache":{"direct":true,"no-
flush":false},"driver":"raw","file":"libvirt-4-storage"},"id":"libvirt-375"}^M
2021-05-26 09:49:46.154+0000: 79422: info : qemuMonitorSend:993 :
QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-
add","arguments":{"node-name":"libvirt-5-format","read-
only":false,"cache":{"direct":true,"no-
flush":false},"driver":"raw","file":"libvirt-5-storage"},"id":"libvirt-379"}^M
2021-05-26 09:49:46.165+0000: 79422: debug :
qemuMonitorBlockdevMirror:3112 : jobname=copy-vdb-libvirt-4-format,
persistjob=1, device=libvirt-4-format, target=libvirt-5-format,
bandwidth=0, granularity=0, buf_size=0, shallow=0
2021-05-26 09:49:46.167+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'created'(1)
2021-05-26 09:49:46.167+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'running'(2)
2021-05-26 09:49:46.763+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'paused'(3)
2021-05-26 09:49:46.763+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'running'(2)
2021-05-26 09:49:46.841+0000: 79417: debug :
qemuProcessHandleDeviceDeleted:1362 : Device virtio-disk1 removed from
domain 0x7fc4b416b0e0 instance-0000000b
2021-05-26 09:49:47.457+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'aborting'(8)
2021-05-26 09:49:47.458+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'concluded'(9)
2021-05-26 09:49:47.459+0000: 79417: debug :
qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-
libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state
changed to 'null'(11)
Steps to reproduce
==================
$ cat queries/virDomainGetBlockJobInfo.yaml
query: >
message:"virDomainGetBlockJobInfo() failed" AND
tags:"screen-n-cpu.txt"
$ elastic-recheck-query queries/virDomainGetBlockJobInfo.yaml
total hits: 6
build_branch
100% master
build_change
50% 786588
50% 792322
build_hostids
50% 1b47a855be51bba01ac6d5e6fdc4859bc17ebe2c8faaeb83392f8ff3 79fb0487675c0137b7ac30f24b5de71c70afb836e46746de770fa0c0
50% 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54
build_name
100% nova-next
build_node
100% ubuntu-focal
build_queue
100% check
build_status
100% FAILURE
build_zuul_url
100% N/A
filename
100% controller/logs/screen-n-cpu.txt
log_url
50% https://89bc735e8a094e3d60b7-4f6db7cd5400cfa66e1c80fde6bd4076.ssl.cf1.rackcdn.com/792322/1/check/nova-next/de697b4/controller/logs/screen-n-cpu.txt
50% https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_035/786588/6/check/nova-next/0357703/controller/logs/screen-n-cpu.txt
loglevel
100% ERROR
module
33% nova.compute.manager
33% nova.virt.libvirt.driver
33% oslo_messaging.rpc.server
node_provider
50% ovh-bhs1
50% rax-iad
port
50% 48014
50% 58238
project
100% openstack/nova
syslog_pid
50% 107528
50% 108261
syslog_program
50% ubuntu-focal-ovh-bhs1-0024748800 nova-compute
50% ubuntu-focal-rax-iad-0024745546 nova-compute
tags
100% screen-n-cpu.txt screen oslofmt
voting
100% 1
zuul_attempts
100% 1
zuul_executor
50% ze01.opendev.org
50% ze07.opendev.org
Expected result
===============
swap_volume at least fails correctly leaving the original device attached.
Actual result
=============
swap_volume fails and the original device appears detached from the device.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
master
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
libvirt + QEMU (no KVM in the gate)
2. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
images_type=default=qcow2
3. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1929710/+subscriptions
Follow ups