yahoo-eng-team team mailing list archive
Message #31266
[Bug 1440201] Re: reboot of a nova instance appears to leave attached volumes in a bad state
Hi, Jay,
I confirmed this bug in my devstack environment using the LVM/iSCSI driver.
From my investigation of this issue, this might be a bug in Libvirt.
When Nova calls detach_volume() at step (3), Libvirt removes the block device information from the instance's
XML, and then Nova calls disconnect_volume() to log out of the iSCSI session.
In disconnect_volume(), Nova collects the block device lists from all VMs' XML files via _get_all_block_devices(), and
if any block device still belongs to the iSCSI session, Nova does not log out of that session.
Normally, Libvirt removes the block device info from the instance's XML quickly, but in this case I found the block device
info was still present when disconnect_volume() ran. I suspect that rebooting the VM via Libvirt interferes with the removal of the
block device info, so the iSCSI session logout was skipped and the session remained.
After that, if the user tries to attach a volume again, Cinder creates a new user and password and passes them to Nova.
Nova tries to use the new credentials, but because the old iSCSI session still exists, the connection fails
and you can see many iscsid errors in /var/log/messages.
I think this affects not only LVM but also other iSCSI backend
storages.
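The skip-logout behavior described above can be sketched as follows. This is a hedged, minimal sketch of the decision only; the function and argument names are illustrative and are not the actual Nova/Libvirt API.

```python
# Sketch of the logout-skip decision in disconnect_volume(). If any block
# device from the session is still listed in some domain's XML (as happens
# when a rebooted instance's XML retains the entry), the logout is skipped
# and the stale iSCSI session survives.

def disconnect_volume(session_devices, devices_in_domain_xmls, logout):
    """Log out of the iSCSI session only if no domain XML still
    references one of the session's block devices.

    Returns True if the logout was performed, False if it was skipped.
    """
    still_referenced = set(session_devices) & set(devices_in_domain_xmls)
    if still_referenced:
        # The rebooted instance's XML still lists the device, so the
        # logout is skipped; the stale session is what later rejects
        # the new CHAP credentials from Cinder.
        return False
    logout()
    return True
```

Under this model, the bug is simply that `devices_in_domain_xmls` has not yet been pruned by Libvirt at the time the check runs.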
** Also affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1440201
Title:
reboot of a nova instance appears to leave attached volumes in a bad
state
Status in Cinder:
New
Status in OpenStack Compute (Nova):
New
Bug description:
Steps to recreate:
1) attach a volume to a VM
2) reboot VM
3) detach the volume from the VM; right after the volume status changes from "in-use" to "available", attach the volume again. The error above occurs and the volume can no longer be attached to the VM. Note that if, in step 3, you detach the volume, wait 3 minutes or so, and then attach it again, the attach succeeds.
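The "wait 3 minutes or so" observation suggests a retry-with-delay workaround while the stale session drains. The sketch below is a hypothetical illustration, not Nova code; `attach_volume` is a placeholder for whatever attach call the client makes, and `RuntimeError` stands in for the `DeviceIsBusy` exception seen in the trace.

```python
import time

def attach_with_retry(attach_volume, retries=6, delay=30):
    """Retry an attach that fails with 'device busy' until the stale
    iSCSI session has timed out; 6 x 30 s roughly matches the ~3-minute
    window reported above."""
    for attempt in range(retries):
        try:
            return attach_volume()
        except RuntimeError:  # stand-in for exception.DeviceIsBusy
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```

This only masks the symptom; the underlying fix belongs in the detach path.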
This is being seen in Kilo using LVM/iSCSI.
Looking at the logs, it appears that something happens during the
reboot that leaves the iSCSI volume in a bad state. Right after the
reboot I see:
2015-04-03 14:42:33.319 19415 INFO nova.compute.manager [req-4995e5ea-c845-4b5b-a6d4-8a214e0ea87f 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] [instance: 0235ba32-323d-4f59-91ea-2f2f7ef2bd04] Detach volume 3cc769bb-d585-4812-a8fd-2888fedda58d from mountpoint /dev/vdb
2015-04-03 14:42:56.268 19415 INFO nova.compute.manager [req-0037930a-9d20-4b8c-a150-74b0b4411530 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] [instance: 0235ba32-323d-4f59-91ea-2f2f7ef2bd04] Attaching volume 3cc769bb-d585-4812-a8fd-2888fedda58d to /dev/vdb
2015-04-03 14:43:01.647 19415 ERROR nova.virt.libvirt.driver [req-0037930a-9d20-4b8c-a150-74b0b4411530 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] [instance: 0235ba32-323d-4f59-91ea-2f2f7ef2bd04] Failed to attach volume at mountpoint: /dev/vdb
After this I start seeing the following in the logs:
2015-04-03 14:43:03.157 19415 INFO nova.scheduler.client.report [req-0037930a-9d20-4b8c-a150-74b0b4411530 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] Compute_service record updated for ('devo-n02-kvm.rch.kstart.ibm.com', 'devo-n02-kvm.rch.kstart.ibm.com')
2015-04-03 14:43:03.638 19415 ERROR oslo_messaging.rpc.dispatcher [req-0037930a-9d20-4b8c-a150-74b0b4411530 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] Exception during message handling: The supplied device (vdb) is busy.
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 430, in decorated_function
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher payload)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 314, in decorated_function
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher LOG.warning(msg, e, instance_uuid=instance_uuid)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 290, in decorated_function
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 342, in decorated_function
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info())
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 330, in decorated_function
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4719, in attach_volume
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher do_attach_volume(context, instance, driver_bdm)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 431, in inner
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4714, in do_attach_volume
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher return self._attach_volume(context, instance, driver_bdm)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4737, in _attach_volume
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher self.volume_api.unreserve_volume(context, bdm.volume_id)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4729, in _attach_volume
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher do_check_attach=False, do_driver_attach=True)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 48, in wrapped
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher ret_val = method(obj, context, *args, **kwargs)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 258, in attach
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher connector)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 249, in attach
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher device_type=self['device_type'], encryption=encryption)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1062, in attach_volume
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher raise exception.DeviceIsBusy(device=disk_dev)
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher DeviceIsBusy: The supplied device (vdb) is busy.
2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher
After this, the resource updates start failing:
2015-04-03 14:45:02.507 19415 ERROR nova.openstack.common.periodic_task [req-fe14e38c-8f29-46b0-884b-3b24f7ff0397 - - - - -] Error during ComputeManager.update_available_resource: Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev --getsize64 /dev/disk/by-path/ip-9.114.170.202:3260-iscsi-iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0
Exit code: 1
Stdout: u''
Stderr: u'blockdev: cannot open /dev/disk/by-path/ip-9.114.170.202:3260-iscsi-iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0: No such device or address\n'
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 224, in run_periodic_tasks
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task task(self, context)
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6166, in update_available_resource
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task rt.update_available_resource(context)
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 317, in update_available_resource
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task resources = self.driver.get_available_resource(self.nodename)
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4872, in get_available_resource
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task disk_over_committed = self._get_disk_over_committed_size_total()
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6054, in _get_disk_over_committed_size_total
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task self._get_instance_disk_info(dom.name(), xml))
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6007, in _get_instance_disk_info
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task dk_size = lvm.get_volume_size(path)
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/lvm.py", line 172, in get_volume_size
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task run_as_root=True)
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/utils.py", line 55, in execute
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task return utils.execute(*args, **kwargs)
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/utils.py", line 206, in execute
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task return processutils.execute(*cmd, **kwargs)
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 224, in execute
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task cmd=sanitized_cmd)
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task ProcessExecutionError: Unexpected error while running command.
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev --getsize64 /dev/disk/by-path/ip-9.114.170.202:3260-iscsi-iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Exit code: 1
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Stdout: u''
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Stderr: u'blockdev: cannot open /dev/disk/by-path/ip-9.114.170.202:3260-iscsi-iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0: No such device or address\n'
2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task
The file under /dev/disk/by-path/ exists, but you get the 'No such
device or address' error when attempting to access it.
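That symptom (the by-path node exists but opening it fails with "No such device or address", i.e. ENXIO) can be checked programmatically. A minimal sketch, assuming the by-path symlink is still present while the underlying LUN is gone:

```python
import errno
import os

def is_stale_block_device(path):
    """Return True if the device node exists but cannot be opened with
    ENXIO, i.e. the underlying iSCSI LUN is gone while the node lingers."""
    if not os.path.exists(path):
        return False
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        # ENXIO is exactly the "No such device or address" seen in the
        # blockdev output above.
        return e.errno == errno.ENXIO
    os.close(fd)
    return False
```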
In /var/log/messages I see this:
Apr 3 14:43:00 devo-n02-kvm kernel: [731664.038599] connection8:0: detected conn error (1020)
Apr 3 14:43:01 devo-n02-kvm iscsid: connection18:0 is operational after recovery (7 attempts)
Apr 3 14:43:01 devo-n02-kvm iscsid: conn 0 login rejected: initiator failed authorization with target
Apr 3 14:43:01 devo-n02-kvm libvirtd[61554]: operation failed: target vdb already exists
Apr 3 14:43:02 devo-n02-kvm kernel: [731665.358832] connection18:0: detected conn error (1020)
Apr 3 14:43:03 devo-n02-kvm iscsid: Kernel reported iSCSI connection 18:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Apr 3 14:43:04 devo-n02-kvm iscsid: conn 0 login rejected: initiator failed authorization with target
Apr 3 14:43:04 devo-n02-kvm kernel: [731667.948507] connection18:0: detected conn error (1020)
I am wondering if this might be related to this bug:
https://bugs.dogfood.paddev.net/nova/+bug/1367189 Are the CHAP
credentials lost with the reboot, or something like that?
To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1440201/+subscriptions