[Bug 1440201] Re: reboot of a nova instance appears to leave attached volumes in a bad state

 

Hi, Jay,

I confirmed that this bug happens in my devstack environment using the LVM/iSCSI driver.
From my investigation of this issue, this appears to be a Libvirt bug.

When Nova calls detach_volume() at step (3), Libvirt removes the block device information from the instance's
XML, and Nova then calls disconnect_volume() to log out of the iSCSI session.

In disconnect_volume(), Nova gathers the block device list from every VM's XML via _get_all_block_devices(), and
if any block device still belongs to the iSCSI session, Nova does not log out of that session.
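
For illustration, here is a minimal sketch of that check, assuming the
libvirt-python bindings; this is not the actual Nova code, and
should_logout() is a hypothetical helper:

    import xml.etree.ElementTree as ET

    import libvirt

    def get_all_block_devices(conn):
        """Collect the source path of every block-type disk on this host."""
        devices = []
        for dom in conn.listAllDomains():
            root = ET.fromstring(dom.XMLDesc(0))
            for source in root.findall("devices/disk[@type='block']/source"):
                dev = source.get('dev')
                if dev is not None:
                    devices.append(dev)
        return devices

    def should_logout(conn, iqn):
        # Skip the iSCSI logout if any domain XML still references a device
        # from this target (the by-path device names embed the target IQN).
        return not any(iqn in dev for dev in get_all_block_devices(conn))

    conn = libvirt.open('qemu:///system')
    iqn = 'iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d'
    print(should_logout(conn, iqn))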

Normally, Libvirt removes the block device info from the instance's XML quickly, but in this case I found that the
info was still present when disconnect_volume() ran. I suspect that rebooting the VM via Libvirt delays the removal
of the block device info, so the iSCSI logout was skipped and, as a result, the iSCSI session remained.
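
If that is what happens, the timing should be observable from outside: one
can poll the domain XML after the detach and see how long the disk entry
lingers. A purely illustrative sketch (wait_for_detach() is a hypothetical
helper, not Nova code):

    import time
    import xml.etree.ElementTree as ET

    def wait_for_detach(dom, dev_path, timeout=10.0, interval=0.5):
        # Poll the domain XML until the detached disk's source path is gone.
        deadline = time.time() + timeout
        while time.time() < deadline:
            root = ET.fromstring(dom.XMLDesc(0))
            paths = [s.get('dev') for s in root.findall("devices/disk/source")]
            if dev_path not in paths:
                return True
            time.sleep(interval)
        return False  # info still present; the iSCSI logout would be skipped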

After that, if the user tries to attach the volume again, Cinder creates a new CHAP username and password and
passes them to Nova. Nova tries to log in with the new credentials, but because the old iSCSI session still
exists the connection fails, and you can see many iscsid errors in /var/log/messages.
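
As a quick diagnostic, you can check whether a session for the volume's
target IQN was left behind. A small sketch using the standard open-iscsi
iscsiadm tool (stale_sessions() is a hypothetical helper):

    import subprocess

    def stale_sessions(volume_id):
        # 'iscsiadm -m session' lists open sessions, one per line, including
        # the target IQN; note it exits non-zero (raising CalledProcessError)
        # when there are no active sessions at all.
        out = subprocess.check_output(['iscsiadm', '-m', 'session'])
        return [line for line in out.decode().splitlines()
                if 'volume-' + volume_id in line]

    print(stale_sessions('3cc769bb-d585-4812-a8fd-2888fedda58d'))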

I think this affects not only LVM but also other iSCSI backend
storages.

** Also affects: nova
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1440201

Title:
  reboot of a nova instance appears to leave attached volumes in a bad
  state

Status in Cinder:
  New
Status in OpenStack Compute (Nova):
  New

Bug description:
  Steps to recreate:

  1) attach a volume to a VM
  2) reboot VM
  3) detach the volume from the VM; right after the volume status changes from "In-use" to "available", attach the volume again. The DeviceIsBusy error shown below occurs, and the volume can no longer be attached to the VM. Note that if you instead wait 3 minutes or so after the detach before re-attaching, the attach succeeds. (A scripted version of these steps is sketched below.)
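
  For reference, here is a rough scripted version of these steps. This is
  only a sketch: it drives the Kilo-era nova and cinder CLIs through
  subprocess, the status check is a crude substring match on 'cinder show'
  output, and the server/volume IDs are the ones from the logs below.

      import subprocess
      import time

      SERVER = '0235ba32-323d-4f59-91ea-2f2f7ef2bd04'  # instance from the logs
      VOLUME = '3cc769bb-d585-4812-a8fd-2888fedda58d'  # volume from the logs

      def wait_for_status(status, timeout=180):
          # Crude: looks for the status string anywhere in 'cinder show' output.
          deadline = time.time() + timeout
          while time.time() < deadline:
              out = subprocess.check_output(['cinder', 'show', VOLUME]).decode()
              if status in out:
                  return
              time.sleep(2)
          raise RuntimeError('volume never reached status %s' % status)

      subprocess.check_call(['nova', 'volume-attach', SERVER, VOLUME])  # step 1
      wait_for_status('in-use')
      subprocess.check_call(['nova', 'reboot', SERVER])                 # step 2
      subprocess.check_call(['nova', 'volume-detach', SERVER, VOLUME])  # step 3
      wait_for_status('available')
      # Re-attaching immediately is what triggers the DeviceIsBusy error below.
      subprocess.check_call(['nova', 'volume-attach', SERVER, VOLUME])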

  This is being seen in Kilo using LVM/iSCSI.

  Looking at the logs, it appears that something happens during the
  reboot that leaves the iSCSI volume in a bad state.  Right after the
  reboot I see:

  2015-04-03 14:42:33.319 19415 INFO nova.compute.manager [req-4995e5ea-c845-4b5b-a6d4-8a214e0ea87f 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] [instance: 0235ba32-323d-4f59-91ea-2f2f7ef2bd04] Detach volume 3cc769bb-d585-4812-a8fd-2888fedda58d from mountpoint /dev/vdb
  2015-04-03 14:42:56.268 19415 INFO nova.compute.manager [req-0037930a-9d20-4b8c-a150-74b0b4411530 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] [instance: 0235ba32-323d-4f59-91ea-2f2f7ef2bd04] Attaching volume 3cc769bb-d585-4812-a8fd-2888fedda58d to /dev/vdb
  2015-04-03 14:43:01.647 19415 ERROR nova.virt.libvirt.driver [req-0037930a-9d20-4b8c-a150-74b0b4411530 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] [instance: 0235ba32-323d-4f59-91ea-2f2f7ef2bd04] Failed to attach volume at mountpoint: /dev/vdb

  After this I start seeing the following in the logs:

  2015-04-03 14:43:03.157 19415 INFO nova.scheduler.client.report [req-0037930a-9d20-4b8c-a150-74b0b4411530 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] Compute_service record updated for ('devo-n02-kvm.rch.kstart.ibm.com', 'devo-n02-kvm.rch.kstart.ibm.com')
  2015-04-03 14:43:03.638 19415 ERROR oslo_messaging.rpc.dispatcher [req-0037930a-9d20-4b8c-a150-74b0b4411530 157e5fddcca340a2a9ec6cd5a1216b4f df004f3072784994870e46ee997ea70f - - -] Exception during message handling: The supplied device (vdb) is busy.
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 430, in decorated_function
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     payload)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 314, in decorated_function
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     LOG.warning(msg, e, instance_uuid=instance_uuid)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 290, in decorated_function
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 342, in decorated_function
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 330, in decorated_function
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4719, in attach_volume
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     do_attach_volume(context, instance, driver_bdm)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 431, in inner
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     return f(*args, **kwargs)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4714, in do_attach_volume
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     return self._attach_volume(context, instance, driver_bdm)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4737, in _attach_volume
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     self.volume_api.unreserve_volume(context, bdm.volume_id)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4729, in _attach_volume
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     do_check_attach=False, do_driver_attach=True)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 48, in wrapped
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     ret_val = method(obj, context, *args, **kwargs)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 258, in attach
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     connector)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 249, in attach
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     device_type=self['device_type'], encryption=encryption)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1062, in attach_volume
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher     raise exception.DeviceIsBusy(device=disk_dev)
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher DeviceIsBusy: The supplied device (vdb) is busy.
  2015-04-03 14:43:03.638 19415 TRACE oslo_messaging.rpc.dispatcher

  
  After this, the periodic update of available resources starts failing:

  2015-04-03 14:45:02.507 19415 ERROR nova.openstack.common.periodic_task [req-fe14e38c-8f29-46b0-884b-3b24f7ff0397 - - - - -] Error during ComputeManager.update_available_resource: Unexpected error while running command.
  Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev --getsize64 /dev/disk/by-path/ip-9.114.170.202:3260-iscsi-iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0
  Exit code: 1
  Stdout: u''
  Stderr: u'blockdev: cannot open /dev/disk/by-path/ip-9.114.170.202:3260-iscsi-iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0: No such device or address\n'
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 224, in run_periodic_tasks
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     task(self, context)
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6166, in update_available_resource
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     rt.update_available_resource(context)
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 317, in update_available_resource
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     resources = self.driver.get_available_resource(self.nodename)
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4872, in get_available_resource
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     disk_over_committed = self._get_disk_over_committed_size_total()
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6054, in _get_disk_over_committed_size_total
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     self._get_instance_disk_info(dom.name(), xml))
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6007, in _get_instance_disk_info
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     dk_size = lvm.get_volume_size(path)
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/lvm.py", line 172, in get_volume_size
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     run_as_root=True)
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/utils.py", line 55, in execute
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     return utils.execute(*args, **kwargs)
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/utils.py", line 206, in execute
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     return processutils.execute(*cmd, **kwargs)
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 224, in execute
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task     cmd=sanitized_cmd)
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task ProcessExecutionError: Unexpected error while running command.
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev --getsize64 /dev/disk/by-path/ip-9.114.170.202:3260-iscsi-iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Exit code: 1
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Stdout: u''
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task Stderr: u'blockdev: cannot open /dev/disk/by-path/ip-9.114.170.202:3260-iscsi-iqn.2010-10.org.openstack:volume-3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0: No such device or address\n'
  2015-04-03 14:45:02.507 19415 TRACE nova.openstack.common.periodic_task

  The file under /dev/disk/by-path/ exists, but you get the 'No such
  device or address' error when attempting to access it.
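
  A quick way to see this symptom, using the by-path name from the trace
  above (just a sketch):

      import errno
      import os

      PATH = ('/dev/disk/by-path/ip-9.114.170.202:3260-iscsi-'
              'iqn.2010-10.org.openstack:volume-'
              '3cc769bb-d585-4812-a8fd-2888fedda58d-lun-0')

      # The stat() behind os.path.exists() succeeds on the device node...
      print('exists:', os.path.exists(PATH))

      # ...but actually opening it fails with ENXIO ('No such device or
      # address') because the backing iSCSI session is broken.
      try:
          os.close(os.open(PATH, os.O_RDONLY))
      except OSError as e:
          print('open failed:', errno.errorcode[e.errno], e.strerror)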

  In /var/log/messages I see this:

  Apr  3 14:43:00 devo-n02-kvm kernel: [731664.038599]  connection8:0: detected conn error (1020)
  Apr  3 14:43:01 devo-n02-kvm iscsid: connection18:0 is operational after recovery (7 attempts)
  Apr  3 14:43:01 devo-n02-kvm iscsid: conn 0 login rejected: initiator failed authorization with target
  Apr  3 14:43:01 devo-n02-kvm libvirtd[61554]: operation failed: target vdb already exists
  Apr  3 14:43:02 devo-n02-kvm kernel: [731665.358832]  connection18:0: detected conn error (1020)
  Apr  3 14:43:03 devo-n02-kvm iscsid: Kernel reported iSCSI connection 18:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
  Apr  3 14:43:04 devo-n02-kvm iscsid: conn 0 login rejected: initiator failed authorization with target
  Apr  3 14:43:04 devo-n02-kvm kernel: [731667.948507]  connection18:0: detected conn error (1020)

  I am wondering if this might be related to this bug:
  https://bugs.dogfood.paddev.net/nova/+bug/1367189  Are the CHAP
  credentials lost with the reboot, or something like that?

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1440201/+subscriptions