[Bug 1522496] Re: nova evacuate fails for instance booted from volume
** Changed in: nova
Status: Confirmed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1522496
Title:
nova evacuate fails for instance booted from volume
Status in OpenStack Compute (nova):
Fix Released
Bug description:
My simple OpenStack setup: three hosts in total, one controller (which also serves as the Cinder volume server) and two compute nodes, compute1 and compute2, all running CentOS 7.1.
The Cinder service just uses LVM/iSCSI. OpenStack versions are listed at the end of this bug report.
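For reference, a Cinder backend of this kind is typically configured along the following lines; this is only an illustrative sketch (the section name, volume group and option values are assumptions, not copied from my controller):
    [DEFAULT]
    enabled_backends = lvm

    [lvm]
    # LVM driver exporting logical volumes over iSCSI via LIO (lioadm),
    # which matches the cinder.volume.targets.lio entries in the logs below.
    volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
    volume_group = cinder-volumes
    iscsi_protocol = iscsi
    iscsi_helper = lioadm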
I created a volume from a CirrOS source image, then launched an
instance booting from this volume with the command below:
    nova boot --flavor m1.mytiny \
        --block-device-mapping vda=9c077c7d-2c94-4fc8-b5f3-22fb70bf86b4:::0 \
        --nic net-id=ea864ed4-5628-4392-bd79-addf36e7354b \
        --security-group default --key-name mykey myvm
I checked that the instance was running on compute1, then cut all of
compute1's network connections and waited until "nova service-list" and
"nova hypervisor-list" showed compute1's status as down.
Then I used the command "nova evacuate myvm compute2" to try to
evacuate the VM to compute2; however, I got the following error message
in nova-compute.log on compute2:
2015-12-02 21:33:22.465 1041 INFO nova.compute.manager [req-7ddecb3e-153d-471e-985e-275f18587f87 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] [instance: 7e22504f-45e4-4e23-ad92-378899aef112] Rebuilding instance
2015-12-02 21:33:22.492 1041 INFO nova.compute.manager [req-7ddecb3e-153d-471e-985e-275f18587f87 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] disk not on shared storage, rebuilding from: ''
2015-12-02 21:33:22.645 1041 WARNING oslo_config.cfg [req-7ddecb3e-153d-471e-985e-275f18587f87 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Option "username" from group "neutron" is deprecated. Use option "user-name" from group "neutron".
2015-12-02 21:33:23.145 1041 WARNING nova.objects.instance [req-7ddecb3e-153d-471e-985e-275f18587f87 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] [instance: 7e22504f-45e4-4e23-ad92-378899aef112] Trying to apply a migration context that does not seem to be set for this instance
2015-12-02 21:33:23.182 1041 INFO nova.compute.manager [req-7ddecb3e-153d-471e-985e-275f18587f87 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] [instance: 7e22504f-45e4-4e23-ad92-378899aef112] Detach volume 25014c7c-a115-4a70-9636-8c69e46fd1fb from mountpoint /dev/vda
2015-12-02 21:33:23.185 1041 WARNING nova.compute.manager [req-7ddecb3e-153d-471e-985e-275f18587f87 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] [instance: 7e22504f-45e4-4e23-ad92-378899aef112] Detaching volume from unknown instance
2015-12-02 21:33:23.190 1041 WARNING nova.virt.libvirt.driver [req-7ddecb3e-153d-471e-985e-275f18587f87 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] [instance: 7e22504f-45e4-4e23-ad92-378899aef112] During detach_volume, instance disappeared.
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [req-7ddecb3e-153d-471e-985e-275f18587f87 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] [instance: 7e22504f-45e4-4e23-ad92-378899aef112] Setting instance vm_state to ERROR
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] Traceback (most recent call last):
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6334, in _error_out_instance_on_exception
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] yield
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2683, in rebuild_instance
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] bdms, recreate, on_shared_storage, preserve_ephemeral)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2722, in _do_rebuild_instance_with_claim
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] self._do_rebuild_instance(*args, **kwargs)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2830, in _do_rebuild_instance
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] self._rebuild_default_impl(**kwargs)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2583, in _rebuild_default_impl
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] detach_block_devices(context, bdms)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2807, in detach_block_devices
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] destroy_bdm=False)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4717, in _detach_volume
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] self.volume_api.terminate_connection(context, volume_id, connector)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 223, in wrapper
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] six.reraise(exc_value, None, exc_trace)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 212, in wrapper
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] res = method(self, ctx, volume_id, *args, **kwargs)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 346, in terminate_connection
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] connector)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/cinderclient/v2/volumes.py", line 480, in terminate_connection
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] {'connector': connector})
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/cinderclient/v2/volumes.py", line 402, in _action
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] return self.api.client.post(url, body=body)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 104, in post
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] return self._cs_request(url, 'POST', **kwargs)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 98, in _cs_request
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] return self.request(url, method, **kwargs)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 91, in request
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] raise exceptions.from_response(resp, body)
2015-12-02 21:33:24.507 1041 ERROR nova.compute.manager [instance: 7e22504f-45e4-4e23-ad92-378899aef112] ClientException: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-5031f353-e235-4386-9292-90b516083f9f)
Also, on the Cinder volume server (which is also the controller in this
demo setup), the volume.log file shows the following error:
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio [req-eaabe123-23f0-4bfb-94ce-bfceb63e922f cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Failed to delete initiator iqn iqn.1994-05.com.redhat:f68acb3f9b3 to target.
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio Traceback (most recent call last):
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio File "/usr/lib/python2.7/site-packages/cinder/volume/targets/lio.py", line 194, in terminate_connection
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio run_as_root=True)
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 254, in inner
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio return f(*args, **kwargs)
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio File "/usr/lib/python2.7/site-packages/cinder/volume/targets/lio.py", line 64, in _execute
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio return utils.execute(*args, **kwargs)
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 155, in execute
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio return processutils.execute(*cmd, **kwargs)
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 275, in execute
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio cmd=sanitized_cmd)
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio ProcessExecutionError: Unexpected error while running command.
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf cinder-rtstool delete-initiator iqn.2010-10.org.openstack:volume-9c077c7d-2c94-4fc8-b5f3-22fb70bf86b4 iqn.1994-05.com.redhat:f68acb3f9b3 #my note: iqn.1994-05.com.redhat:f68acb3f9b3 is for compute2 not compute1, so it will fail.
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio Exit code: 1
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio Stdout: u''
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio Stderr: u'Traceback (most recent call last):\n File "/bin/cinder-rtstool", line 10, in <module>\n sys.exit(main())\n File "/usr/lib/python2.7/site-packages/cinder/cmd/rtstool.py", line 291, in main\n delete_initiator(target_iqn, initiator_iqn)\n File "/usr/lib/python2.7/site-packages/cinder/cmd/rtstool.py", line 149, in delete_initiator\n % {\'target\': target_iqn, \'acl\': initiator_iqn})\ncinder.cmd.rtstool.RtstoolError\n'
2015-12-03 15:24:22.876 12000 ERROR cinder.volume.targets.lio
2015-12-03 15:24:22.882 12000 ERROR cinder.volume.manager [req-eaabe123-23f0-4bfb-94ce-bfceb63e922f cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Terminate volume connection failed: Failed to detach iSCSI target for volume 9c077c7d-2c94-4fc8-b5f3-22fb70bf86b4.
I found a fixed bug related to the above Cinder volume error message,
https://bugs.launchpad.net/cinder/+bug/1506496: don't raise an error
if the target_iqn and initiator_iqn combination isn't found in the
delete_initiator() call in rtstool.py. I applied the same patch
manually, and the nova evacuate command now works.
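For clarity, the effect of that patch on delete_initiator() is roughly the following. This is a simplified sketch of the idea from memory, not the exact upstream change, and the rtslib-fb attribute names are my assumption of how the lookup is done:
    # Simplified sketch of cinder/cmd/rtstool.py delete_initiator() after the
    # bug 1506496 fix: a missing target/initiator combination is logged and
    # ignored instead of raising RtstoolError.
    import logging

    import rtslib_fb

    LOG = logging.getLogger(__name__)

    def delete_initiator(target_iqn, initiator_iqn):
        root = rtslib_fb.root.RTSRoot()
        target = next((t for t in root.targets if t.wwn == target_iqn), None)
        if target is None:
            LOG.warning("Target %s not found, nothing to clean up", target_iqn)
            return
        for tpg in target.tpgs:
            for acl in tpg.node_acls:
                if acl.node_wwn == initiator_iqn:
                    acl.delete()
                    return
        # Previously this path raised RtstoolError, which is what made
        # terminate_connection (and hence "nova evacuate") fail with HTTP 500.
        LOG.warning("ACL for initiator %s not found under target %s, ignoring",
                    initiator_iqn, target_iqn)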
However, I think this could be a new bug in Nova. The rationale of the
above bug fix is that the Cinder DB is sometimes out of sync with the
backend, so the error should be ignored. In this case, though, I think
there is a new bug in Nova or Cinder: the rebuild process terminates
the current volume connection, but as the log above shows, it uses the
initiator_iqn of the destination compute node (compute2 in this case),
while the current connection is actually from compute1 (the dead node).
So the self.volume_api.terminate_connection(context, volume_id, connector)
call in nova/compute/manager.py will always throw an exception until
rtstool.py is patched as in the bug fix mentioned above. Anyway, with
that Cinder bug fix applied, the volume.log output below shows that the
iscsi_target is first removed and then re-created, so the original
connection from compute1 is actually destroyed. After I restored the
network connection for compute1, its log showed that compute1 still
tried to connect to this target, but it failed as expected.
2015-12-03 16:24:21.772 12000 INFO cinder.volume.manager [req-bd3b4ee9-6ee5-4cbe-9e55-223f8eecf134 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Terminate volume connection completed successfully. #my note: actually silently fails.
2015-12-03 16:24:22.707 12000 INFO cinder.volume.targets.lio [req-a6fc4152-fc1e-44ad-aa4b-089c5ebeb39f cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Removing iscsi_target: 9c077c7d-2c94-4fc8-b5f3-22fb70bf86b4 #my note: maybe even if there is a volume connection, the iscsi target can be removed.
2015-12-03 16:24:23.410 12000 INFO cinder.volume.manager [req-a6fc4152-fc1e-44ad-aa4b-089c5ebeb39f cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Detach volume completed successfully.
2015-12-03 16:24:24.458 12000 INFO cinder.volume.targets.lio [req-7965e50d-6946-4095-ac04-0f9163d08a47 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Creating iscsi_target for volume: volume-9c077c7d-2c94-4fc8-b5f3-22fb70bf86b4
2015-12-03 16:24:25.728 12000 INFO cinder.volume.manager [req-7965e50d-6946-4095-ac04-0f9163d08a47 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Initialize volume connection completed successfully.
2015-12-03 16:24:26.862 12000 INFO cinder.volume.manager [req-5f78baaa-6621-4d03-892e-9c94b5b6d849 cb2fd48bf37a42e29672077a45c18681 59e12095d44f430abe14aed87e01fa92 - - -] Attach volume completed successfully.
I think that when executing "nova evacuate", Nova should know which
host is using the volume and use that host's initiator_iqn, instead of
always using the initiator_iqn of the node it is currently running on
(the destination compute). Alternatively, for a single-attach volume,
it could terminate the connection given only the target_iqn, without
specifying an initiator_iqn.
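To make the first option concrete, here is a purely illustrative sketch of what the detach side of the rebuild could look like. This is not the actual Nova fix; _load_source_connector() is a hypothetical helper standing in for however the connector used at attach time on compute1 would be persisted and looked up:
    # Illustrative sketch only (hypothetical helper, not real Nova code).
    # volume_api is nova.volume.cinder.API(); bdm is the block device mapping
    # for the volume being detached during the evacuate/rebuild.

    def terminate_source_connection(context, volume_api, volume_id, bdm):
        # Hypothetical: recover the connector (including initiator_iqn) that the
        # dead source host used when it attached the volume, instead of building
        # a new connector for the destination via driver.get_volume_connector().
        source_connector = _load_source_connector(bdm)  # hypothetical helper
        volume_api.terminate_connection(context, volume_id, source_connector)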
I'm a newbie in OpenStack, so maybe the above suggestion should go to
Cinder rather than Nova, or maybe I'm totally wrong and the issue I
experienced is due to other setup errors.
OpenStack versions:
on controller/storage node:
#rpm -qa |grep nova
openstack-nova-conductor-12.0.0-1.el7.noarch
openstack-nova-api-12.0.0-1.el7.noarch
python-novaclient-2.30.1-1.el7.noarch
python-nova-12.0.0-1.el7.noarch
openstack-nova-cert-12.0.0-1.el7.noarch
openstack-nova-novncproxy-12.0.0-1.el7.noarch
openstack-nova-console-12.0.0-1.el7.noarch
openstack-nova-common-12.0.0-1.el7.noarch
openstack-nova-scheduler-12.0.0-1.el7.noarch
#rpm -qa |grep cinder
python-cinder-7.0.0-1.el7.noarch
python-cinderclient-1.4.0-1.el7.noarch
openstack-cinder-7.0.0-1.el7.noarch
# yum list installed |grep nova
openstack-nova-api.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
openstack-nova-cert.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
openstack-nova-common.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
openstack-nova-conductor.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
openstack-nova-console.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
openstack-nova-novncproxy.noarch
openstack-nova-scheduler.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
python-nova.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
python-novaclient.noarch 1:2.30.1-1.el7 @centos-openstack-liberty
yum list installed |grep cinder
openstack-cinder.noarch 1:7.0.0-1.el7 @centos-openstack-liberty
python-cinder.noarch 1:7.0.0-1.el7 @centos-openstack-liberty
python-cinderclient.noarch 1.4.0-1.el7 @centos-openstack-liberty
on compute node:
#rpm -qa |grep nova
openstack-nova-common-12.0.0-1.el7.noarch
python-novaclient-2.30.1-1.el7.noarch
python-nova-12.0.0-1.el7.noarch
openstack-nova-compute-12.0.0-1.el7.noarch
#yum list installed |grep nova
openstack-nova-common.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
openstack-nova-compute.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
python-nova.noarch 1:12.0.0-1.el7 @centos-openstack-liberty
python-novaclient.noarch 1:2.30.1-1.el7 @centos-openstack-liberty
# rpm -qa |grep cinder
python-cinderclient-1.4.0-1.el7.noarch
# yum list installed |grep cinder
python-cinderclient.noarch 1.4.0-1.el7 @centos-openstack-liberty
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1522496/+subscriptions