[Bug 1457329] Re: Error status of instance after suspend exception

I agree with Eli that if libvirt fails we shouldn't assume the instance
is still running and automatically reset it to ACTIVE status.

The suspend method in the compute manager reverts the task state to
None via the @reverts_task_state decorator, so at least you can delete
the instance after it has gone into ERROR status:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4018
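
For reference, this is roughly what a reverts_task_state-style decorator
does (a simplified sketch only, not the actual nova code, which handles
more edge cases around saving and notifications):

    import functools


    def reverts_task_state(function):
        """Simplified sketch: reset instance.task_state to None if the
        decorated compute manager method raises, so the instance isn't
        left stuck in a transitional task state such as SUSPENDING."""
        @functools.wraps(function)
        def decorated_function(self, context, *args, **kwargs):
            try:
                return function(self, context, *args, **kwargs)
            except Exception:
                instance = kwargs.get('instance')
                if instance is not None:
                    instance.task_state = None
                    instance.save()
                raise
        return decorated_function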

From the virsh output above it looks like the instance is still running
on the hypervisor, so a case could be made that when the call to
libvirt fails with a certain type of error we could handle it and check
whether the guest is still running; but we would still need to record
an instance fault, since the operation failed.

Anyway, I agree the reset-state API is what should be used here.
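
For anyone hitting this in the meantime, recovery looks something like
this (instance ID taken from the report below):

    $ nova reset-state --active 0096094f-b854-4a56-bb35-c112cdbe20fb
    $ nova list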

** Tags removed: volumes

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1457329

Title:
  Error status of instance after suspend exception

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  devstack version:
  ubuntu@dev1:/opt/stack/nova$ git log -1
  commit 2833f8c08fcfb7961b3c64b285ceff958bf5a05e
  Author: Zhengguang <zhengguangou@xxxxxxxxx>
  Date:   Thu May 21 02:31:50 2015 +0000

      remove _rescan_iscsi from disconnect_volume_multipath_iscsi
      
      terminating instance that attached more than one volume, disconnect
      the first volume is ok, but the first volume is not removed, then
      disconnect the second volume, disconnect_volume_multipath_iscsi
      will call _rescan_iscsi so that rescan the first device, although
      the instance is destroyed, the first device is residual, therefor
      we don't need rescan when disconnect volume.
      
      Change-Id: I7f2c688aba9e69afaf370b2badc86a2bb3ee899d
      Closes-Bug:#1402535

  Suspended the instance, then got the following exception:

  Setting instance vm_state to ERROR
  2015-05-21 04:48:29.179 TRACE nova.compute.manager Traceback (most recent call last):
    File "/opt/stack/nova/nova/compute/manager.py", line 6089, in _error_out_instance_on_exception
      yield
    File "/opt/stack/nova/nova/compute/manager.py", line 4014, in suspend_instance
      self.driver.suspend(context, instance)
    File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2248, in suspend
      dom.managedSave(0)
    File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
      result = proxy_call(self._autowrap, f, *args, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
      rv = execute(f, *args, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
      six.reraise(c, e, tb)
    File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
      rv = meth(*args, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1167, in managedSave
      if ret == -1: raise libvirtError ('virDomainManagedSave() failed', dom=self)
  libvirtError: operation failed: domain save job: unexpectedly failed

  ubuntu@dev1:~$ nova list
  +--------------------------------------+-------+-----------+------------+-------------+------------------------------------------------------+
  | ID                                   | Name  | Status    | Task State | Power State | Networks                                             |
  +--------------------------------------+-------+-----------+------------+-------------+------------------------------------------------------+
  | 0096094f-b854-4a56-bb35-c112cdbe20fb | test5 | ERROR     | -          | Running     | private=10.0.0.5, fd3b:f9:a091:0:f816:3eff:fe8e:dc62 |
  +--------------------------------------+-------+-----------+------------+-------------+------------------------------------------------------+

  "virsh list" can see the instance is running
  ubuntu@dev1:~$ virsh list --all
   Id    Name                           State
  ----------------------------------------------------
   2     instance-00000003              running

  Expected result:
  Even though an exception occurs, the instance status should still be "ACTIVE".

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1457329/+subscriptions

