yahoo-eng-team team mailing list archive

[Bug 1236930] Re: attempting to reboot a shutdown/suspended/crashed/paused instance appears to have failed, but then surprisingly succeeds two minutes later

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Thierry Carrez <thierry.carrez+lp@xxxxxxxxx>
Date: Wed, 04 Dec 2013 10:20:28 -0000
Reply-to: Bug 1236930 <1236930@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Changed in: nova
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1236930

Title:
  attempting to reboot a shutdown/suspended/crashed/paused instance
  appears to have failed, but then surprisingly succeeds two minutes
  later

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  I am running Havana from precise-proposed in the UCA (nova
  1:2013.2~b3-0ubuntu1~cloud0).

  To reproduce:

  - start an instance
  - reboot (sudo reboot) the compute node on which it is running
  - after the compute node is done booting, the instance will be off:

  root@xen10:~# nova list
  +--------------------------------------+------+---------+------------+-------------+-------------------------+
  | ID                                   | Name | Status  | Task State | Power State | Networks                |
  +--------------------------------------+------+---------+------------+-------------+-------------------------+
  | 4824dce8-d876-4022-a446-3fc8d708ac62 | test | SHUTOFF | None       | Shutdown    | novanetwork=172.20.46.3 |
  +--------------------------------------+------+---------+------------+-------------+-------------------------+

  (Note that although my hostname has "xen" in it, I'm using KVM. I
  haven't updated DNS yet.)

  - attempt to reboot the instance (nova reboot
  4824dce8-d876-4022-a446-3fc8d708ac62)

  # nova show 4824dce8-d876-4022-a446-3fc8d708ac62
  +--------------------------------------+----------------------------------------------------------+
  | Property                             | Value                                                    |
  +--------------------------------------+----------------------------------------------------------+
  | status                               | SHUTOFF                                                  |
  | updated                              | 2013-10-08T15:28:47Z                                     |
  | OS-EXT-STS:task_state                | rebooting                                                |

  The reboot fails. The compute node will log:

  2013-10-08 11:28:55.579 1400 WARNING nova.compute.manager [req-11fe1624-22f6-4348-81c5-185d0ce0d3a0 a70453729dd84bfd8f31019b1bb91e40 46ab32189ab64a4c92f8f64e6c9ed028] [instance: 4824dce8-d876-4022-a446-3fc8d708ac62] trying to reboot a non-running instance: (state: 4 expected: 1)

  - attempt to start the instance (nova start
  4824dce8-d876-4022-a446-3fc8d708ac62):

  produces console output:
  ERROR: Instance 4824dce8-d876-4022-a446-3fc8d708ac62 in task_state rebooting. Cannot start while the instance is in this state. (HTTP 400) (Request-ID: req-732224e1-8c34-4754-84f7-7a8476673185)

  - wait about 120 seconds, and the compute node will log:
  2013-10-08 11:30:56.082 1400 WARNING nova.virt.libvirt.driver [req-11fe1624-22f6-4348-81c5-185d0ce0d3a0 a70453729dd84bfd8f31019b1bb91e40 46ab32189ab64a4c92f8f64e6c9ed028] [instance: 4824dce8-d876-4022-a446-3fc8d708ac62] Failed to soft reboot instance. Trying hard reboot.

  Afterwards, the instance will be running.
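
  The two-minute delay in the steps above is consistent with a soft
  reboot that is given a fixed window to succeed before a hard reboot is
  attempted. The following is only a minimal sketch of that pattern, not
  nova's libvirt driver code: the function name, the callables, and the
  120-second default (taken from the delay observed above) are all
  assumptions made for illustration.

```python
import time


def reboot_with_fallback(soft_reboot, is_running, hard_reboot,
                         timeout=120.0, poll=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Try a soft reboot and fall back to a hard reboot on timeout.

    All parameters are hypothetical stand-ins: soft_reboot/hard_reboot
    trigger the respective action, is_running reports whether the guest
    has come back. sleep and clock are injectable so the loop can be
    exercised without actually waiting.
    """
    soft_reboot()
    deadline = clock() + timeout
    while clock() < deadline:
        if is_running():
            return "soft"
        sleep(poll)
    # The guest never came back within the window (for example because it
    # was not running in the first place), so force a hard reboot.
    hard_reboot()
    return "hard"
```

  For an instance that is already shut off, is_running never becomes
  true during the wait, so under this sketch the whole timeout elapses
  before the hard reboot starts.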

  It's confusing that the reboot logs a failure for a very obvious
  reason (an instance that is not running can't be *re*booted), yet the
  instance's task state remains "rebooting". I had expected that the
  reboot had failed and that OpenStack was in some consistent state. I
  was then surprised again when in fact it *was* still rebooting -- it
  just took two minutes to do so. It would be less confusing to catch
  the original error and report the reboot as failed. The log messages
  are confusing because the first sets the expectation that a
  non-running instance can't be rebooted, but it can (two minutes
  later).
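
  One way to read the suggestion above is a fail-fast check: if the
  instance is not running, clear the task state and raise immediately
  instead of leaving it in "rebooting". The sketch below is hypothetical
  and is not the fix that was committed to nova. The Instance class, the
  exception, and the function name are invented for illustration; the
  values 1 and 4 are taken from the "state: 4 expected: 1" warning, where
  4 corresponds to the Shutdown power state shown by nova list.

```python
from dataclasses import dataclass
from typing import Optional

RUNNING = 1   # "expected: 1" in the compute manager warning
SHUTDOWN = 4  # "state: 4", reported for the SHUTOFF instance


class InstanceNotRunning(Exception):
    """Hypothetical error raised when a reboot is refused."""


@dataclass
class Instance:
    # Hypothetical stand-in for an instance record.
    uuid: str
    power_state: int
    task_state: Optional[str] = None


def request_soft_reboot(instance: Instance) -> None:
    """Mark the instance as rebooting, or fail fast if it is not running."""
    instance.task_state = "rebooting"
    if instance.power_state != RUNNING:
        # Reset the task state so a later "nova start" is not rejected
        # with "in task_state rebooting".
        instance.task_state = None
        raise InstanceNotRunning(
            "instance %s: cannot reboot (state: %d expected: %d)"
            % (instance.uuid, instance.power_state, RUNNING)
        )
    # A real implementation would trigger the soft reboot here.
```

  With this behaviour, the reboot request in the reproduction steps
  would be reported as failed right away, and the subsequent start
  request would not be blocked by a stale task state.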

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1236930/+subscriptions