yahoo-eng-team team mailing list archive

[Bug 2097371] [NEW] Failed reboot is reported as success

 

Public bug reported:

Description
===========

If a server is rebooted (soft or hard), the API returns HTTP 202 to indicate that the
request has been accepted but has not finished yet.

Polling the instance state is commonly done, but especially with


Steps to reproduce
==================
A chronological list of the steps that reproduce the
issue:

- Modify the code of a compute driver so that it raises an exception when a reboot
  is triggered.
- Deploy the modified code.

- Run:
    $ openstack server reboot $SERVER_ID
    $ REQUEST_ID=$(openstack server event list $SERVER_ID -c request_id -f value | head -1)
    $ openstack server event show $SERVER_ID $REQUEST_ID -f json

Expected result
===============

The API reports an error:

```
{
  "action": "reboot",
  "events": [
    {
      "details": null,
      "event": "compute_reboot_instance",
      "finish_time": "2025-02-04T14:53:52.000000",
      "host": null,
      "host_id": "64cf076bee4eacd535d5fbdb3ae9fedb3ab5fc56b1ff5202caf2dcc0",
      "result": "Error",
      "start_time": "2025-02-04T14:53:51.000000",
      "traceback": null
    }
  ],
  "id": "req-b42718ed-8abd-4312-8804-176bbafd824e",
  "message": "Error",
  "project_id": "e9141fb24eee4b3e9f25ae69cda31132",
  "request_id": "req-b42718ed-8abd-4312-8804-176bbafd824e",
  "start_time": "2025-02-04T14:53:51.000000",
  "user_id": "...."
}
```


Actual result
=============


The API reports success:

```
{
  "action": "reboot",
  "events": [
    {
      "details": null,
      "event": "compute_reboot_instance",
      "finish_time": "2025-02-04T11:37:22.000000",
      "host": null,
      "host_id": "64cf076bee4eacd535d5fbdb3ae9fedb3ab5fc56b1ff5202caf2dcc0",
      "result": "Success",
      "start_time": "2025-02-04T11:37:22.000000",
      "traceback": null
    }
  ],
  "id": "req-24372c07-a5b3-41ea-94bc-5af7e787d181",
  "message": null,
  "project_id": "e9141fb24eee4b3e9f25ae69cda31132",
  "request_id": "req-24372c07-a5b3-41ea-94bc-5af7e787d181",
  "start_time": "2025-02-04T11:37:22.000000",
  "user_id": "...."
}
```
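A script can check a reboot's outcome without a human reading the output. It looks at the `result` field of each event in the JSON printed by `openstack server event show -f json`. Below is a minimal Python sketch. It assumes only the JSON shape shown in this report, and the function name `failed_events` is hypothetical.

```python
import json
from typing import List


def failed_events(action_json: str) -> List[str]:
    """Return the names of events whose result is "Error".

    Assumes the JSON shape printed by `openstack server event show -f json`
    in this report: a top-level "events" list whose items carry "event"
    and "result" keys. Events still in progress (result null) are not
    counted as failures.
    """
    action = json.loads(action_json)
    return [
        event["event"]
        for event in action.get("events") or []
        if event.get("result") == "Error"
    ]
```

For the Expected result above, this returns `["compute_reboot_instance"]`. For the Actual result, it returns an empty list. That empty list is the problem this report describes: any client that relies on the event result cannot detect the failed reboot.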


Environment
===========
1. Nova version: our own release, but the relevant code hasn't changed.
   In particular, the line that swallows the exception
   https://github.com/sapcc/nova/blob/9a5567b649791bce7c63c5282eab14e22d659b25/nova/compute/manager.py#L3898
   is the same in master and predates Xena:
   https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4466

2. Which hypervisor did you use?
   vmwareapi, but since the error is handled in the compute manager,
   the behaviour should be independent of the hypervisor.

3. Which storage type did you use?
   Not relevant.

4. Which networking type did you use?
   Not relevant.


Logs & Configs
==============


```
2025-02-04 11:37:22,664 7 DEBUG nova.virt.vmwareapi.vmops [] [instance: 884fb1a4-c5e8-4b9d-a029-7ed58b11005f] Raising exception for VM reboot /var/lib/openstack/lib/python3.8/site-packages/nova/virt/vmwareapi/vmops.py:2021
2025-02-04 11:37:22,745 7 DEBUG nova.compute.manager [] Checking state _get_power_state /var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py:1584
2025-02-04 11:37:22,774 7 WARNING nova.compute.manager [] Reboot failed but instance is running: oslo_vmware.exceptions.VimException: Test exception

```

(I added the test exception to trigger the error reporting manually.)
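The logs above suggest a specific control flow. The driver raises, and the compute manager then checks the power state. It finds the instance running, logs a warning and does not re-raise. As a result, whatever records the `compute_reboot_instance` event sees a normal return.

The sketch below is a simplified, hypothetical Python model of that flow, not Nova's code. `record_event`, `reboot_instance_model`, `RUNNING` and the in-memory `EVENTS` list are illustrative stand-ins. The model assumes the event result depends only on whether an exception propagates out of the wrapped call. This is inferred from the behaviour reported here.

```python
import functools
import logging
from typing import Callable, Dict, List

LOG = logging.getLogger(__name__)

# Hypothetical stand-in for the "running" power state.
RUNNING = "running"

# Hypothetical in-memory store of recorded instance action events.
EVENTS: List[Dict[str, str]] = []


def record_event(name: str) -> Callable:
    """Record an event as "Error" if the wrapped call raises, else "Success"."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                EVENTS.append({"event": name, "result": "Error"})
                raise
            EVENTS.append({"event": name, "result": "Success"})
            return result
        return wrapper
    return decorator


@record_event("compute_reboot_instance")
def reboot_instance_model(driver_reboot: Callable[[], None],
                          get_power_state: Callable[[], str]) -> None:
    """Simplified model of the reboot error handling described in this report."""
    try:
        driver_reboot()
    except Exception as error:
        if get_power_state() == RUNNING:
            # The failure is only logged; because nothing is re-raised,
            # record_event sees a normal return and records "Success".
            LOG.warning("Reboot failed but instance is running: %s", error)
        else:
            # Any other power state lets the exception propagate.
            raise
```

Suppose the driver reboot fails while the instance is running. The model then logs the warning and records `Success`, matching the Actual result. With any other power state, the exception propagates and the event is recorded as `Error`.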

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2097371

Title:
  Failed reboot is reported as success

Status in OpenStack Compute (nova):
  New
