yahoo-eng-team team mailing list archive

[Bug 1832720] Re: Inconsistent power and vm states for physical instances when doing nova start/stop

 

Reviewed:  https://review.opendev.org/665975
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5b1c9dd05ccc0488e4d47bfc2442e46adb82b765
Submitter: Zuul
Branch:    master

commit 5b1c9dd05ccc0488e4d47bfc2442e46adb82b765
Author: Surya Seetharaman <suryaseetharaman.9@xxxxxxxxx>
Date:   Tue Jun 18 13:59:47 2019 +0200

    Grab fresh power state info from the driver
    
    In drivers that use a cache to store the node info (presently only
    ironic since [1]), the "_get_power_state" function called during
    instance actions like start or stop grabs the information from the node
    cache and saves it in the nova database instead of getting fresh
    information from the driver. This leads to inconsistency between
    the vm_state and power_state for an instance in the nova database
    (which remains until a power_sync happens between nova and ironic).
    This can be confusing for a user running "nova list", where the
    power_state might still be shutdown when the vm_state has already
    become active. In a default environment this inconsistency lasts
    for about ten minutes, which is the default value of the
    sync_power_state_interval option.
    
    This patch sets "use_cache" to False in the compute manager
    when triggering an action on an instance like start/stop/reboot.
    
    [1] I907b69eb689cf6c169a4869cfc7889308ca419d5
    
    Change-Id: I8bca5d84c37d02331d2f9968a674f3398c1a8f5b
    Closes-Bug: #1832720


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1832720

Title:
  Inconsistent power and vm states for physical instances when doing
  nova start/stop

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  To improve performance, a cache was added to the ironic driver to
  store nodes (and hence their power states)
  (https://github.com/openstack/nova/commit/9d5fb1b58e908ccacbbbf29341918d0b0588a36f#diff-1e4547e2c3b36b8f836d8f851f85fde7).
  Later, a "use_cache" option was added
  (https://github.com/openstack/nova/commit/19cb8280232fd3b0ba0000a475d061ea9fb10e1a#diff-1e4547e2c3b36b8f836d8f851f85fde7)
  to remove inconsistencies that this cache caused during the
  power_sync periodic task. However, when we do nova start/stop on an
  instance, the power_state of the instance is still obtained from the
  cache
  (https://github.com/openstack/nova/blob/f298973520420710a617e4d79e853f2416b29786/nova/compute/manager.py#L1284).
  As a result, the CLI list/show output displays inconsistent vm_states
  and power_states for a considerable amount of time (presumably until
  the next periodic power sync between nova and ironic, which depends
  on the sync_power_state_interval config option) before the cache is
  refreshed to reflect the correct states:

  +--------------------------------------+---------+---------+------------+-------------+-------------------------------------------------------+
  | ID                                   | Name    | Status  | Task State | Power State | Networks                                              |
  +--------------------------------------+---------+---------+------------+-------------+-------------------------------------------------------+
  | cd38b5c1-80dc-425d-8b8e-f523dc60e6ba | test000 | ACTIVE  | -          | Shutdown    | private=fde8:a67c:e94e:0:5054:ff:fe28:5da1, 10.0.0.31 |
  | cd38b5c1-80dc-425d-8b8e-f523dc60e6ba | test000 | SHUTOFF | -          | Running     | private=fde8:a67c:e94e:0:5054:ff:fe28:5da1, 10.0.0.31 |
  +--------------------------------------+---------+---------+------------+-------------+-------------------------------------------------------+

  The code comment states that the cache should be refreshed during
  every resource tracker (RT) periodic update, which should be every
  60 seconds
  (https://github.com/openstack/nova/blob/61558f274842b149044a14bbe7537b9f278035fd/nova/virt/ironic/driver.py#L989),
  but the inconsistency seems to last for more than a minute, which is
  confusing for the user. "use_cache" should be set to False for
  these actions to avoid inconsistent vm and power states.
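
To get a feel for how long a stale value can stay visible, here is a
rough sketch of the arithmetic. It assumes, purely for illustration,
that the periodic sync fires on a fixed schedule every `interval`
seconds starting at `first_sync`; the real periodic task scheduling may
differ. The function names are hypothetical, and the 600 second default
mirrors the ten minute figure quoted in the commit message.

```python
def next_sync_time(action_time, interval=600, first_sync=0):
    """Return the time of the first periodic sync strictly after action_time."""
    if action_time < first_sync:
        return first_sync
    elapsed = action_time - first_sync
    return first_sync + (elapsed // interval + 1) * interval


def stale_window(action_time, interval=600, first_sync=0):
    """Seconds a stale power_state could remain until the next sync corrects it."""
    return next_sync_time(action_time, interval, first_sync) - action_time
```

Under these assumptions, a start issued 10 seconds after a sync would
show the stale state for 590 seconds, and the worst case is one full
interval.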

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1832720/+subscriptions

