yahoo-eng-team team mailing list archive

[Bug 1759316] Re: pre-cells_v2 nova-osapi_compute service in database breaks instance lookup

 

Reviewed:  https://review.openstack.org/557506
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=eaf6340847c35ace3b4b681a95b8a79a7a3f2491
Submitter: Zuul
Branch:    master

commit eaf6340847c35ace3b4b681a95b8a79a7a3f2491
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Wed Mar 28 16:26:48 2018 -0400

    Log a warning and add nova-status check for old API service versions
    
    Change Ib984c30543acb3ca9cb95fb53d44d9ded0f5a5c8, which was added
    in Newton when cells v2 was optional, added some transitional code
    to the API for looking up an instance, which didn't rely on instance
    mappings in a cell to find the instance if the minimum nova-osapi_compute
    service version was from before Ocata.
    
    People have reported this being a source of confusion when upgrading
    from before Ocata, when cells v2 wasn't required, to Ocata+ where cells
    v2 along with the mapping setup is required. That's because they might
    have older nova-osapi_compute service version records in their 'nova'
    (cell) database which makes the API think the code is older than it
    actually is, and results in an InstanceNotFound error.
    
    This change does two things:
    
    1. Adds a warning to the compute API code in this scenario to serve
       as a breadcrumb if a deployment hits this issue.
    
    2. A nova-status check to look for minimum nova-osapi_compute service
       versions across all cells and report the issue as a warning. It's
       not an upgrade failure since we don't know how the nova-api service
       is configured, but leave that investigation up to the deployer.
    
    This is also written in such a way that we should be able to backport
    this through to stable/ocata.
    
    Change-Id: Ie2bc4616439352850cf29a9de7d33a06c8f7c2b8
    Closes-Bug: #1759316


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1759316

Title:
  pre-cells_v2 nova-osapi_compute service in database breaks instance
  lookup

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Confirmed

Bug description:
  This was encountered on Ocata after an upgrade from Newton, but to the
  best of my knowledge it also affects master.

  During our upgrade from Newton -> Ocata, after we finished the cells_v2
  migration and mapped instances accordingly, `nova show $uuid` no longer
  worked and returned the error:

  {"itemNotFound": {"message": "Instance 0e1e6038-bc69-4a85-b4cc-779e3b1d367a could not be found.", "code": 404}}

  After much probing, and with a complete lack of logs or warnings, I
  discovered that the 'nova-osapi_compute' service was reporting a
  different 'host' and that there were duplicate entries for the same box
  (one using the IP address, the other using the hostname of the box).
  The older entries still had version < 15. [0]

  With a version less than 15 and cells_v2, the instance lookup will not
  work, since it never reaches the code path that consults the cells_v2
  instance mappings. [1]
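
  To illustrate why one stale record is enough to break the lookup,
  here is a minimal sketch of a minimum-version gate. It is a simplified
  model and not the nova code linked in [1]: the function names, the
  dict-based records, and the sample hosts and versions are all
  hypothetical. Only the threshold of 15 comes from this report.

```python
# Simplified, hypothetical model of a minimum-service-version gate.
# This is NOT nova's implementation; names and record layout are assumptions.

CELLS_V2_MIN_API_VERSION = 15  # threshold mentioned in this bug report


def minimum_service_version(service_records, binary):
    """Return the lowest version recorded for `binary`, or None if absent."""
    versions = [r["version"] for r in service_records if r["binary"] == binary]
    return min(versions) if versions else None


def uses_instance_mappings(service_records):
    """Decide whether the lookup would take the cells_v2 mapping path.

    Assumption for this sketch: with no records at all we return False.
    """
    min_version = minimum_service_version(service_records, "nova-osapi_compute")
    return min_version is not None and min_version >= CELLS_V2_MIN_API_VERSION


# Two records for the same box: a stale one keyed by IP address and a
# current one keyed by hostname (illustrative values only).
records = [
    {"binary": "nova-osapi_compute", "host": "10.0.0.5", "version": 9},
    {"binary": "nova-osapi_compute", "host": "api01", "version": 16},
]
```

  Because the gate looks at the minimum across all records, the stale
  entry alone keeps `uses_instance_mappings(records)` false, even though
  the running service is new enough.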

  The solution was to delete the old service entries with a service
  delete.

  My suggestion moving forward is to do one or more of the following:
   * place a WARN in the linked nova code [1]
   * add a check to `nova-status upgrade check` to look for old service entries
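
  A check along the lines of the second suggestion could be sketched as
  follows. This is only an illustration under stated assumptions, not
  the nova-status implementation: the function name, the per-cell dict
  layout, and the sample data are hypothetical, and the threshold of 15
  is taken from this report.

```python
# Hypothetical sketch of a "stale API service record" check.
# Not nova-status code; the data layout and names are assumptions.


def find_stale_api_services(cells, threshold=15):
    """Return (cell, host, version) for each old nova-osapi_compute record.

    `cells` maps a cell name to a list of service-record dicts.
    """
    stale = []
    for cell_name, records in cells.items():
        for record in records:
            if (record["binary"] == "nova-osapi_compute"
                    and record["version"] < threshold):
                stale.append((cell_name, record["host"], record["version"]))
    return stale


def report(cells):
    """Build a warning message, or return None when nothing is stale."""
    stale = find_stale_api_services(cells)
    if not stale:
        return None
    details = ", ".join(
        "%s/%s (version %d)" % (cell, host, version)
        for cell, host, version in stale
    )
    return "WARNING: old nova-osapi_compute service records found: " + details


# Illustrative data only.
sample_cells = {
    "cell1": [
        {"binary": "nova-osapi_compute", "host": "10.0.0.5", "version": 9},
        {"binary": "nova-osapi_compute", "host": "api01", "version": 16},
        {"binary": "nova-compute", "host": "compute01", "version": 16},
    ],
}
```

  Reporting a warning rather than a failure leaves the decision about
  deleting the listed records to the deployer.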

  [0] http://paste.openstack.org/show/715421/
  [1] https://github.com/openstack/nova/blob/ed55dcad83d5db2fa7e43fc3d5465df1550b554c/nova/compute/api.py#L2263-L2270

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1759316/+subscriptions

