
yahoo-eng-team team mailing list archive

[Bug 1778305] Re: Nova may erroneously look up the service version of a deleted service when a hostname has been reused

 

Reviewed:  https://review.opendev.org/582408
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=16e163053ca39886f11fdb8a3af10a28619fc105
Submitter: Zuul
Branch:    master

commit 16e163053ca39886f11fdb8a3af10a28619fc105
Author: melanie witt <melwittt@xxxxxxxxx>
Date:   Thu Jul 12 21:48:23 2018 +0000

    Don't generate service UUID for deleted services
    
    In Pike, we added a UUID field to services and during an upgrade from
    Ocata => Pike, when instances are accessed joined with their associated
    services, we generate a UUID for the services on-the-fly.
    
    This causes a problem in the scenario where an operator upgrades their
    cluster and has old, deleted services with hostnames matching existing
    services associated with instances. When we go to generate the service
    UUID for the old, deleted service, we hit a ServiceTooOld exception.
    
    This addresses the problem by not bothering to generate a UUID for a
    deleted service. One alternative would be to exclude deleted services
    when we join the 'instances' and 'services' tables, but I'm not sure
    whether that approach might cause unintended effects where service
    information that used to be viewable for instances becomes hidden.
    
    Closes-Bug: #1778305
    Closes-Bug: #1764556
    
    Change-Id: I347096a527c257075cefe7b81210622f6cd87daf
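The mechanism described in the commit message can be modelled in a few lines of Python. This is a simplified sketch, not nova's code: `ServiceRecord`, `ServiceTooOld`, `save` and `load_service` are hypothetical stand-ins for nova's service object, its exception and its database layer. The assumption that writing back a generated UUID triggers the minimum-version check is inferred from the ServiceTooOld error and is not stated in the commit. The sketch contrasts the old behaviour (backfill every row, deleted or not) with the fixed one (skip deleted rows).

```python
from dataclasses import dataclass
from typing import Optional
from uuid import uuid4


class ServiceTooOld(Exception):
    """Hypothetical stand-in for the exception named in the commit message."""


@dataclass
class ServiceRecord:
    """Hypothetical, trimmed-down model of one row of the services table."""
    id: int
    host: str
    version: int
    deleted: int                 # 0 for live rows, non-zero once soft-deleted
    uuid: Optional[str] = None


def save(record: ServiceRecord, min_version: int) -> None:
    # Assumption: writing a service back checks it against the minimum
    # version of the rest of the deployment; this is where the error surfaces.
    if record.version < min_version:
        raise ServiceTooOld(
            f"This service is older (v{record.version}) than the minimum "
            f"(v{min_version}) version of the rest of the deployment. "
            "Unable to continue.")


def load_service(record: ServiceRecord, min_version: int,
                 skip_deleted: bool = True) -> ServiceRecord:
    # Backfill a missing UUID on the fly, as during an Ocata => Pike upgrade.
    # skip_deleted=True models the fix: deleted rows keep uuid=None and are
    # never written back. skip_deleted=False models the old behaviour.
    if record.uuid is None and not (skip_deleted and record.deleted):
        record.uuid = str(uuid4())
        save(record, min_version)
    return record


if __name__ == "__main__":
    old = ServiceRecord(id=104, host="node11.acme.org", version=9, deleted=104)
    print(load_service(old, min_version=16).uuid)   # None: skipped after the fix
    try:
        load_service(old, min_version=16, skip_deleted=False)
    except ServiceTooOld as exc:
        print(exc)                                   # the error from the report
```

Skipping deleted rows leaves the join between instances and services untouched, which matches the commit's reason for not filtering deleted services out of that join.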


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1778305

Title:
  Nova may erroneously look up the service version of a deleted service
  when a hostname has been reused

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Prerequisites:

  - A compute node running an old version of nova has been deleted (in our case, version 9).
  - The hostname of that compute node has been reused, and the node now using it has been upgraded as normal (to version 16).
  - The services table in the nova database contains both the old and the new node: the deleted one is clearly marked as deleted and has the old version in the version column, while the new node also exists with its upgraded version.
  - At least one instance is running on the upgraded node.
  - An upgrade from Ocata to Pike is performed.
  - Any project with instances running on the upgraded node may erroneously get the error "ERROR (BadRequest): This service is older (v9) than the minimum (v16) version of the rest of the deployment. Unable to continue. (HTTP 400) (Request-ID: req-3e0ababe-e09b-4ef8-ba3a-43060bc1f807)" when running 'nova list'.

  
  Example of how this may look in the database:

  MariaDB [nova]> SELECT * FROM services WHERE host = 'node11.acme.org';
  +---------------------+---------------------+---------------------+-----+-----------------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
  | created_at          | updated_at          | deleted_at          | id  | host            | binary       | topic   | report_count | disabled | deleted | disabled_reason | last_seen_up        | forced_down | version | uuid                                 |
  +---------------------+---------------------+---------------------+-----+-----------------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
  | 2017-10-17 13:06:10 | 2018-06-22 21:42:42 | NULL                | 179 | node11.acme.org | nova-compute | compute |      2138069 |        0 |       0 | NULL            | 2018-06-22 21:42:42 |           0 |      22 | 63e1cb55-ee00-4cb8-b304-160dd5c45fdd |
  | 2016-08-13 08:20:05 | 2016-11-15 00:01:21 | 2016-11-27 15:11:30 | 104 | node11.acme.org | nova-compute | compute |       796220 |        1 |     104 | NULL            | 2016-11-15 00:01:21 |           0 |       9 | NULL                                 |
  +---------------------+---------------------+---------------------+-----+-----------------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
  2 rows in set (0.01 sec)
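Whether a deployment is exposed can be checked from rows like the ones above. A minimal sketch, assuming the rows have already been fetched as dictionaries keyed by column name (only the columns used here are kept, with values copied from the example output). It reports soft-deleted services that have no UUID and share host and binary with a live service of a newer version; that matching rule is a heuristic of this sketch, not something nova defines. Note that the deleted column of the soft-deleted row holds its own id (104) rather than 1, so any non-zero value is treated as deleted.

```python
from collections import defaultdict

# Columns copied from the two rows in the example output above.
ROWS = [
    {"id": 179, "host": "node11.acme.org", "binary": "nova-compute",
     "deleted": 0, "version": 22, "uuid": "63e1cb55-ee00-4cb8-b304-160dd5c45fdd"},
    {"id": 104, "host": "node11.acme.org", "binary": "nova-compute",
     "deleted": 104, "version": 9, "uuid": None},
]


def stale_duplicates(rows):
    """Return deleted, UUID-less rows whose (host, binary) a newer live row reuses."""
    by_key = defaultdict(list)
    for row in rows:
        by_key[(row["host"], row["binary"])].append(row)

    stale = []
    for group in by_key.values():
        live = [r for r in group if r["deleted"] == 0]
        if not live:
            continue
        newest = max(r["version"] for r in live)
        # Any non-zero 'deleted' value marks a soft-deleted row.
        stale.extend(r for r in group
                     if r["deleted"] != 0
                     and r["uuid"] is None
                     and r["version"] < newest)
    return stale


for row in stale_duplicates(ROWS):
    print(row["id"], row["host"], f"v{row['version']}")
# prints: 104 node11.acme.org v9
```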


  Removing the old service from the database is an effective workaround
  for this problem.
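The workaround amounts to deleting the soft-deleted duplicate row. The sketch below runs it against an in-memory SQLite copy of a trimmed-down services table so it can be tried safely; the reduced schema and the rule of matching on host and binary against a live row are assumptions of this sketch. It first selects the candidate ids and only then deletes by id, so the rows can be reviewed in between. On a real MariaDB deployment, take a backup first, and quote the binary column with backticks instead of double quotes, since BINARY is a reserved word there.

```python
import sqlite3

# In-memory stand-in for nova's services table, reduced to the columns
# shown in the example output above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE services (
        id INTEGER PRIMARY KEY, host TEXT, "binary" TEXT,
        deleted INTEGER, version INTEGER, uuid TEXT)
""")
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?, ?, ?, ?)",
    [(179, "node11.acme.org", "nova-compute", 0, 22,
      "63e1cb55-ee00-4cb8-b304-160dd5c45fdd"),
     (104, "node11.acme.org", "nova-compute", 104, 9, None)])

# Step 1: list soft-deleted rows that share host and binary with a live row.
stale_ids = [row[0] for row in conn.execute("""
    SELECT old.id FROM services AS old
    JOIN services AS live
      ON live.host = old.host AND live."binary" = old."binary"
    WHERE old.deleted != 0 AND live.deleted = 0
""")]
print(stale_ids)                        # [104]

# Step 2: after reviewing stale_ids, delete exactly those rows.
conn.executemany("DELETE FROM services WHERE id = ?", [(i,) for i in stale_ids])
conn.commit()
print(conn.execute("SELECT id, version FROM services").fetchall())  # [(179, 22)]
```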


