yahoo-eng-team team mailing list archive

[Bug 1941861] Re: Retired and disabled compute service preventing start of all compute services after upgrade (TooOldComputeService exception)

 

One of the goals of utils.raise_if_old_compute() is exactly to detect
old computes in your environment. Based on the logs, you had a really
old compute record. You could not delete it because it predates
placement, so there is no corresponding resource provider in the
placement database. I would say this is all expected. Nova does not
support (and has never supported) environments with controllers on
version N and computes older than N-1, so your deployment was in an
unsupported state. Therefore I am marking this bug as Invalid. Feel
free to reopen it if you disagree.
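
To make the check described above concrete, here is a minimal sketch of
the idea, not Nova's actual code. The ServiceRecord type, the
check_minimum_compute_version() function and the local exception class
are stand-ins invented for illustration. The numbers 35 and 52 are the
service levels quoted in the log in the bug description.

```python
from dataclasses import dataclass
from typing import Iterable

# Oldest supported service level, as quoted in the reporter's log.
OLDEST_SUPPORTED_SERVICE_VERSION = 52


class TooOldComputeService(Exception):
    """Local stand-in for the exception named in the traceback."""


@dataclass
class ServiceRecord:
    """Hypothetical, simplified view of one compute service record."""
    host: str
    version: int
    disabled: bool = False


def check_minimum_compute_version(services: Iterable[ServiceRecord]) -> None:
    """Raise if any recorded compute is older than the supported level.

    Disabled records are deliberately included: they still exist and
    could be re-enabled later.
    """
    versions = [s.version for s in services]
    if not versions:
        return  # nothing recorded, nothing to check
    minimum = min(versions)
    if minimum < OLDEST_SUPPORTED_SERVICE_VERSION:
        raise TooOldComputeService(
            "minimum compute service level is %d, oldest supported is %d"
            % (minimum, OLDEST_SUPPORTED_SERVICE_VERSION)
        )


if __name__ == "__main__":
    records = [
        ServiceRecord("compute-1", 53),
        ServiceRecord("retired-host", 35, disabled=True),
    ]
    try:
        check_minimum_compute_version(records)
    except TooOldComputeService as exc:
        print("startup refused:", exc)
```

In this model a single stale record is enough to stop every service
from starting, which matches the behaviour reported in this bug.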

In your environment, regardless of the utils.raise_if_old_compute()
check, the RPC between controllers and computes was pinned to a very
old version because the old, inactive compute was still recorded in the
DB. This probably caused performance issues or even bugs. The right way
to fix it is to remove the old compute, as you did. You are now in a
much cleaner state than before.

Regarding "get_minimum_version_all_cells does not check disabled
services": a disabled compute still exists and might be re-enabled at
any time. If we ignored its version, the RPC would be pinned to a
higher version, and when the disabled compute was re-enabled it would
not be able to communicate because the RPC version would be too high
for it. So no, we cannot ignore disabled computes. If a compute is not
needed by a deployment, it should be deleted, not just disabled.
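
The reasoning about disabled computes can be illustrated with a small
toy model. This is only a sketch under simplifying assumptions: the
Compute type and both functions are hypothetical, the service version
is used as a stand-in for the RPC version, and a compute is assumed to
understand only messages at or below its own version.

```python
from typing import List, NamedTuple


class Compute(NamedTuple):
    """Hypothetical, simplified compute service record."""
    host: str
    version: int
    disabled: bool


def pinned_version(computes: List[Compute], include_disabled: bool = True) -> int:
    """Return the version the deployment would pin to (the minimum)."""
    candidates = [
        c.version for c in computes if include_disabled or not c.disabled
    ]
    return min(candidates)


def can_communicate(compute: Compute, pinned: int) -> bool:
    """Simplified rule: a compute cannot handle a pin above its own version."""
    return compute.version >= pinned


if __name__ == "__main__":
    old = Compute("old-compute", 35, disabled=True)
    computes = [
        Compute("compute-1", 53, disabled=False),
        Compute("compute-2", 53, disabled=False),
        old,
    ]

    # Counting the disabled compute keeps the pin low enough for it.
    safe_pin = pinned_version(computes, include_disabled=True)
    # Ignoring it raises the pin above what it can handle once re-enabled.
    risky_pin = pinned_version(computes, include_disabled=False)

    print(safe_pin, can_communicate(old, safe_pin))
    print(risky_pin, can_communicate(old, risky_pin))
```

In this model the only way to raise the pin safely is to remove the old
record altogether, which is why deletion rather than disabling is the
recommended path.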


** Tags added: compute upgrade

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1941861

Title:
  Retired and disabled compute service preventing start of all compute
  services after upgrade (TooOldComputeService exception)

Status in OpenStack Compute (nova):
  Invalid

Bug description:

  Description
  ===========

  
  After we upgraded our cluster to wallaby on Ubuntu 20.04, the compute services were down on all compute servers, as well as nova-scheduler and nova-conductor.

  When checking the log, the error was:

  2021-08-26 19:26:20.932 59357 CRITICAL nova [req-a8c9b7b2-f4d2-4b68-aeb2-8fb96e9ffb20 - - - - -] Unhandled error: nova.exception.TooOldComputeService: Current Nova version does not support computes older than Victoria but the minimum compute service level in your system is 35 and the oldest supported service level is 52.
  2021-08-26 19:26:20.932 59357 ERROR nova Traceback (most recent call last):
  2021-08-26 19:26:20.932 59357 ERROR nova   File "/usr/bin/nova-conductor", line 10, in <module>
  2021-08-26 19:26:20.932 59357 ERROR nova     sys.exit(main())
  2021-08-26 19:26:20.932 59357 ERROR nova   File "/usr/lib/python3/dist-packages/nova/cmd/conductor.py", line 45, in main
  2021-08-26 19:26:20.932 59357 ERROR nova     server = service.Service.create(binary='nova-conductor',
  2021-08-26 19:26:20.932 59357 ERROR nova   File "/usr/lib/python3/dist-packages/nova/service.py", line 264, in create
  2021-08-26 19:26:20.932 59357 ERROR nova     utils.raise_if_old_compute()
  2021-08-26 19:26:20.932 59357 ERROR nova   File "/usr/lib/python3/dist-packages/nova/utils.py", line 1101, in raise_if_old_compute
  2021-08-26 19:26:20.932 59357 ERROR nova     raise exception.TooOldComputeService(
  2021-08-26 19:26:20.932 59357 ERROR nova nova.exception.TooOldComputeService: Current Nova version does not support computes older than Victoria but the minimum compute service level in your system is 35 and the oldest supported service level is 52.
  2021-08-26 19:26:20.932 59357 ERROR nova

  
  However, we had upgraded all compute servers, so this was a bit unexpected.

  The only suspect from "openstack compute service list" was a line with
  an old "updated At" time, but it was on a retired server (not in the
  DNS anymore, and not in the host list ) - trying to remove it yields a
  bug (that I could report separately) :

  root@controller:~# openstack compute service delete 21
  Failed to delete compute service with ID '21': Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
  <class 'ValueError'> (HTTP 500) (Request-ID: req-4390ef9c-3a45-44b2-92bc-db1923ffc83a)
  1 of 1 compute services failed to delete.

  root@controller:~# 
  logs:
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi [req-4390ef9c-3a45-44b2-92bc-db1923ffc83a 798bbc2bfd3e4123804ea493d6bf2197 0096b32340674f4cb9101354ba6a454c - 4820ac059d2f4f56a4e02d68982b9e71 4820ac059d2f4f56a4e02d68982b9e71] Unexpected exception in API method: ValueError: No such provider 3f7b9fde-9100-471f-bb3b-b65c607b5f84
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi.py", line 658, in wrapped
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/api/openstack/compute/services.py", line 286, in delete
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     self.placementclient.delete_resource_provider(
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/scheduler/client/report.py", line 2257, in delete_resource_provider
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     provider_uuids = self._provider_tree.get_provider_uuids_in_tree(
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/compute/provider_tree.py", line 288, in get_provider_uuids_in_tree
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     return self._find_with_lock(
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/compute/provider_tree.py", line 439, in _find_with_lock
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     raise ValueError(_("No such provider %s") % name_or_uuid)
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi ValueError: No such provider 3f7b9fde-9100-471f-bb3b-b65c607b5f84
  2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi
  2021-08-27 10:50:10.062 95744 INFO nova.api.openstack.wsgi [req-4390ef9c-3a45-44b2-92bc-db1923ffc83a 798bbc2bfd3e4123804ea493d6bf2197 0096b32340674f4cb9101354ba6a454c - 4820ac059d2f4f56a4e02d68982b9e71 4820ac059d2f4f56a4e02d68982b9e71] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
  <class 'ValueError'>

  Then:

  - Removing the utils.raise_if_old_compute() line of code allowed the
  service on that machine to start

  - I then checked the database directly and the only occurrence of "35"
  related to services was indeed that disabled/undeletable service.
  After a "[nova]> update services set version = 53 where id = 21;" all
  services started.


  
  Steps to reproduce
  ==================

  Hard to tell from a new installation, since it requires a cluster with
  some history of old machines and probably an issue some day of a
  retired server with its compute service still kept in the database.

  Expected result
  ===============
  -> "get_minimum_version_all_cells" does not check disabled services 
  -> cluster services start normally

  and/or

  -> information from the log on *where* is the problematic service
  -> ability to delete it from commandline

  
  Actual result
  =============

  -> services do check disabled services on start
  -> log message is cryptic: "35" appears nowhere in openstack commands describing services (it's just an internal number, openstack commands apparently convert it to a *date* )
  -> Even guessing the issue, it's impossible to delete the service

  Environment
  ===========
  cloud-archive:wallaby on standard 20.04 server: 

  root@controller:~# dpkg -l | grep nova
  ii  nova-api                               3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - API frontend
  rc  nova-cert                              2:15.1.5-0ubuntu1~cloud0                             all          OpenStack Compute - certificate management
  ii  nova-common                            3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - common files
  ii  nova-conductor                         3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - conductor service
  rc  nova-consoleauth                       2:19.2.0-0ubuntu1~cloud0                             all          OpenStack Compute - Console Authenticator
  ii  nova-novncproxy                        3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - NoVNC proxy
  rc  nova-placement-api                     2:19.2.0-0ubuntu1~cloud0                             all          OpenStack Compute - placement API frontend
  ii  nova-scheduler                         3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - virtual machine scheduler
  ii  nova-spiceproxy                        3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - spice html5 proxy
  rc  python-nova                            2:18.2.0-0ubuntu2~cloud0                             all          OpenStack Compute Python 2 libraries
  ii  python-novaclient                      2:13.0.0-0ubuntu1~cloud0                             all          client library for OpenStack Compute API - Python 2.7
  ii  python3-nova                           3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute Python 3 libraries
  ii  python3-novaclient                     2:17.4.0-0ubuntu1~cloud0                             all          client library for OpenStack Compute API - 3.x

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1941861/+subscriptions


