
yahoo-eng-team team mailing list archive

[Bug 1941861] [NEW] Retired and disabled compute service preventing start of all compute services after upgrade (TooOldComputeService exception)

 

Public bug reported:


Description
===========


After we upgraded our cluster to Wallaby on Ubuntu 20.04, the nova-compute services were down on all compute servers, and so were nova-scheduler and nova-conductor.

When checking the log, the error was:

2021-08-26 19:26:20.932 59357 CRITICAL nova [req-a8c9b7b2-f4d2-4b68-aeb2-8fb96e9ffb20 - - - - -] Unhandled error: nova.exception.TooOldComputeService: Current Nova version does not support computes older than Victoria but the minimum compute service level in your system is 35 and the oldest supported service level is 52.
2021-08-26 19:26:20.932 59357 ERROR nova Traceback (most recent call last):
2021-08-26 19:26:20.932 59357 ERROR nova   File "/usr/bin/nova-conductor", line 10, in <module>
2021-08-26 19:26:20.932 59357 ERROR nova     sys.exit(main())
2021-08-26 19:26:20.932 59357 ERROR nova   File "/usr/lib/python3/dist-packages/nova/cmd/conductor.py", line 45, in main
2021-08-26 19:26:20.932 59357 ERROR nova     server = service.Service.create(binary='nova-conductor',
2021-08-26 19:26:20.932 59357 ERROR nova   File "/usr/lib/python3/dist-packages/nova/service.py", line 264, in create
2021-08-26 19:26:20.932 59357 ERROR nova     utils.raise_if_old_compute()
2021-08-26 19:26:20.932 59357 ERROR nova   File "/usr/lib/python3/dist-packages/nova/utils.py", line 1101, in raise_if_old_compute
2021-08-26 19:26:20.932 59357 ERROR nova     raise exception.TooOldComputeService(
2021-08-26 19:26:20.932 59357 ERROR nova nova.exception.TooOldComputeService: Current Nova version does not support computes older than Victoria but the minimum compute service level in your system is 35 and the oldest supported service level is 52.
2021-08-26 19:26:20.932 59357 ERROR nova


However, we had upgraded all compute servers, so this was unexpected.

The only suspect in the output of "openstack compute service list" was
an entry with an old "Updated At" time, but it belonged to a retired
server (no longer in DNS and not in the host list). Trying to delete it
fails with an error (which I could report separately):

root@controller:~# openstack compute service delete 21
Failed to delete compute service with ID '21': Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'ValueError'> (HTTP 500) (Request-ID: req-4390ef9c-3a45-44b2-92bc-db1923ffc83a)
1 of 1 compute services failed to delete.

root@controller:~# 
Nova API log for that request:
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi [req-4390ef9c-3a45-44b2-92bc-db1923ffc83a 798bbc2bfd3e4123804ea493d6bf2197 0096b32340674f4cb9101354ba6a454c - 4820ac059d2f4f56a4e02d68982b9e71 4820ac059d2f4f56a4e02d68982b9e71] Unexpected exception in API method: ValueError: No such provider 3f7b9fde-9100-471f-bb3b-b65c607b5f84
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi.py", line 658, in wrapped
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/api/openstack/compute/services.py", line 286, in delete
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     self.placementclient.delete_resource_provider(
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/scheduler/client/report.py", line 2257, in delete_resource_provider
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     provider_uuids = self._provider_tree.get_provider_uuids_in_tree(
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/compute/provider_tree.py", line 288, in get_provider_uuids_in_tree
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     return self._find_with_lock(
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/nova/compute/provider_tree.py", line 439, in _find_with_lock
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi     raise ValueError(_("No such provider %s") % name_or_uuid)
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi ValueError: No such provider 3f7b9fde-9100-471f-bb3b-b65c607b5f84
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi
2021-08-27 10:50:10.062 95744 INFO nova.api.openstack.wsgi [req-4390ef9c-3a45-44b2-92bc-db1923ffc83a 798bbc2bfd3e4123804ea493d6bf2197 0096b32340674f4cb9101354ba6a454c - 4820ac059d2f4f56a4e02d68982b9e71 4820ac059d2f4f56a4e02d68982b9e71] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'ValueError'>

Then:

- Removing the utils.raise_if_old_compute() call from the code allowed
the service on that machine to start.

- I then checked the database directly, and the only service row with
version 35 was indeed that disabled, undeletable service. After running
"[nova]> update services set version = 53 where id = 21;" all services
started.
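
The database check described above can be sketched roughly as follows. This is only an illustration against an in-memory SQLite table: the column names (id, host, binary, version, disabled) are assumed to mirror the relevant part of the nova "services" table, the host names are made up, and 52 is the oldest supported service level quoted in the error message.

```python
import sqlite3

OLDEST_SUPPORTED = 52  # oldest supported service level, from the error message


def find_too_old_services(conn, oldest_supported=OLDEST_SUPPORTED):
    """Return (id, host, binary, version, disabled) rows below the threshold."""
    cur = conn.execute(
        "SELECT id, host, binary, version, disabled FROM services "
        "WHERE binary = 'nova-compute' AND version < ? ORDER BY version",
        (oldest_supported,),
    )
    return cur.fetchall()


# Simplified stand-in for the real table (assumed schema, made-up hosts).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT, binary TEXT, "
    "version INTEGER, disabled INTEGER)"
)
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?, ?, ?)",
    [
        (20, "compute-a", "nova-compute", 53, 0),
        (21, "retired-host", "nova-compute", 35, 1),  # the stale record
        (22, "compute-b", "nova-compute", 53, 0),
    ],
)

stale = find_too_old_services(conn)
for row in stale:
    print(row)
```

Listing the offending rows first, including the "disabled" flag, makes it easier to confirm that only a retired service is affected before changing anything in the database by hand.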


Steps to reproduce
==================

Hard to reproduce on a new installation, since it requires a cluster
with some history of old machines, and probably a past incident in which
a server was retired while its compute service record was kept in the
database.

Expected result
===============
-> "get_minimum_version_all_cells" does not take disabled services into account
-> cluster services start normally

and/or

-> the log message says *which* service is the problematic one
-> the service can be deleted from the command line
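
To make the request above more concrete, here is a minimal sketch of a startup check that skips disabled services and names the offending ones in its error. It is not nova's actual code: ServiceRecord, TooOldService and check_minimum_version are hypothetical names, and the sample data is invented. A disabled service may still be running, so the real check might need a different criterion.

```python
from dataclasses import dataclass

OLDEST_SUPPORTED = 52  # value quoted in the TooOldComputeService message


@dataclass
class ServiceRecord:
    """Hypothetical stand-in for a row of the services table."""
    id: int
    host: str
    version: int
    disabled: bool


class TooOldService(Exception):
    """Hypothetical exception whose message names the offending services."""


def check_minimum_version(services, oldest_supported=OLDEST_SUPPORTED):
    """Raise if any enabled service is older than oldest_supported.

    Returns the minimum version among enabled services, or None if there
    are no enabled services.
    """
    enabled = [s for s in services if not s.disabled]
    offenders = [s for s in enabled if s.version < oldest_supported]
    if offenders:
        details = ", ".join(
            f"id={s.id} host={s.host} version={s.version}" for s in offenders
        )
        raise TooOldService(
            f"services older than level {oldest_supported}: {details}"
        )
    return min((s.version for s in enabled), default=None)


services = [
    ServiceRecord(20, "compute-a", 53, False),
    ServiceRecord(21, "retired-host", 35, True),  # disabled, so ignored
]
print(check_minimum_version(services))
```

With an error message of this shape, the service ID and host would appear directly in the log instead of only the bare number 35.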


Actual result
=============

-> services do check disabled services on start
-> the log message is cryptic: "35" appears nowhere in the output of openstack commands describing services (it is just an internal number; openstack commands apparently convert it to a *date*)
-> even after guessing the cause, it is impossible to delete the service

Environment
===========
cloud-archive:wallaby on a standard Ubuntu 20.04 server:

root@controller:~# dpkg -l | grep nova
ii  nova-api                               3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - API frontend
rc  nova-cert                              2:15.1.5-0ubuntu1~cloud0                             all          OpenStack Compute - certificate management
ii  nova-common                            3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - common files
ii  nova-conductor                         3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - conductor service
rc  nova-consoleauth                       2:19.2.0-0ubuntu1~cloud0                             all          OpenStack Compute - Console Authenticator
ii  nova-novncproxy                        3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - NoVNC proxy
rc  nova-placement-api                     2:19.2.0-0ubuntu1~cloud0                             all          OpenStack Compute - placement API frontend
ii  nova-scheduler                         3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - virtual machine scheduler
ii  nova-spiceproxy                        3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute - spice html5 proxy
rc  python-nova                            2:18.2.0-0ubuntu2~cloud0                             all          OpenStack Compute Python 2 libraries
ii  python-novaclient                      2:13.0.0-0ubuntu1~cloud0                             all          client library for OpenStack Compute API - Python 2.7
ii  python3-nova                           3:23.0.1-0ubuntu1~cloud0                             all          OpenStack Compute Python 3 libraries
ii  python3-novaclient                     2:17.4.0-0ubuntu1~cloud0                             all          client library for OpenStack Compute API - 3.x

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1941861

Title:
  Retired and disabled compute service preventing start of all compute
  services after upgrade (TooOldComputeService exception)

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1941861/+subscriptions


