[Bug 1941861] [NEW] Retired and disabled compute service preventing start of all compute services after upgrade (TooOldComputeService exception)
Public bug reported:
Description
===========
After we upgraded our cluster to Wallaby on Ubuntu 20.04, the compute services were down on all compute servers, as were nova-scheduler and nova-conductor.
The log showed the following error:
2021-08-26 19:26:20.932 59357 CRITICAL nova [req-a8c9b7b2-f4d2-4b68-aeb2-8fb96e9ffb20 - - - - -] Unhandled error: nova.exception.TooOldComputeService: Current Nova version does not support computes older than Victoria but the minimum compute service level in your system is 35 and the oldest supported service level is 52.
2021-08-26 19:26:20.932 59357 ERROR nova Traceback (most recent call last):
2021-08-26 19:26:20.932 59357 ERROR nova File "/usr/bin/nova-conductor", line 10, in <module>
2021-08-26 19:26:20.932 59357 ERROR nova sys.exit(main())
2021-08-26 19:26:20.932 59357 ERROR nova File "/usr/lib/python3/dist-packages/nova/cmd/conductor.py", line 45, in main
2021-08-26 19:26:20.932 59357 ERROR nova server = service.Service.create(binary='nova-conductor',
2021-08-26 19:26:20.932 59357 ERROR nova File "/usr/lib/python3/dist-packages/nova/service.py", line 264, in create
2021-08-26 19:26:20.932 59357 ERROR nova utils.raise_if_old_compute()
2021-08-26 19:26:20.932 59357 ERROR nova File "/usr/lib/python3/dist-packages/nova/utils.py", line 1101, in raise_if_old_compute
2021-08-26 19:26:20.932 59357 ERROR nova raise exception.TooOldComputeService(
2021-08-26 19:26:20.932 59357 ERROR nova nova.exception.TooOldComputeService: Current Nova version does not support computes older than Victoria but the minimum compute service level in your system is 35 and the oldest supported service level is 52.
2021-08-26 19:26:20.932 59357 ERROR nova
However, we had upgraded all compute servers, so this was unexpected.
The only suspect in the output of "openstack compute service list" was a
row with an old "Updated At" time, but it belonged to a retired server
(no longer in the DNS and not in the host list). Trying to remove it
fails with another error (which I could report separately):
root@controller:~# openstack compute service delete 21
Failed to delete compute service with ID '21': Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'ValueError'> (HTTP 500) (Request-ID: req-4390ef9c-3a45-44b2-92bc-db1923ffc83a)
1 of 1 compute services failed to delete.
root@controller:~#
The Nova API log for that request:
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi [req-4390ef9c-3a45-44b2-92bc-db1923ffc83a 798bbc2bfd3e4123804ea493d6bf2197 0096b32340674f4cb9101354ba6a454c - 4820ac059d2f4f56a4e02d68982b9e71 4820ac059d2f4f56a4e02d68982b9e71] Unexpected exception in API method: ValueError: No such provider 3f7b9fde-9100-471f-bb3b-b65c607b5f84
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi.py", line 658, in wrapped
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/api/openstack/compute/services.py", line 286, in delete
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi self.placementclient.delete_resource_provider(
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/scheduler/client/report.py", line 2257, in delete_resource_provider
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi provider_uuids = self._provider_tree.get_provider_uuids_in_tree(
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/provider_tree.py", line 288, in get_provider_uuids_in_tree
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi return self._find_with_lock(
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/provider_tree.py", line 439, in _find_with_lock
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi raise ValueError(_("No such provider %s") % name_or_uuid)
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi ValueError: No such provider 3f7b9fde-9100-471f-bb3b-b65c607b5f84
2021-08-27 10:50:10.059 95744 ERROR nova.api.openstack.wsgi
2021-08-27 10:50:10.062 95744 INFO nova.api.openstack.wsgi [req-4390ef9c-3a45-44b2-92bc-db1923ffc83a 798bbc2bfd3e4123804ea493d6bf2197 0096b32340674f4cb9101354ba6a454c - 4820ac059d2f4f56a4e02d68982b9e71 4820ac059d2f4f56a4e02d68982b9e71] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'ValueError'>
Then:
- Removing the utils.raise_if_old_compute() call from the code allowed
the service on that machine to start.
- I then checked the database directly: the only occurrence of "35"
related to services was indeed that disabled, undeletable service. After
running "[nova]> update services set version = 53 where id = 21;", all
services started.
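The database lookup described above can be sketched as follows. This is a minimal illustration against an in-memory SQLite table, not the real Nova database: the column names (id, host, binary, version, disabled, deleted) are assumed to mirror the "services" table, the host names are invented, and the threshold 52 is the oldest supported service level quoted in the error message.

```python
import sqlite3

OLDEST_SUPPORTED = 52  # taken from the TooOldComputeService message


def find_stale_compute_services(conn, oldest_supported=OLDEST_SUPPORTED):
    """Return (id, host, version, disabled) for compute rows below the threshold.

    Disabled rows are deliberately included, since a disabled service was
    the one blocking startup in this report.
    """
    cur = conn.execute(
        'SELECT id, host, version, disabled FROM services '
        'WHERE "binary" = ? AND deleted = 0 AND version < ? '
        'ORDER BY version',
        ("nova-compute", oldest_supported),
    )
    return cur.fetchall()


def build_demo_db():
    """Create a toy table with assumed columns and invented hosts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT, '
        '"binary" TEXT, version INTEGER, disabled INTEGER, deleted INTEGER)'
    )
    conn.executemany(
        "INSERT INTO services VALUES (?, ?, ?, ?, ?, ?)",
        [
            (20, "compute-a", "nova-compute", 53, 0, 0),
            (21, "retired-host", "nova-compute", 35, 1, 0),
            (22, "controller", "nova-conductor", 53, 0, 0),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in find_stale_compute_services(build_demo_db()):
        print(row)
```

Against a real deployment the same SELECT would be run with the database's own client; the table layout there should be checked first rather than taken from this sketch.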
Steps to reproduce
==================
Hard to reproduce from a new installation: it requires a cluster with
some history of old machines, and probably a retired server whose
compute service record was left behind in the database.
Expected result
===============
-> "get_minimum_version_all_cells" does not check disabled services
-> cluster services start normally
and/or
-> the log says *which* service is the problematic one
-> ability to delete it from the command line
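The expected behaviour listed above can be sketched in a few lines. This is not Nova's implementation: ServiceRecord, minimum_compute_version, and explain_too_old are hypothetical names, and the sample hosts are invented. The sketch skips disabled services when computing the minimum version and names the offending records so a log message could point at them.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical stand-in for one row of the services table."""
    id: int
    host: str
    version: int
    disabled: bool


def minimum_compute_version(
    services: Sequence[ServiceRecord], include_disabled: bool = False
) -> Optional[int]:
    """Lowest version among the services, ignoring disabled ones by default."""
    versions = [
        s.version for s in services if include_disabled or not s.disabled
    ]
    return min(versions) if versions else None


def explain_too_old(
    services: Sequence[ServiceRecord], oldest_supported: int
) -> List[str]:
    """Describe every service below the threshold, including its id and host."""
    return [
        f"service id={s.id} host={s.host} version={s.version} "
        f"disabled={s.disabled} is older than {oldest_supported}"
        for s in services
        if s.version < oldest_supported
    ]


if __name__ == "__main__":
    demo = [
        ServiceRecord(20, "compute-a", 53, False),
        ServiceRecord(21, "retired-host", 35, True),
    ]
    print(minimum_compute_version(demo))
    print(minimum_compute_version(demo, include_disabled=True))
    for line in explain_too_old(demo, 52):
        print(line)
```

With the disabled record excluded, the minimum in this toy data is above the threshold of 52 and the startup check would pass; with it included, the message identifies service 21 directly instead of only reporting the number 35.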
Actual result
=============
-> services do check disabled services on start
-> the log message is cryptic: "35" appears nowhere in the output of the openstack commands describing services (it is just an internal number; the openstack commands apparently convert it to a *date*)
-> even after guessing the issue, it is impossible to delete the service
Environment
===========
cloud-archive:wallaby on a standard Ubuntu 20.04 server:
root@controller:~# dpkg -l | grep nova
ii nova-api 3:23.0.1-0ubuntu1~cloud0 all OpenStack Compute - API frontend
rc nova-cert 2:15.1.5-0ubuntu1~cloud0 all OpenStack Compute - certificate management
ii nova-common 3:23.0.1-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-conductor 3:23.0.1-0ubuntu1~cloud0 all OpenStack Compute - conductor service
rc nova-consoleauth 2:19.2.0-0ubuntu1~cloud0 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 3:23.0.1-0ubuntu1~cloud0 all OpenStack Compute - NoVNC proxy
rc nova-placement-api 2:19.2.0-0ubuntu1~cloud0 all OpenStack Compute - placement API frontend
ii nova-scheduler 3:23.0.1-0ubuntu1~cloud0 all OpenStack Compute - virtual machine scheduler
ii nova-spiceproxy 3:23.0.1-0ubuntu1~cloud0 all OpenStack Compute - spice html5 proxy
rc python-nova 2:18.2.0-0ubuntu2~cloud0 all OpenStack Compute Python 2 libraries
ii python-novaclient 2:13.0.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - Python 2.7
ii python3-nova 3:23.0.1-0ubuntu1~cloud0 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:17.4.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - 3.x
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1941861