yahoo-eng-team team mailing list archive
Message #71499
[Bug 1753443] [NEW] os_nova: upgrade_levels/compute=auto failure on master
Public bug reported:
A recent change in nova [1], which removed RPC 4.x support, appears to
have exposed a bug when upgrade_levels/compute=auto (`compute = auto` in
the `[upgrade_levels]` section of nova.conf) is used on a new
deployment. This is blocking the openstack-ansible-os_nova master gate.
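The setting referred to above lives in nova.conf. The snippet below is only an illustration of where the option sits. It uses Python's standard `configparser` to read it from a minimal, hypothetical nova.conf fragment. `compute_upgrade_level` is a made-up helper and is not part of nova or OSA.

```python
import configparser

# Hypothetical minimal nova.conf fragment with the setting from this report.
NOVA_CONF_FRAGMENT = """
[upgrade_levels]
compute = auto
"""


def compute_upgrade_level(conf_text: str):
    """Return [upgrade_levels]/compute from nova.conf text, or None if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback also covers a missing [upgrade_levels] section
    return parser.get("upgrade_levels", "compute", fallback=None)


print(compute_upgrade_level(NOVA_CONF_FRAGMENT))  # auto
```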
Tempest tests are failing; the following excerpt from nova-conductor.log shows the failure:
```
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server [req-9c3f2dd3-81dd-4275-9a61-a3a859dde29d 3639ea84ebcf4c858de98eeede6789a9 3b9624e03ed740f483c64301d0d11372 - default default] Exception during message handling: RPCVersionCapError: Requested message version, 5.0 is incompatible. It needs to be equal in major version and less than or equal in minor version as the specified version cap 4.11.
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server File "/openstack/venvs/nova-testing/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server File "/openstack/venvs/nova-testing/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server File "/openstack/venvs/nova-testing/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server File "/openstack/venvs/nova-testing/local/lib/python2.7/site-packages/nova/conductor/manager.py", line 1265, in schedule_and_build_instances
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server limits=host.limits, host_list=host_list)
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server File "/openstack/venvs/nova-testing/local/lib/python2.7/site-packages/nova/compute/rpcapi.py", line 1030, in build_and_run_instance
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server cctxt.cast(ctxt, 'build_and_run_instance', **kwargs)
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server File "/openstack/venvs/nova-testing/local/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 149, in cast
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server self._check_version_cap(msg.get('version'))
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server File "/openstack/venvs/nova-testing/local/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 126, in _check_version_cap
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server version_cap=self.version_cap)
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server RPCVersionCapError: Requested message version, 5.0 is incompatible. It needs to be equal in major version and less than or equal in minor version as the specified version cap 4.11.
2018-03-03 05:13:23.679 9771 ERROR oslo_messaging.rpc.server
```
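The error text states the compatibility rule: the requested version must have the same major version as the cap, and its minor version must not exceed the cap's. Below is a minimal re-implementation of that rule, written for illustration; it is not oslo.messaging's own code. It shows why 5.0 is rejected against a cap of 4.11. Versions are parsed into integers because a string comparison would wrongly order "4.11" before "4.5".

```python
def parse_version(version: str) -> tuple:
    """Split an 'X.Y' RPC version string into integer (major, minor)."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)


def is_within_cap(requested: str, cap: str) -> bool:
    """Rule from the RPCVersionCapError text: the major versions must match
    and the requested minor must be less than or equal to the cap's minor."""
    req_major, req_minor = parse_version(requested)
    cap_major, cap_minor = parse_version(cap)
    return req_major == cap_major and req_minor <= cap_minor


print(is_within_cap("5.0", "4.11"))   # False: major 5 != major 4
print(is_within_cap("4.5", "4.11"))   # True: 5 <= 11 as integers
print(is_within_cap("4.12", "4.11"))  # False: minor exceeds the cap
```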
When openstack-ansible-os_nova is used for a new deployment, the following appears in the logs:
```
2018-03-02 17:25:55.954 19495 DEBUG nova.compute.rpcapi [req-97c173ed-052e-4ce7-8314-d220dfdab8e7 - - - - -] Not caching compute RPC version_cap, because min service_version is 0. Please ensure a nova-compute service has been started. Defaulting to Mitaka RPC. _determine_version_cap /openstack/venvs/nova-master/lib/python2.7/site-packages/nova/compute/rpcapi.py:408
```
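The reference to Mitaka comes from [2]. That code appears intended to
set the version cap as permissively as possible (to allow N to N+1
upgrades), but it does not seem to have been updated since it was first
added for Newton. Now that RPC 4.x support has been removed, the Mitaka
cap of 4.11 no longer shares a major version with the 5.0 messages nova
sends, which produces the RPCVersionCapError above.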
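The following is a simplified model of the selection logic described above, written for illustration only. When the minimum nova-compute service version is 0 (no compute registered yet), it falls back to a fixed cap. Otherwise it maps the minimum service version to an RPC version. The function names, the mapping table and `OLDEST_SUPPORTED_MAJOR` are hypothetical. Only the values come from this report: 4.11 is the cap in the error, and service version 30 selects 5.0 in the restart log.

```python
# Hypothetical stand-ins; the values are taken from the logs in this report.
MITAKA_FALLBACK_CAP = "4.11"          # cap reported in the RPCVersionCapError
SERVICE_VERSION_TO_RPC = {30: "5.0"}  # "selected ... 5.0 from minimum service version 30"
OLDEST_SUPPORTED_MAJOR = 5            # assumption: only 5.x remains after 4.x removal


def determine_version_cap(min_service_version: int) -> str:
    """Pick a compute RPC version cap from the minimum nova-compute service version."""
    if min_service_version == 0:
        # No nova-compute has registered yet: use the hardcoded fallback.
        return MITAKA_FALLBACK_CAP
    return SERVICE_VERSION_TO_RPC[min_service_version]


def fallback_is_usable() -> bool:
    """The fallback only works if its major version matches what the client sends."""
    fallback_major = int(MITAKA_FALLBACK_CAP.split(".")[0])
    return fallback_major == OLDEST_SUPPORTED_MAJOR


print(determine_version_cap(0))   # 4.11
print(determine_version_cap(30))  # 5.0
print(fallback_is_usable())       # False
```

Under this model, a fallback kept in step with the oldest RPC major version still supported would avoid the mismatch.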
Restarting the services works around the issue, as the following log line shows:
```
2018-03-04 21:42:14.367 21270 INFO nova.compute.rpcapi [req-95678c1e-8465-4059-8095-70479085b179 - - - - -] Automatically selected compute RPC version 5.0 from minimum service version 30
```
There may be two issues exposed here: a bug in how nova sets the
minimum version, and the way OSA handles the deployment. With the
default OSA deployment, the minimum RPC version changes when the
services are restarted. Because the deployment process does not trigger
those restarts, failures could occur if the order of the restarts is
not controlled.
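A small simulation can illustrate the restart-ordering concern. It assumes, as the logs above suggest, that a service picks its compute RPC cap when it starts and keeps that cap until it is restarted. It reuses the two values seen in this report: a 4.11 fallback when no compute is registered, and 5.0 for service version 30. The `Registry` and `Conductor` classes are hypothetical. They model only the ordering problem, not nova's actual implementation.

```python
from typing import List

FALLBACK_CAP = "4.11"                  # cap used when min service version is 0
CAP_FOR_SERVICE_VERSION = {30: "5.0"}  # from the restart log in this report


class Registry:
    """Hypothetical record of registered nova-compute service versions."""

    def __init__(self) -> None:
        self.compute_versions: List[int] = []

    def min_compute_version(self) -> int:
        # 0 means no nova-compute service has registered yet.
        return min(self.compute_versions, default=0)


class Conductor:
    """Hypothetical conductor that fixes its compute RPC cap at (re)start."""

    def __init__(self, registry: Registry) -> None:
        self.registry = registry
        self.version_cap = ""
        self.restart()

    def restart(self) -> None:
        min_version = self.registry.min_compute_version()
        if min_version == 0:
            self.version_cap = FALLBACK_CAP
        else:
            self.version_cap = CAP_FOR_SERVICE_VERSION[min_version]


registry = Registry()
conductor = Conductor(registry)       # started before any nova-compute
registry.compute_versions.append(30)  # nova-compute registers afterwards
print(conductor.version_cap)          # 4.11: the stale fallback is kept
conductor.restart()                   # a restart picks up the registered compute
print(conductor.version_cap)          # 5.0
```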
[1] https://github.com/openstack/nova/commit/a761e57368280b6d3e931831ecd393fd5787b3ef
[2] https://github.com/openstack/nova/blob/a761e57368280b6d3e931831ecd393fd5787b3ef/nova/compute/rpcapi.py#L384-L392
** Affects: nova
Importance: Undecided
Assignee: git-harry (git-harry)
Status: In Progress
** Affects: openstack-ansible
Importance: Undecided
Status: New
** Also affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1753443
Title:
os_nova: upgrade_levels/compute=auto failure on master
Status in OpenStack Compute (nova):
In Progress
Status in openstack-ansible:
New