[Bug 1839515] [NEW] Weird functional test failures hitting neutron API in unrelated resize flows since 8/5
Public bug reported:
Noticed here:
https://logs.opendev.org/32/634832/43/check/nova-tox-functional-py36/d4f3be5/testr_results.html.gz
With this test:
nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled
That's a simple test which disables a service and then asserts that a
service.update notification is emitted, but a second, unrelated
notification shows up as well:
Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 122, in test_service_disabled
'uuid': self.service_uuid})
File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 37, in _verify_notification
base._verify_notification(sample_file_name, replacements, actual)
File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/notification_sample_base.py", line 148, in _verify_notification
self.assertEqual(1, len(fake_notifier.VERSIONED_NOTIFICATIONS))
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 411, in assertEqual
self.assertThat(observed, matcher, message)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat
raise mismatch_error
testtools.matchers._impl.MismatchError: 1 != 2
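For context, the check that fails is the sample-verification helper
counting queued notifications. Roughly, it does the following (a sketch
using the fake_notifier names visible in the traceback; the real helper
in notification_sample_base.py also compares the payload against the
sample file):

    import testtools

    from nova.tests.unit import fake_notifier


    class NotificationCountSketch(testtools.TestCase):
        def _verify_single_notification(self):
            # fake_notifier.VERSIONED_NOTIFICATIONS collects every
            # versioned notification emitted during the test. The helper
            # expects exactly one entry (the service.update from
            # disabling the service) but finds two, because an unrelated
            # notification leaked in from another test in the process.
            self.assertEqual(1, len(fake_notifier.VERSIONED_NOTIFICATIONS))
            notification = fake_notifier.VERSIONED_NOTIFICATIONS[0]
            self.assertEqual('service.update', notification['event_type'])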
And in the error output, we can see this weird traceback of a resize
revert failure because the NeutronFixture isn't being used:
2019-08-07 23:22:23,621 ERROR [nova.network.neutronv2.api] The [neutron] section of your nova configuration file must be configured for authentication with the networking service endpoint. See the networking service install guide for details: https://docs.openstack.org/neutron/latest/install/
2019-08-07 23:22:23,634 ERROR [nova.compute.manager] Setting instance vm_state to ERROR
Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 8656, in _error_out_instance_on_exception
yield
File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 4830, in _resize_instance
migration_p)
File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 2697, in migrate_instance_start
client = _get_ksa_client(context, admin=True)
File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 215, in _get_ksa_client
auth_plugin = _get_auth_plugin(context, admin=admin)
File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 151, in _get_auth_plugin
_ADMIN_AUTH = _load_auth_plugin(CONF)
File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 82, in _load_auth_plugin
raise neutron_client_exc.Unauthorized(message=err_msg)
neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
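For reference, functional tests normally never reach the real keystone
auth code because they install nova's NeutronFixture, which intercepts
the neutron API layer. A sketch of the usual setup (assuming the
Train-era fixture location in nova/tests/fixtures.py):

    from nova import test
    from nova.tests import fixtures as nova_fixtures


    class MyFunctionalTest(test.TestCase):
        def setUp(self):
            super(MyFunctionalTest, self).setUp()
            # With this fixture in place, nova.network.neutronv2.api
            # calls are answered by the fixture instead of building a
            # real keystoneauth client, so the "Unknown auth type: None"
            # path above is never hit.
            self.neutron = self.useFixture(
                nova_fixtures.NeutronFixture(self))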
According to logstash this started showing up around 8/5:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ERROR%20%5Bnova.network.neutronv2.api%5D%20The%20%5Bneutron%5D%20section%20of%20your%20nova%20configuration%20file%20must%20be%20configured%20for%20authentication%20with%20the%20networking%20service%20endpoint.%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d
Which makes me think this change, which restarts a compute service and
sleeps in a stub:
https://review.opendev.org/#/c/670393/
might be screwing up concurrently running tests.
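The suspect pattern, paraphrased (a hypothetical reconstruction, not
the actual diff from that review): a driver method is stubbed to sleep
and then fail mid-resize, and the compute service is restarted while
that greenthread is still parked:

    import time

    from nova import test


    class ResizeCrashSketch(test.TestCase):  # hypothetical name
        def test_crash_mid_resize(self):
            def fake_migrate_disk_and_power_off(*args, **kwargs):
                # Park the resize long enough for the service restart to
                # race it. If this greenthread (or the restarted service)
                # outlives the test, it keeps executing on the worker
                # while the next test runs -- without that test's
                # fixtures installed, which would explain the
                # NeutronFixture-less traceback above.
                time.sleep(30)
                raise test.TestingException('crash mid-resize')

            self.stub_out(
                'nova.virt.fake.FakeDriver.migrate_disk_and_power_off',
                fake_migrate_disk_and_power_off)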
Looking at when that test runs relative to the one that fails:
2019-08-07 23:21:54.157918 | ubuntu-bionic | {4}
nova.tests.functional.compute.test_init_host.ComputeManagerInitHostTestCase.test_migrate_disk_and_power_off_crash_finish_revert_migration
[4.063814s] ... ok
2019-08-07 23:25:00.073443 | ubuntu-bionic | {4}
nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled
[160.155643s] ... FAILED
We can see they are on the same worker process and run at about the same
time.
Furthermore, we can see that
TestServiceUpdateNotificationSampleLatest.test_service_disabled
eventually times out after 160 seconds and this is in the error output:
2019-08-07 23:24:59,911 ERROR [nova.compute.api] An error occurred while updating the COMPUTE_STATUS_DISABLED trait on compute node resource providers managed by host host1. The trait will be synchronized automatically by the compute service when the update_available_resource periodic task runs.
Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 5034, in _update_compute_provider_status
self.rpcapi.set_host_enabled(context, service.host, enabled)
File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/rpcapi.py", line 996, in set_host_enabled
return cctxt.call(ctxt, 'set_host_enabled', enabled=enabled)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
transport_options=self.transport_options)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
transport_options=transport_options)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 224, in send
transport_options)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 208, in _send
reply, failure = reply_q.get(timeout=timeout)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
return waiter.wait()
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
return get_hub().switch()
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
return self.greenlet.switch()
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 350, in run
self.wait(sleep_time)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
time.sleep(seconds)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
raise TimeoutException()
fixtures._fixtures.timeout.TimeoutException
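That 160 seconds is the per-test timeout fixture firing, not an RPC
timeout: the fake oslo.messaging driver's reply queue simply never gets
an answer for set_host_enabled, and the test blocks until fixtures'
SIGALRM handler raises TimeoutException (the last frame above). The
mechanism, sketched (nova's base test case installs something
equivalent, driven by OS_TEST_TIMEOUT):

    import os

    import fixtures
    import testtools


    class TimeoutSketch(testtools.TestCase):
        def setUp(self):
            super(TimeoutSketch, self).setUp()
            # gentle=True makes the alarm raise TimeoutException inside
            # the test instead of killing the process; 160s appears to
            # be the effective limit in this job.
            timeout = int(os.environ.get('OS_TEST_TIMEOUT', 160))
            self.useFixture(fixtures.Timeout(timeout, gentle=True))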
So test_migrate_disk_and_power_off_crash_finish_revert_migration is
probably not cleaning up properly.
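If that's the case, the fix direction is to make the offending test
block until its sleeping stub has actually finished before teardown.
Extending the hypothetical sketch from earlier (again just an
illustration of the idea, not the actual fix):

    import threading

    # Inside the hypothetical test method sketched earlier: signal when
    # the stubbed call finishes, and block teardown on that signal so
    # the parked greenthread can't outlive the test and bleed into the
    # next one on the worker.
    finished = threading.Event()

    def fake_migrate_disk_and_power_off(*args, **kwargs):
        try:
            time.sleep(30)
            raise test.TestingException('crash mid-resize')
        finally:
            finished.set()

    self.stub_out(
        'nova.virt.fake.FakeDriver.migrate_disk_and_power_off',
        fake_migrate_disk_and_power_off)
    # Cleanups run before the next test starts, so this guarantees the
    # stub is done before anything else runs on this worker.
    self.addCleanup(finished.wait)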
** Affects: nova
Importance: High
Status: Confirmed
** Tags: gate-failure testing