yahoo-eng-team team mailing list archive
Message #79611
[Bug 1839515] Re: Weird functional test failures hitting neutron API in unrelated resize flows since 8/5
Reviewed: https://review.opendev.org/675553
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f875c9d12fa03e1eb0d6ac0f3dd95f502ae1e6a1
Submitter: Zuul
Branch: master
commit f875c9d12fa03e1eb0d6ac0f3dd95f502ae1e6a1
Author: Balazs Gibizer <balazs.gibizer@xxxxxxxx>
Date: Fri Aug 9 09:53:45 2019 +0200
Prevent init_host test from interfering with other tests

The test_migrate_disk_and_power_off_crash_finish_revert_migration test
needs to simulate a compute host crash at a certain point: it stops
execution by injecting a sleep and then simulates a compute restart.
However, the sleep is just 30 seconds, which allows the stopped
function to resume while other functional tests are running in the
same test worker process, making those tests fail in strange ways.

One simple solution is to give the test a sleep long enough that it
can never return before the whole functional test run finishes. This
patch proposes a million seconds, which is more than 277 hours,
similar to how the other test in this test package works. This
solution is hacky but simple. A better solution would be to further
enhance the functional test environment to support nova-compute
service crash/kill and restart.
Change-Id: Ib0d142806804e9113dd61d3a7ec15a98232775c8
Closes-Bug: #1839515
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839515
Title:
Weird functional test failures hitting neutron API in unrelated resize
flows since 8/5
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Noticed here:
https://logs.opendev.org/32/634832/43/check/nova-tox-functional-py36/d4f3be5/testr_results.html.gz
With this test:
nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled
That's a simple test which disables a service and then asserts there
is a service.update notification, but there is another notification
happening as well:
Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 122, in test_service_disabled
'uuid': self.service_uuid})
File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 37, in _verify_notification
base._verify_notification(sample_file_name, replacements, actual)
File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/notification_sample_base.py", line 148, in _verify_notification
self.assertEqual(1, len(fake_notifier.VERSIONED_NOTIFICATIONS))
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 411, in assertEqual
self.assertThat(observed, matcher, message)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat
raise mismatch_error
testtools.matchers._impl.MismatchError: 1 != 2
And in the error output, we can see this weird traceback of a resize
revert failure because the NeutronFixture isn't being used:
2019-08-07 23:22:23,621 ERROR [nova.network.neutronv2.api] The [neutron] section of your nova configuration file must be configured for authentication with the networking service endpoint. See the networking service install guide for details: https://docs.openstack.org/neutron/latest/install/
2019-08-07 23:22:23,634 ERROR [nova.compute.manager] Setting instance vm_state to ERROR
Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 8656, in _error_out_instance_on_exception
yield
File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 4830, in _resize_instance
migration_p)
File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 2697, in migrate_instance_start
client = _get_ksa_client(context, admin=True)
File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 215, in _get_ksa_client
auth_plugin = _get_auth_plugin(context, admin=admin)
File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 151, in _get_auth_plugin
_ADMIN_AUTH = _load_auth_plugin(CONF)
File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 82, in _load_auth_plugin
raise neutron_client_exc.Unauthorized(message=err_msg)
neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
According to logstash this started showing up around 8/5:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ERROR%20%5Bnova.network.neutronv2.api%5D%20The%20%5Bneutron%5D%20section%20of%20your%20nova%20configuration%20file%20must%20be%20configured%20for%20authentication%20with%20the%20networking%20service%20endpoint.%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d
This makes me think the following change, which restarts a compute
service and sleeps in a stub, might be interfering with concurrently
running tests:
https://review.opendev.org/#/c/670393/
Looking at when that test runs and the one that fails:
2019-08-07 23:21:54.157918 | ubuntu-bionic | {4}
nova.tests.functional.compute.test_init_host.ComputeManagerInitHostTestCase.test_migrate_disk_and_power_off_crash_finish_revert_migration
[4.063814s] ... ok
2019-08-07 23:25:00.073443 | ubuntu-bionic | {4}
nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled
[160.155643s] ... FAILED
We can see they are on the same worker process and run at about the
same time.
Furthermore, we can see that
TestServiceUpdateNotificationSampleLatest.test_service_disabled
eventually times out after 160 seconds and this is in the error
output:
2019-08-07 23:24:59,911 ERROR [nova.compute.api] An error occurred while updating the COMPUTE_STATUS_DISABLED trait on compute node resource providers managed by host host1. The trait will be synchronized automatically by the compute service when the update_available_resource periodic task runs.
Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 5034, in _update_compute_provider_status
self.rpcapi.set_host_enabled(context, service.host, enabled)
File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/rpcapi.py", line 996, in set_host_enabled
return cctxt.call(ctxt, 'set_host_enabled', enabled=enabled)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
transport_options=self.transport_options)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
transport_options=transport_options)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 224, in send
transport_options)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 208, in _send
reply, failure = reply_q.get(timeout=timeout)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
return waiter.wait()
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
return get_hub().switch()
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
return self.greenlet.switch()
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 350, in run
self.wait(sleep_time)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
time.sleep(seconds)
File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
raise TimeoutException()
fixtures._fixtures.timeout.TimeoutException
So test_migrate_disk_and_power_off_crash_finish_revert_migration is
probably not cleaning up properly.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839515/+subscriptions