
yahoo-eng-team team mailing list archive

[Bug 1839515] [NEW] Weird functional test failures hitting neutron API in unrelated resize flows since 8/5

 

Public bug reported:

Noticed here:

https://logs.opendev.org/32/634832/43/check/nova-tox-functional-
py36/d4f3be5/testr_results.html.gz

With this test:

nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled

That's a simple test which disables a service and then asserts there is
a service.update notification, but there is another notification
happening as well:


Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 122, in test_service_disabled
    'uuid': self.service_uuid})
  File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_service.py", line 37, in _verify_notification
    base._verify_notification(sample_file_name, replacements, actual)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/notification_sample_base.py", line 148, in _verify_notification
    self.assertEqual(1, len(fake_notifier.VERSIONED_NOTIFICATIONS))
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 411, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 1 != 2
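
For context, the failing test is roughly of this shape (a simplified, hypothetical sketch based on the traceback above; the API call and replacement values are illustrative, not copied from the real test):

    from nova.tests.functional.notification_sample_tests import (
        notification_sample_base)


    class TestServiceDisabledSketch(
            notification_sample_base.NotificationSampleTestBase):
        """Hypothetical, simplified sketch of the failing test."""

        def test_service_disabled(self):
            # Disable the compute service via the os-services API
            # (illustrative call; the real test may differ).
            self.admin_api.put_service(
                self.service_id, {'status': 'disabled'})
            # _verify_notification ultimately asserts
            # len(fake_notifier.VERSIONED_NOTIFICATIONS) == 1, which is
            # what fails when a leaked operation from another test emits
            # an extra notification on the same worker process.
            self._verify_notification(
                'service-update',
                replacements={'disabled': True, 'uuid': self.service_uuid})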

And in the error output, we can see this weird traceback of a resize
revert failure, because the NeutronFixture isn't being used:

2019-08-07 23:22:23,621 ERROR [nova.network.neutronv2.api] The [neutron] section of your nova configuration file must be configured for authentication with the networking service endpoint. See the networking service install guide for details: https://docs.openstack.org/neutron/latest/install/
2019-08-07 23:22:23,634 ERROR [nova.compute.manager] Setting instance vm_state to ERROR
Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 8656, in _error_out_instance_on_exception
    yield
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 4830, in _resize_instance
    migration_p)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 2697, in migrate_instance_start
    client = _get_ksa_client(context, admin=True)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 215, in _get_ksa_client
    auth_plugin = _get_auth_plugin(context, admin=admin)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 151, in _get_auth_plugin
    _ADMIN_AUTH = _load_auth_plugin(CONF)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/network/neutronv2/api.py", line 82, in _load_auth_plugin
    raise neutron_client_exc.Unauthorized(message=err_msg)
neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
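
For comparison, functional tests that touch networking normally install the NeutronFixture in setUp, which stubs out nova.network.neutronv2.api so this real keystoneauth path is never reached (a rough sketch; the exact import path and base class are assumptions for this branch):

    from nova import test
    from nova.tests import fixtures as nova_fixtures


    class SomeNetworkingFunctionalTest(test.TestCase):
        def setUp(self):
            super(SomeNetworkingFunctionalTest, self).setUp()
            # Replaces the neutron API calls with canned ports/networks so
            # no [neutron] auth configuration or real endpoint is required.
            self.neutron = self.useFixture(
                nova_fixtures.NeutronFixture(self))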

According to logstash this started showing up around 8/5:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ERROR%20%5Bnova.network.neutronv2.api%5D%20The%20%5Bneutron%5D%20section%20of%20your%20nova%20configuration%20file%20must%20be%20configured%20for%20authentication%20with%20the%20networking%20service%20endpoint.%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

Which makes me think this change, which restarts a compute service and
sleeps in a stub:

https://review.opendev.org/#/c/670393/

might be interfering with tests running concurrently on the same worker.
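
If that is the culprit, the leak pattern would look something like the following (purely illustrative names; this is not the code from the review above):

    import time


    def _sleeping_migrate_disk_and_power_off(*args, **kwargs):
        # Simulate a slow/crashing resize. While this sleeps, the compute
        # service keeps the operation in flight; if the service outlives
        # the test, the resize continues while the next test on the same
        # worker is already running and asserting on its own notifications.
        time.sleep(30)
        raise Exception('fake crash during resize')


    # In the test (hypothetical helper names):
    # self.stub_out(
    #     'nova.virt.fake.FakeDriver.migrate_disk_and_power_off',
    #     _sleeping_migrate_disk_and_power_off)
    # self.restart_compute_service(self.compute)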

Looking at when that test runs and when the one that fails runs:

2019-08-07 23:21:54.157918 | ubuntu-bionic | {4}
nova.tests.functional.compute.test_init_host.ComputeManagerInitHostTestCase.test_migrate_disk_and_power_off_crash_finish_revert_migration
[4.063814s] ... ok

2019-08-07 23:25:00.073443 | ubuntu-bionic | {4}
nova.tests.functional.notification_sample_tests.test_service.TestServiceUpdateNotificationSampleLatest.test_service_disabled
[160.155643s] ... FAILED

We can see they are on the same worker process and run at about the same
time.

Furthermore, we can see that
TestServiceUpdateNotificationSampleLatest.test_service_disabled
eventually times out after 160 seconds and this is in the error output:

2019-08-07 23:24:59,911 ERROR [nova.compute.api] An error occurred while updating the COMPUTE_STATUS_DISABLED trait on compute node resource providers managed by host host1. The trait will be synchronized automatically by the compute service when the update_available_resource periodic task runs.
Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 5034, in _update_compute_provider_status
    self.rpcapi.set_host_enabled(context, service.host, enabled)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/rpcapi.py", line 996, in set_host_enabled
    return cctxt.call(ctxt, 'set_host_enabled', enabled=enabled)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
    transport_options=self.transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
    transport_options=transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 224, in send
    transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 208, in _send
    reply, failure = reply_q.get(timeout=timeout)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
    return waiter.wait()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
    return get_hub().switch()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
    return self.greenlet.switch()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 350, in run
    self.wait(sleep_time)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
    time.sleep(seconds)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
    raise TimeoutException()
fixtures._fixtures.timeout.TimeoutException

So test_migrate_disk_and_power_off_crash_finish_revert_migration is
probably not cleaning up properly.
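
A fix would likely make that test wait for, or tear down, the in-flight operation before it returns, along these lines (a fragment, not a complete test; the attribute and helper names are assumptions, not the actual patch):

    # At the end of the test, or registered up front with addCleanup:

    # Stop the compute service that was restarted during the test so its
    # greenthreads cannot keep running into the next test on this worker.
    self.addCleanup(self.restarted_compute.stop)

    # And/or wait for the resize/revert to reach a terminal state rather
    # than leaving the sleeping stub in flight when the test returns,
    # e.g. via InstanceHelperMixin._wait_for_migration_status().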

** Affects: nova
     Importance: High
         Status: Confirmed


** Tags: gate-failure testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839515


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839515/+subscriptions

