[Bug 1964940] Re: Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
Reviewed: https://review.opendev.org/c/openstack/neutron/+/843426
Committed: https://opendev.org/openstack/neutron/commit/e6d27be4747eb4573dcc5c0e1e7ac7550d20f951
Submitter: "Zuul (22348)"
Branch: master
commit e6d27be4747eb4573dcc5c0e1e7ac7550d20f951
Author: yatinkarel <ykarel@xxxxxxxxxx>
Date: Thu May 26 14:57:48 2022 +0530
Revert "Use Port_Binding up column to set Neutron port status"
This reverts commit 37d4195b516f12b683b774f0561561b172dd15c6.
Conflicts:
neutron/common/ovn/constants.py
neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py
Also revert the two commits below, which were added on
top of the parent commit:
Revert "Ensure subports transition to DOWN"
This reverts commit 5e036a6b281e4331f396473e299b26b2537d5322.
Revert "Ensure only the right events are processed"
This reverts commit 553f462656c2b7ee1e9be6b1e4e7c446c12cc9aa.
Reason for revert: These patches have caused a couple of issues [1][2][3].
[1] and [2] are the same issue, just seen in c8/c9-stream and rhel8
respectively; both contain detailed information about the problem.
[3] currently happens only in rhel8/rhel9, because it is visible only with
the patch being reverted together with ovn-2021 >= 21.12.0-55 (the fix for
[4]), which is not yet available in c8/c9-stream.
[1][2] happen randomly: the patch under revert moved the events to the SB DB,
which makes a known OVN issue [5] occur more often. In that issue the SB DB
event queue floods with PortBindingChassisEvent events, so other events such
as PortBindingUpdateUpEvent wait much longer, eventually triggering
VirtualInterfaceCreateException. The NB DB event queue is separate, so the
revert lowers the side effect of the OVN issue [5].
This patch can be re-reverted once [3] and [5] are fixed.
[1] https://bugs.launchpad.net/tripleo/+bug/1964940/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2081631
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2090604
[4] https://bugzilla.redhat.com/show_bug.cgi?id=2037433
[5] https://bugzilla.redhat.com/show_bug.cgi?id=1974898
Closes-Bug: #1964940
Closes-Bug: rhbz#2081631
Closes-Bug: rhbz#2090604
Related-Bug: rhbz#2037433
Related-Bug: rhbz#1974898
Change-Id: I159460be27f2c5f105be4b2865ef84aeb9a00094
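To make the queueing side effect described in the revert message concrete, here is an illustrative, self-contained sketch (not neutron/ovsdbapp code; all names are hypothetical stand-ins) of how a single FIFO event queue lets a burst of frequent events, analogous to PortBindingChassisEvent, delay the one event the compute node is actually waiting for, analogous to PortBindingUpdateUpEvent:
```
# Illustrative sketch only (not neutron code): one consumer draining a single
# FIFO queue means every event queued ahead of the port "up" event adds to
# the time it takes to report the port ACTIVE.
import queue
import threading
import time

events = queue.Queue()

def worker():
    while True:
        name, payload = events.get()
        time.sleep(0.01)  # pretend each event takes ~10 ms to handle
        if name == "port-up":
            print(f"port {payload} reported up only after the queue drained")
            return

consumer = threading.Thread(target=worker)
consumer.start()

# A burst of frequent events (analogous to PortBindingChassisEvent) lands in
# the queue before the single event Nova is waiting for (analogous to
# PortBindingUpdateUpEvent).
for i in range(500):
    events.put(("chassis-update", i))
events.put(("port-up", "port-1"))
consumer.join()
```
At ~10 ms per event, 500 queued chassis updates already add ~5 seconds before the port-up event is handled; at real gate scale the delay can exceed Nova's 300-second vif-plugging timeout, which matches the failure signature below.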
** Changed in: neutron
Status: In Progress => Fix Released
** Bug watch added: Red Hat Bugzilla #2090604
https://bugzilla.redhat.com/show_bug.cgi?id=2090604
** Bug watch added: Red Hat Bugzilla #2037433
https://bugzilla.redhat.com/show_bug.cgi?id=2037433
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1964940
Title:
Compute tests are failing with failed to reach ACTIVE status and task
state "None" within the required time.
Status in neutron:
Fix Released
Status in tripleo:
In Progress
Bug description:
On Fs001 CentOS Stream 9 wallaby, multiple compute server tempest tests are failing with the following error [1][2]:
```
{1} tempest.api.compute.images.test_images.ImagesTestJSON.test_create_image_from_paused_server [335.060967s] ... FAILED
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/tempest/api/compute/images/test_images.py", line 99, in test_create_image_from_paused_server
server = self.create_test_server(wait_until='ACTIVE')
File "/usr/lib/python3.9/site-packages/tempest/api/compute/base.py", line 270, in create_test_server
body, servers = compute.create_test_server(
File "/usr/lib/python3.9/site-packages/tempest/common/compute.py", line 267, in create_test_server
LOG.exception('Server %s failed to delete in time',
File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
self.force_reraise()
File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
raise self.value
File "/usr/lib/python3.9/site-packages/tempest/common/compute.py", line 237, in create_test_server
waiters.wait_for_server_status(
File "/usr/lib/python3.9/site-packages/tempest/common/waiters.py", line 100, in wait_for_server_status
raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (ImagesTestJSON:test_create_image_from_paused_server) Server 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1 failed to reach ACTIVE status and task state "None" within the required time (300 s). Server boot request ID: req-4930f047-7f5f-4d08-9ebb-8ac99b29ad7b. Current status: BUILD. Current task state: spawning.
```
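For reference, tempest's waiter is a simple polling loop; the sketch below is a simplified, hypothetical version of it (get_server, the field names, and the locally defined TimeoutException are assumptions for illustration, not the tempest API):
```
# Simplified, hypothetical status-polling waiter. get_server is an assumed
# helper returning the server body as a dict; TimeoutException is defined
# locally rather than taken from tempest.lib.
import time

class TimeoutException(Exception):
    pass

def wait_for_server_status(get_server, server_id, status="ACTIVE",
                           timeout=300, interval=3):
    start = time.time()
    while time.time() - start < timeout:
        body = get_server(server_id)
        task_state = body.get("OS-EXT-STS:task_state")
        if body["status"] == status and task_state is None:
            return body
        if body["status"] == "ERROR":
            raise RuntimeError(f"server {server_id} went to ERROR state")
        time.sleep(interval)
    raise TimeoutException(
        f"Server {server_id} failed to reach {status} status and task state "
        f'"None" within the required time ({timeout} s).')
```
In the failures above the server never leaves BUILD/spawning, so the loop simply runs out its 300-second budget and raises the timeout seen in the traceback.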
Below is the list of other tempest tests failing on the same job [2]:
```
tempest.api.compute.images.test_images.ImagesTestJSON.test_create_image_from_paused_server[id-71bcb732-0261-11e7-9086-fa163e4fa634]
tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive[id-777e468f-17ca-4da4-b93d-b7dbf56c0494]
tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_attached_volume[id-d0f3f0d6-d9b6-4a32-8da4-23015dcab23c,volume]
tempest.api.compute.servers.test_attach_interfaces.AttachInterfacesV270Test.test_create_get_list_interfaces[id-2853f095-8277-4067-92bd-9f10bd4f8e0c,network]
tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_shelved_state[id-bb0cb402-09dd-4947-b6e5-5e7e1cfa61ad]
setUpClass (tempest.api.compute.images.test_images_oneserver_negative.ImagesOneServerNegativeTestJSON)
tempest.api.compute.servers.test_device_tagging.TaggedBootDevicesTest_v242.test_tagged_boot_devices[id-a2e65a6c-66f1-4442-aaa8-498c31778d96,image,network,slow,volume]
tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_suspended_state[id-1f82ebd3-8253-4f4e-b93f-de9b7df56d8b]
tempest.api.compute.servers.test_attach_interfaces.AttachInterfacesTestJSON.test_create_list_show_delete_interfaces_by_network_port[id-73fe8f02-590d-4bf1-b184-e9ca81065051,network]
setUpClass (tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSONUnderV235)
```
Here is the traceback from the nova-compute logs [3]:
```
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [req-4930f047-7f5f-4d08-9ebb-8ac99b29ad7b d5ea6c724785473b8ea1104d70fb0d14 64c7d31d84284a28bc9aaa4eaad2b9fb - default default] [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] Instance failed to spawn: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] Traceback (most recent call last):
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7231, in _create_guest_with_network
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] guest = self._create_guest(
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib64/python3.9/contextlib.py", line 126, in __exit__
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] next(self.gen)
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 479, in wait_for_instance_event
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] actual_event = event.wait()
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib/python3.9/site-packages/eventlet/event.py", line 125, in wait
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] result = hub.switch()
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib/python3.9/site-packages/eventlet/hubs/hub.py", line 313, in switch
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] return self.greenlet.switch()
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] eventlet.timeout.Timeout: 300 seconds
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1]
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] During handling of the above exception, another exception occurred:
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1]
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] Traceback (most recent call last):
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2640, in _build_resources
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] yield resources
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2409, in _build_and_run_instance
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] self.driver.spawn(context, instance, image_meta,
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 4193, in spawn
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] self._create_guest_with_network(
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7257, in _create_guest_with_network
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] raise exception.VirtualInterfaceCreateException()
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2022-03-15 09:05:39.011 2 ERROR nova.compute.manager [instance: 6d1d8906-46fd-42ad-8b4e-0f89adb25ed1]
```
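The traceback shows nova-compute timing out while waiting for Neutron's network-vif-plugged external event before resuming the guest. A rough sketch of that wait-then-fail pattern, using a plain threading.Event in place of nova's eventlet-based event objects (all names here are hypothetical stand-ins, not the real nova API):
```
# Rough sketch of the "wait for vif-plugged or fail" pattern. All names are
# hypothetical; this is not the nova libvirt driver code.
import threading

class VirtualInterfaceCreateException(Exception):
    pass

def create_guest_with_network(plug_vifs, start_guest, vif_plugged_event,
                              timeout=300):
    # Wire the port (Neutron/OVN side), then block until the corresponding
    # "network-vif-plugged" notification arrives or the timeout expires.
    plug_vifs()
    if not vif_plugged_event.wait(timeout=timeout):
        # No event within the timeout: raise the same error seen in the log
        # above, leaving the instance stuck in BUILD/spawning from the
        # tempest client's point of view.
        raise VirtualInterfaceCreateException(
            "Virtual Interface creation failed")
    return start_guest()

# In this sketch, another thread standing in for the Neutron notification
# would call vif_plugged_event.set() once the port is reported up.
```
In the real driver the 300-second limit comes from the vif_plugging_timeout option; when the port-up notification is stuck behind the flooded SB DB event queue, no network-vif-plugged event arrives in time and the exception above is raised.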
The job https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby has been broken since 13th Mar, 2022, and the earlier bug
https://bugs.launchpad.net/tripleo/+bug/1960310 is also seen on it.
Since we have two runs with the same test failures, the bug is being logged
for further investigation.
Logs:
[1]. https://logserver.rdoproject.org/17/40517/1/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby/94e16ac/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
[2]. https://logserver.rdoproject.org/40/40440/1/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby/6ce8796/logs/undercloud/var/log/tempest/failing_tests.log.txt.gz
[3]. https://logserver.rdoproject.org/17/40517/1/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby/94e16ac/logs/undercloud/var/log/tempest/failing_tests.log.txt.gz
[4]. https://logserver.rdoproject.org/17/40517/1/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby/94e16ac/logs/overcloud-novacompute-0/var/log/containers/nova/nova-compute.log.1.gz
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1964940/+subscriptions