yahoo-eng-team team mailing list archive
Message #91300
[Bug 2006467] Re: tempest ssh timeout due to udhcpc fails in the cirros guest
Okay, I did a bit of digging today into another CI failure I saw on
another change, and it turned out to be related to this bug.
So, let me explain the issue. First, I was looking at
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6f9/868236/5/gate/nova-next/6f9f3d0/
and wondering why the SSH connection wasn't working.
When I looked at the nova logs, I found that the instance was spawned at 18:18:56:
Feb 14 18:18:56.514945 np0033093378 nova-compute[83239]: INFO nova.compute.manager [None req-053318ab-09ad-4a3a-8ddb-633cc0002c3e tempest-AttachVolumeNegativeTest-1605485622 tempest-AttachVolumeNegativeTest-1605485622-project] [instance: 6a265379-ebfd-4aea-a081-8b271f32c0ea] Took 8.58 seconds to build instance.
Then Tempest tried to SSH into the instance at 18:18:59:
2023-02-14 18:22:39.102680 | controller | 2023-02-14 18:18:59,630 92653 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.161:22' as 'cirros' with public key authentication
And eventually, about 3 min 32 s after that (18:22:31), it gave up:
2023-02-14 18:22:39.103394 | controller | 2023-02-14 18:22:31,398 92653 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.161 after 16 attempts. Proxy client: no proxy client
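As a sanity check on the timing, the gap between the two Tempest timestamps quoted above can be computed directly. This is only a minimal Python sketch; the variable names are mine and the two timestamps are copied from the log lines above.

```python
from datetime import datetime

# Tempest timestamps copied from the two log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S,%f"
first_attempt = datetime.strptime("2023-02-14 18:18:59,630", FMT)
gave_up = datetime.strptime("2023-02-14 18:22:31,398", FMT)

# Elapsed time between the first SSH attempt and the final failure.
elapsed = gave_up - first_attempt
minutes, seconds = divmod(round(elapsed.total_seconds()), 60)
print(f"SSH retries lasted {minutes} min {seconds} s")
```

So the 16 attempts were spread over roughly three and a half minutes, all of them after the instance was reported as built.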
Then I looked at the guest console and saw that udhcpc sent a discover 3 times:
2023-02-14 18:22:39.129636 | controller | [ 12.638156] sr 0:0:0:0: Attached scsi generic sg0 type 5
[...]
2023-02-14 18:22:39.130384 | controller | Starting network: udhcpc: started, v1.29.3
2023-02-14 18:22:39.130415 | controller | udhcpc: sending discover
2023-02-14 18:22:39.130439 | controller | udhcpc: sending discover
2023-02-14 18:22:39.130461 | controller | udhcpc: sending discover
So I wondered how long the DHCP discovery takes, and eventually found that the cirros DHCP client actually hangs for 1 min before requesting again.
So now I'm wondering why it takes so long to get a DHCP address,
and why the second DHCP attempt doesn't get the IP address.
Adding the Neutron team to this bug report, since something may be
wrong with our DHCP controller.
** Also affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2006467
Title:
tempest ssh timeout due to udhcpc fails in the cirros guest
Status in neutron:
New
Status in OpenStack Compute (nova):
Confirmed
Bug description:
Tests trying to ssh into the guest fail intermittently with a timeout
because udhcpc fails in the guest:
2023-02-01 20:46:32.286979 | controller | Starting network: udhcpc: started, v1.29.3
2023-02-01 20:46:32.286987 | controller | udhcp
2023-02-01 20:46:32.286996 | controller | c: sending discover
2023-02-01 20:46:32.287004 | controller | udhcpc: sending discover
2023-02-01 20:46:32.287013 | controller | udhcpc: sending discover
2023-02-01 20:46:32.287022 | controller | Usage: /sbin/cirros-dhcpc <up|down>
2023-02-01 20:46:32.287030 | controller | udhcpc: no lease, failing
2023-02-01 20:46:32.287039 | controller | FAIL
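Note that the first "sending discover" message in this excerpt is split across two log lines ("udhcp" / "c: sending discover"), so a line-by-line grep would count only two attempts. Below is a minimal Python sketch that strips the log prefix, joins the console payloads and counts on the joined text. The helper names are hypothetical, and it assumes every line carries a prefix of the form "<timestamp> | controller | " as in the excerpt.

```python
import re

# Assumed prefix format: "<date> <time> | controller | ", as in the excerpt.
PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \| controller \| ")


def console_payload(lines):
    """Drop the log prefix from each line and join the payloads together."""
    return "".join(PREFIX.sub("", line.rstrip("\n")) for line in lines)


def dhcp_summary(lines):
    """Return (number of discovers sent, whether udhcpc gave up)."""
    text = console_payload(lines)
    discovers = text.count("udhcpc: sending discover")
    gave_up = "udhcpc: no lease, failing" in text
    return discovers, gave_up


excerpt = [
    "2023-02-01 20:46:32.286979 | controller | Starting network: udhcpc: started, v1.29.3",
    "2023-02-01 20:46:32.286987 | controller | udhcp",
    "2023-02-01 20:46:32.286996 | controller | c: sending discover",
    "2023-02-01 20:46:32.287004 | controller | udhcpc: sending discover",
    "2023-02-01 20:46:32.287013 | controller | udhcpc: sending discover",
    "2023-02-01 20:46:32.287022 | controller | Usage: /sbin/cirros-dhcpc <up|down>",
    "2023-02-01 20:46:32.287030 | controller | udhcpc: no lease, failing",
    "2023-02-01 20:46:32.287039 | controller | FAIL",
]
print(dhcp_summary(excerpt))
```

Joining the payloads without a separator is a deliberate simplification: it recovers messages that were chunked mid-word, at the cost of possibly gluing unrelated fragments together in other logs.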
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/admin/test_volumes_negative.py", line 128, in test_multiattach_rw_volume_update_failure
    server1 = self.create_test_server(
  File "/opt/stack/tempest/tempest/api/compute/base.py", line 272, in create_test_server
    body, servers = compute.create_test_server(
  File "/opt/stack/tempest/tempest/common/compute.py", line 334, in create_test_server
    with excutils.save_and_reraise_exception():
  File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/opt/stack/tempest/tempest/common/compute.py", line 329, in create_test_server
    wait_for_ssh_or_ping(
  File "/opt/stack/tempest/tempest/common/compute.py", line 148, in wait_for_ssh_or_ping
    waiters.wait_for_ssh(
  File "/opt/stack/tempest/tempest/common/waiters.py", line 632, in wait_for_ssh
    raise lib_exc.TimeoutException()
tempest.lib.exceptions.TimeoutException: Request timed out
Details: None
Example failure
https://zuul.opendev.org/t/openstack/build/f1c6b7e54b28415c952de0be833731a9/logs
Signature
$ logsearch log --job-group nova-devstack --result FAILURE 'udhcpc: no lease, failing' --days 7
[snip]
Builds with matching logs 6/138:
+----------------------------------+---------------------+----------------+----------+-----------------------------------+----------------+--------------------------------------------+
| uuid | finished | project | pipeline | review | branch | job |
+----------------------------------+---------------------+----------------+----------+-----------------------------------+----------------+--------------------------------------------+
| 9bd5d568bfa84c119470df9fbff2de0b | 2023-02-03T12:36:54 | openstack/nova | check | https://review.opendev.org/857339 | master | nova-next |
| 3fae6edffe68483fa2627bc40002f524 | 2023-02-02T13:52:04 | openstack/nova | check | https://review.opendev.org/860285 | master | nova-next |
| 70eeeb8eb3184d8d9ee802ee53cb979b | 2023-02-02T13:33:57 | openstack/nova | check | https://review.opendev.org/860287 | master | nova-next |
| 492821b715974ae389c5d7f9127bb5c3 | 2023-02-02T05:14:11 | openstack/nova | check | https://review.opendev.org/871798 | stable/wallaby | tempest-integrated-compute-centos-8-stream |
| f1c6b7e54b28415c952de0be833731a9 | 2023-02-01T21:34:36 | openstack/nova | gate | https://review.opendev.org/872220 | master | nova-next |
| cca45d74a56f4204a299ee4bbbaad59d | 2023-02-01T06:17:04 | openstack/nova | check | https://review.opendev.org/871557 | stable/wallaby | tempest-integrated-compute-centos-8-stream |
+----------------------------------+---------------------+----------------+----------+-----------------------------------+----------------+--------------------------------------------+
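For anyone without access to logsearch, the same signature can be checked against job logs that have been downloaded locally. This is a rough Python sketch, assuming the console output of each build has been saved as a .txt file somewhere under one directory. The directory layout and the function name are hypothetical, and this is not a description of how logsearch itself works.

```python
from pathlib import Path

# Failure signature taken from the logsearch query above.
SIGNATURE = "udhcpc: no lease, failing"


def builds_matching(log_dir, signature=SIGNATURE):
    """Return the names of .txt files under log_dir that contain the signature."""
    matches = []
    for path in sorted(Path(log_dir).rglob("*.txt")):
        # Console logs can contain stray bytes, so decode leniently.
        text = path.read_text(encoding="utf-8", errors="replace")
        if signature in text:
            matches.append(path.name)
    return matches
```

Calling builds_matching("logs/") on such a directory would list the saved logs in which udhcpc gave up without a lease.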
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2006467/+subscriptions