yahoo-eng-team team mailing list archive
Message #93445
[Bug 2052787] [NEW] SSH timeouts due to problems with metadata server in ML2/OVN backend
Public bug reported:
Random tempest scenario jobs have already failed in a couple of runs because of a timeout while SSHing to the guest VM.
The VM's console log clearly shows a problem reaching the metadata server:
2024-02-02 17:37:28.665832 | controller | forked to background, child pid 250
2024-02-02 17:37:28.665857 | controller | OK
2024-02-02 17:37:28.665883 | controller | checking http://169.254.169.254/2009-04-04/instance-id
2024-02-02 17:37:28.665908 | controller | failed 1/20: up 26.07. request failed
2024-02-02 17:37:28.665933 | controller | failed 2/20: up 28.37. request failed
2024-02-02 17:37:28.665958 | controller | failed 3/20: up 30.67. request failed
2024-02-02 17:37:28.665983 | controller | failed 4/20: up 32.96. request failed
2024-02-02 17:37:28.666008 | controller | failed 5/20: up 82.24. request failed
2024-02-02 17:37:28.666033 | controller | failed 6/20: up 131.56. request failed
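The spacing between the failed attempts is easier to see when computed from the "up" values in the lines above. The following is a minimal sketch, not part of the original report: the regular expression and the helper names parse_attempts and attempt_gaps are assumptions made for illustration, and the sample text contains only the guest messages quoted above.

```python
import re

# Assumed pattern for the guest's retry lines as quoted in this report.
FAILED_RE = re.compile(r"failed (\d+)/(\d+): up (\d+(?:\.\d+)?)\. request failed")

# Guest messages copied from the console log excerpt in this report.
SAMPLE = """\
failed 1/20: up 26.07. request failed
failed 2/20: up 28.37. request failed
failed 3/20: up 30.67. request failed
failed 4/20: up 32.96. request failed
failed 5/20: up 82.24. request failed
failed 6/20: up 131.56. request failed
"""


def parse_attempts(log_text):
    """Return (attempt_number, guest_uptime_seconds) for each failed request."""
    return [(int(m.group(1)), float(m.group(3)))
            for m in FAILED_RE.finditer(log_text)]


def attempt_gaps(attempts):
    """Return the seconds elapsed between consecutive failed attempts."""
    return [round(later[1] - earlier[1], 2)
            for earlier, later in zip(attempts, attempts[1:])]


if __name__ == "__main__":
    attempts = parse_attempts(SAMPLE)
    for (number, _), gap in zip(attempts[1:], attempt_gaps(attempts)):
        print(f"attempt {number}: {gap:.2f}s after the previous one")
```

For this excerpt the first three gaps are about 2.3 seconds and the last two are about 49.3 seconds. The report itself does not say what causes that jump.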
Looking at the neutron-ovn-metadata-agent logs and then the journal log, it seems to me that those requests are never delivered to the haproxy spawned in the ovnmeta-xxx namespace, as there are no log entries with the log-tag configured in haproxy for that network.
Examples of failures like that:
https://3c8c3cc132d3ca41c1a0-8df332a8f6cbb54ee498032ff97f9d17.ssl.cf1.rackcdn.com/882350/2/check/cinder-plugin-ceph-tempest-mn-aa/df2995a/job-output.txt
https://ac3deee033df2f80309a-9b1010a8ed0ed23e4a7e66dfa043a295.ssl.cf5.rackcdn.com/907418/2/check/tempest-slow-py3/6dff044/job-output.txt
** Affects: neutron
Importance: Critical
Assignee: Slawek Kaplonski (slaweq)
Status: Confirmed
** Tags: gate-failure tempest
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2052787
Title:
SSH timeouts due to problems with metadata server in ML2/OVN backend
Status in neutron:
Confirmed
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2052787/+subscriptions