canonical-ubuntu-qa team mailing list archive
Message #05377
[Bug 2076241] Re: ubuntu_ltp_* tests unable to finish properly with B-azure-fips
Investigation shows this is caused by the command we use to install the openssh-server package from fipsdevppa:
DEBIAN_FRONTEND=noninteractive UCF_FORCE_CONFFNEW=1 apt-get install --yes --allow-downgrades libssl1.1 libssl1.1-hmac openssh-server openssh-server-hmac libssl-dev
This overwrites the existing config file with the one from the
package, so the "ClientAliveInterval 120" setting in /etc/ssh/sshd_config
(shipped with our cloud image) is lost, which causes the session
timeout issue seen here.
Newer releases (Focal+) are not affected, as this setting is written
into a file under /etc/ssh/sshd_config.d/.
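As a rough illustration of one way to guard against this on Bionic, the sketch below re-adds the setting to the text of /etc/ssh/sshd_config after the package installation. The function name ensure_client_alive is hypothetical and this is not the fix adopted for this bug. It relies on sshd using the first value it obtains for a keyword, so the line is placed at the top of the file, and sshd would still need to be reloaded afterwards.

```python
import re

DIRECTIVE = "ClientAliveInterval"


def ensure_client_alive(config_text, interval=120):
    """Return sshd_config text with one active ClientAliveInterval line.

    Hypothetical helper: any active (uncommented) ClientAliveInterval
    line is dropped and a fresh one is placed first, because sshd keeps
    the first value it obtains for a keyword. Commented-out lines are
    left untouched.
    """
    wanted = "%s %d" % (DIRECTIVE, interval)
    # sshd keywords are case-insensitive; lines starting with '#' do not match.
    active = re.compile(r"^\s*%s\b" % DIRECTIVE, re.IGNORECASE)
    kept = [line for line in config_text.splitlines() if not active.match(line)]
    return "\n".join([wanted] + kept) + "\n"


if __name__ == "__main__":
    sample = "Port 22\n#ClientAliveInterval 0\n"
    print(ensure_client_alive(sample), end="")
```

Note that this simple version also removes ClientAliveInterval lines that sit inside a Match block, which may not be what you want on a customised image.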
--
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2076241
Title:
ubuntu_ltp_* tests unable to finish properly with B-azure-fips
Status in ubuntu-kernel-tests:
In Progress
Bug description:
In sru-s20240429 and sru-s20240610, the ubuntu_ltp_* tests were found
unable to finish properly with B-azure-fips kernel, and eventually
trigger the `sut-test` failure on them.
Here is the result from sru-s20240610
* ubuntu_ltp
- report cuts off at fs:fs_fill test, failed on Standard_D4_v4 only.
* ubuntu_ltp_controllers
- report cuts off at memcg_test_3 test, failed on Standard_B1ms
- report cuts off at memcg_stress test, failed on Standard_D4_v4, Standard_D4s_v3-gen2
* ubuntu_ltp_cve
- report cuts off at cve-2016-8655 test, failed on Standard_B1ms, Standard_D4_v4
- report cuts off at cve-2018-18559 test, failed on Standard_D4s_v3-gen2
* ubuntu_ltp_syscalls
- report cuts off at setsockopt06 test, failed on Standard_B1ms
- report cuts off at bind06 test, failed on Standard_D4_v4, Standard_D4s_v3-gen2
The result from sru-s20240610 is quite similar, except that
ubuntu_ltp_cve cuts off at the cve-2016-8655 test on
Standard_D4s_v3-gen2 this time.
Note that the cve-2016-8655 is actually the setsockopt06 test, and
cve-2018-18559 is the bind06 test.
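To make the overlap between the two suites easier to see, here is a small sketch that folds the CVE names into their syscall test names and groups the cut-off points from the list in this description. The data is copied from that list; the names CVE_ALIASES, CUTOFFS and group_by_underlying_test are made up for illustration.

```python
from collections import defaultdict

# CVE test names and the syscall tests they actually run (from this report).
CVE_ALIASES = {
    "cve-2016-8655": "setsockopt06",
    "cve-2018-18559": "bind06",
}

# (suite, test where the report cuts off, instance type), from the list above.
CUTOFFS = [
    ("ubuntu_ltp_cve", "cve-2016-8655", "Standard_B1ms"),
    ("ubuntu_ltp_cve", "cve-2016-8655", "Standard_D4_v4"),
    ("ubuntu_ltp_cve", "cve-2018-18559", "Standard_D4s_v3-gen2"),
    ("ubuntu_ltp_syscalls", "setsockopt06", "Standard_B1ms"),
    ("ubuntu_ltp_syscalls", "bind06", "Standard_D4_v4"),
    ("ubuntu_ltp_syscalls", "bind06", "Standard_D4s_v3-gen2"),
]


def group_by_underlying_test(cutoffs):
    """Group cut-off points by the underlying test, resolving CVE aliases."""
    grouped = defaultdict(list)
    for suite, test, instance in cutoffs:
        grouped[CVE_ALIASES.get(test, test)].append((suite, instance))
    return dict(grouped)


if __name__ == "__main__":
    for test, hits in sorted(group_by_underlying_test(CUTOFFS).items()):
        print(test, hits)
```

Grouped this way, every cut-off in ubuntu_ltp_cve and ubuntu_ltp_syscalls lands on either setsockopt06 or bind06.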
I have done some experiments on Standard_D4s_v3-gen2 with kernel in sru-s20240610 (4.15.0-2088-azure-fips):
* ubuntu_ltp_controllers:
- If we skip memcg_stress test, it will be able to finish properly.
* ubuntu_ltp_cve:
- If we skip cve-2016-8655 and cve-2018-18559 tests, it will be able to finish properly.
* ubuntu_ltp_syscalls:
- If we skip bind06 and writev03 tests, it will be able to finish properly (setsockopt06 works fine in this case, not sure why).
Here is the code to skip a certain test:
diff --git a/ubuntu_ltp_syscalls/control b/ubuntu_ltp_syscalls/control
index 4f93c546..684a8ed2 100644
--- a/ubuntu_ltp_syscalls/control
+++ b/ubuntu_ltp_syscalls/control
@@ -24,6 +24,9 @@ if result == 'GOOD':
         # Special case for msgstress04 (lp:1943802 / lp:1943652)
         if testcase == 'msgstress04':
             timeout_threshold = 60*60
+        if testcase in ['bind06', 'writev03'] and platform.release() == '4.15.0-2088-azure-fips':
+            print('skipping bind06 for testing purpose')
+            continue
         job.run_test_detail(NAME, test_name=testcase, tag=testcase, timeout=timeout_threshold)
 else:
     print("ERROR: test failed to build, skipping all the sub tests")
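If more kernels or tests needed the same treatment, the condition in the patch above could be moved into a small lookup table. This is only a sketch of that idea: SKIP_BY_RELEASE and should_skip are hypothetical names, not part of the ubuntu_ltp_syscalls control file, and the single entry is the one used in the experiment.

```python
import platform

# Kernel release -> tests to skip on it (entry taken from the experiment above).
SKIP_BY_RELEASE = {
    "4.15.0-2088-azure-fips": {"bind06", "writev03"},
}


def should_skip(testcase, release=None):
    """Return True if testcase is listed as skipped for this kernel release.

    release defaults to the running kernel, as reported by platform.release().
    """
    if release is None:
        release = platform.release()
    return testcase in SKIP_BY_RELEASE.get(release, set())


if __name__ == "__main__":
    for name in ["bind06", "writev03", "setsockopt06"]:
        if should_skip(name, "4.15.0-2088-azure-fips"):
            print("skipping %s for testing purpose" % name)
```

Inside the control file loop this would reduce the check to "if should_skip(testcase): continue", and the log message would name the test that was actually skipped.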
In my manual test on Standard_D4_v4 with 4.15.0-2088-azure-fips, I noticed that an idle SSH session hangs after a certain period (I recorded one at about 7m21s). If the session is running something, like htop, it stays alive.
The setsockopt06 and bind06 tests can also pass without any immediate crash, so it is not clear what causes the failure we see here.
It's also worth noting that "running something" seems to be limited to
commands that keep generating output. Commands like "dmesg -w"
and "tail -f /var/log/syslog" hang too if there is no output to
update.
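One way to check whether missing keepalive traffic is what kills the idle session would be to open it with client-side keepalives enabled and see if it survives. The sketch below only builds such an ssh command line; ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client options, while the helper name, the host and the chosen values are assumptions for illustration.

```python
def build_ssh_command(host, alive_interval=60, alive_count_max=3):
    """Build an ssh argv list with client-side keepalives enabled.

    Hypothetical helper: the client sends a keepalive request after
    alive_interval seconds without data from the server, and gives up
    after alive_count_max unanswered requests.
    """
    return [
        "ssh",
        "-o", "ServerAliveInterval=%d" % alive_interval,
        "-o", "ServerAliveCountMax=%d" % alive_count_max,
        host,
    ]


if __name__ == "__main__":
    # "ubuntu@sut.example" is a placeholder, not a host from this report.
    print(" ".join(build_ssh_command("ubuntu@sut.example")))
```

If an otherwise idle "dmesg -w" session stays up with these options but hangs without them, that would point at idle connection handling rather than at the tests themselves.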
According to Magali, the last bionic fips openssh update is from
January, so this might be something else in the kernel.
== Original bug report ==
On azure-fips platforms, multiple tests in ubuntu_ltp, ubuntu_ltp_controllers, ubuntu_ltp_cve, and ubuntu_ltp_syscalls are causing the system to become unresponsive. When run locally, the tests run to completion but the system hangs some time afterwards.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2076241/+subscriptions