canonical-ubuntu-qa team mailing list archive
Message #02332
[Bug 2047694] Re: mm:cpuset01 from ubuntu_ltp flaky on scobee-kernel with J-realtime
** Description changed:
It seems we didn't run this test on scobee-kernel with J-realtime
before, so it's a bit difficult to determine if this is caused by the
recent LTP fork update [1].
Test failed with timeout:
INFO: Test start time: Thu Dec 21 09:12:45 UTC 2023
COMMAND: /opt/ltp/bin/ltp-pan -q -e -S -a 244473 -n 244473 -f /tmp/ltp-SeaoDkJ1R1/alltests -l /dev/null -C /dev/null -T /dev/null
LOG File: /dev/null
FAILED COMMAND File: /dev/null
TCONF COMMAND File: /dev/null
Running tests.......
tst_test.c:1690: TINFO: LTP version: 20230929-185-g19ef6521d
tst_test.c:1574: TINFO: Timeout per run is 0h 00m 30s
Test timeouted, sending SIGKILL!
tst_test.c:1622: TINFO: Killed the leftover descendant processes
tst_test.c:1628: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
tst_test.c:1630: TBROK: Test killed! (timeout?)
Summary:
passed 0
failed 0
broken 1
skipped 0
warnings 0
INFO: ltp-pan reported some tests FAIL
LTP Version: 20230929-185-g19ef6521d
INFO: Test end time: Thu Dec 21 09:13:15 UTC 2023
- I tried running this test manually on scobee-kernel and found it to be
- flaky: in some attempts it finishes within 10 seconds, but in others
- it takes up to 90 seconds.
+ This test also appears to trigger a kernel warning in dmesg on this system, even when the test passes:
+ [ 165.551988] ------------[ cut here ]------------
+ [ 165.552018] WARNING: CPU: 0 PID: 15 at kernel/sched/core.c:3109 set_task_cpu+0x168/0x244
+ [ 165.552083] Modules linked in: binfmt_misc nls_iso8859_1 ipmi_ssif arm_spe_pmu acpi_ipmi hisi_zip ipmi_si hns_roce_hw_v2 hisi_sec2 hisi_hpre ecdh_generic libcurve25519_generic ipmi_devintf ecc hisi_qm ipmi_msghandler authenc uacce hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_trng_v2 hisi_uncore_pmu cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure mlx5_ib ib_uverbs ib_core hibmc_drm drm_vram_helper drm_ttm_helper ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect mlx5_core sysimgblt fb_sys_fops cec rc_core mlxfw realtek crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce hisi_sas_v3_hw hns3 psample hisi_sas_main hclge tls xhci_pci libsas drm hnae3 xhci_pci_renesas ahci scsi_transport_sas spi_dw_mmio spi_dw gpio_dwapb
+ [ 165.552555] aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
+ [ 165.552595] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 5.15.0-1052-realtime #58-Ubuntu
+ [ 165.552614] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
+ [ 165.552624] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
+ [ 165.552641] pc : set_task_cpu+0x168/0x244
+ [ 165.552660] lr : detach_tasks+0x138/0x4b0
+ [ 165.552684] sp : ffff80000860ba20
+ [ 165.552692] x29: ffff80000860ba20 x28: ffff2020075ac300 x27: ffffa9913540a928
+ [ 165.552718] x26: ffffa99134c794c0 x25: ffffa99134c794c0 x24: ffff003f7fbd9fb0
+ [ 165.552739] x23: 0000000000000001 x22: ffff003f7fbd94c0 x21: ffffa99135407a18
+ [ 165.552761] x20: 000000000000000d x19: ffff2020075ac300 x18: 0000000000000000
+ [ 165.552784] x17: ffff56ae4adda000 x16: ffffa99133a3e780 x15: 00003d094ed85380
+ [ 165.552805] x14: ffffa9913543a5a8 x13: ffffa9913543a078 x12: 000000000000000d
+ [ 165.552828] x11: 0000000000000004 x10: ffffa99135407b50 x9 : ffffa99132e30dd8
+ [ 165.552846] x8 : 000000000000000d x7 : ffffffffffffe000 x6 : 0000000000000314
+ [ 165.552866] x5 : 0000000000532ae2 x4 : 0000000000000001 x3 : 000000000000b67e
+ [ 165.552886] x2 : 0000000000000000 x1 : ffffa991344399b8 x0 : 0000000000000001
+ [ 165.552908] Call trace:
+ [ 165.552915] set_task_cpu+0x168/0x244
+ [ 165.552933] detach_tasks+0x138/0x4b0
+ [ 165.552948] load_balance+0x260/0x834
+ [ 165.552967] rebalance_domains+0x280/0x3f4
+ [ 165.552984] _nohz_idle_balance.constprop.0.isra.0+0x1ec/0x34c
+ [ 165.553004] run_rebalance_domains+0x84/0xb0
+ [ 165.553022] __do_softirq+0x170/0x468
+ [ 165.553035] run_ksoftirqd+0x80/0x150
+ [ 165.553052] smpboot_thread_fn+0x260/0x2e4
+ [ 165.553072] kthread+0x158/0x16c
+ [ 165.553092] ret_from_fork+0x10/0x20
+ [ 165.553117] ---[ end trace 0000000000000002 ]---
+
+ I tried running this test manually on scobee-kernel and found it to be flaky: in some attempts it finishes within 10 seconds, but in others it takes up to 90 seconds.
Bumping the timeout multiplier might be a possible solution.
[1] https://lists.ubuntu.com/archives/kernel-team/2023-December/147590.html
** Summary changed:
- mm:cpuset01 from ubuntu_ltp flaky on scobee-kernel with J-realtime
+ mm:cpuset01 from ubuntu_ltp flaky on scobee-kernel with J-realtime (warning found in dmesg)
--
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2047694
Title:
mm:cpuset01 from ubuntu_ltp flaky on scobee-kernel with J-realtime
(warning found in dmesg)
Status in ubuntu-kernel-tests:
New
Bug description:
It seems we didn't run this test on scobee-kernel with J-realtime
before, so it's a bit difficult to determine if this is caused by the
recent LTP fork update [1].
Test failed with timeout:
INFO: Test start time: Thu Dec 21 09:12:45 UTC 2023
COMMAND: /opt/ltp/bin/ltp-pan -q -e -S -a 244473 -n 244473 -f /tmp/ltp-SeaoDkJ1R1/alltests -l /dev/null -C /dev/null -T /dev/null
LOG File: /dev/null
FAILED COMMAND File: /dev/null
TCONF COMMAND File: /dev/null
Running tests.......
tst_test.c:1690: TINFO: LTP version: 20230929-185-g19ef6521d
tst_test.c:1574: TINFO: Timeout per run is 0h 00m 30s
Test timeouted, sending SIGKILL!
tst_test.c:1622: TINFO: Killed the leftover descendant processes
tst_test.c:1628: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
tst_test.c:1630: TBROK: Test killed! (timeout?)
Summary:
passed 0
failed 0
broken 1
skipped 0
warnings 0
INFO: ltp-pan reported some tests FAIL
LTP Version: 20230929-185-g19ef6521d
INFO: Test end time: Thu Dec 21 09:13:15 UTC 2023
This test also appears to trigger a kernel warning in dmesg on this system, even when the test passes:
[ 165.551988] ------------[ cut here ]------------
[ 165.552018] WARNING: CPU: 0 PID: 15 at kernel/sched/core.c:3109 set_task_cpu+0x168/0x244
[ 165.552083] Modules linked in: binfmt_misc nls_iso8859_1 ipmi_ssif arm_spe_pmu acpi_ipmi hisi_zip ipmi_si hns_roce_hw_v2 hisi_sec2 hisi_hpre ecdh_generic libcurve25519_generic ipmi_devintf ecc hisi_qm ipmi_msghandler authenc uacce hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_trng_v2 hisi_uncore_pmu cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure mlx5_ib ib_uverbs ib_core hibmc_drm drm_vram_helper drm_ttm_helper ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect mlx5_core sysimgblt fb_sys_fops cec rc_core mlxfw realtek crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce hisi_sas_v3_hw hns3 psample hisi_sas_main hclge tls xhci_pci libsas drm hnae3 xhci_pci_renesas ahci scsi_transport_sas spi_dw_mmio spi_dw gpio_dwapb
[ 165.552555] aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 165.552595] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 5.15.0-1052-realtime #58-Ubuntu
[ 165.552614] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 165.552624] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 165.552641] pc : set_task_cpu+0x168/0x244
[ 165.552660] lr : detach_tasks+0x138/0x4b0
[ 165.552684] sp : ffff80000860ba20
[ 165.552692] x29: ffff80000860ba20 x28: ffff2020075ac300 x27: ffffa9913540a928
[ 165.552718] x26: ffffa99134c794c0 x25: ffffa99134c794c0 x24: ffff003f7fbd9fb0
[ 165.552739] x23: 0000000000000001 x22: ffff003f7fbd94c0 x21: ffffa99135407a18
[ 165.552761] x20: 000000000000000d x19: ffff2020075ac300 x18: 0000000000000000
[ 165.552784] x17: ffff56ae4adda000 x16: ffffa99133a3e780 x15: 00003d094ed85380
[ 165.552805] x14: ffffa9913543a5a8 x13: ffffa9913543a078 x12: 000000000000000d
[ 165.552828] x11: 0000000000000004 x10: ffffa99135407b50 x9 : ffffa99132e30dd8
[ 165.552846] x8 : 000000000000000d x7 : ffffffffffffe000 x6 : 0000000000000314
[ 165.552866] x5 : 0000000000532ae2 x4 : 0000000000000001 x3 : 000000000000b67e
[ 165.552886] x2 : 0000000000000000 x1 : ffffa991344399b8 x0 : 0000000000000001
[ 165.552908] Call trace:
[ 165.552915] set_task_cpu+0x168/0x244
[ 165.552933] detach_tasks+0x138/0x4b0
[ 165.552948] load_balance+0x260/0x834
[ 165.552967] rebalance_domains+0x280/0x3f4
[ 165.552984] _nohz_idle_balance.constprop.0.isra.0+0x1ec/0x34c
[ 165.553004] run_rebalance_domains+0x84/0xb0
[ 165.553022] __do_softirq+0x170/0x468
[ 165.553035] run_ksoftirqd+0x80/0x150
[ 165.553052] smpboot_thread_fn+0x260/0x2e4
[ 165.553072] kthread+0x158/0x16c
[ 165.553092] ret_from_fork+0x10/0x20
[ 165.553117] ---[ end trace 0000000000000002 ]---
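Because the warning shows up even when the test itself passes, a harness would have to inspect dmesg separately from the LTP result. A minimal sketch of such a check follows, assuming the dmesg text is already available as a string. The function name find_warnings and the regular expressions are hypothetical, written only against the log format shown above, and are not part of ubuntu_ltp or LTP.

```python
import re

# Markers taken from the dmesg excerpt in this report.
CUT_HERE = "[ cut here ]"
END_TRACE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")
# Captures the source location and the symbol+offset of a WARNING line.
WARNING = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")


def find_warnings(dmesg_text):
    """Return (source_location, symbol) for each WARNING inside a cut-here block."""
    hits = []
    in_block = False
    for line in dmesg_text.splitlines():
        if CUT_HERE in line:
            in_block = True
        elif END_TRACE.search(line):
            in_block = False
        elif in_block:
            match = WARNING.search(line)
            if match:
                hits.append((match.group(1), match.group(2)))
    return hits


# Three lines copied from the trace above, used as sample input.
sample = """\
[ 165.551988] ------------[ cut here ]------------
[ 165.552018] WARNING: CPU: 0 PID: 15 at kernel/sched/core.c:3109 set_task_cpu+0x168/0x244
[ 165.553117] ---[ end trace 0000000000000002 ]---
"""

print(find_warnings(sample))
```

A non-empty result could be used to flag the run for review even when the LTP summary reports no failures.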
I tried running this test manually on scobee-kernel and found it to be flaky: in some attempts it finishes within 10 seconds, but in others it takes up to 90 seconds.
Bumping the timeout multiplier might be a possible solution.
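As a rough illustration of the multiplier idea: the log above reports a 30 second timeout per run, and manual runs took up to 90 seconds. The sketch below computes the smallest whole-number LTP_TIMEOUT_MUL that covers a given worst-case runtime. The helper min_timeout_multiplier and its margin parameter are hypothetical and only show the arithmetic; they are not part of LTP.

```python
import math


def min_timeout_multiplier(base_timeout_s, worst_runtime_s, margin=1.0):
    """Smallest integer multiplier such that base * mul >= worst * margin."""
    return max(1, math.ceil(worst_runtime_s * margin / base_timeout_s))


# Values from this report: 30 s default timeout, up to 90 s observed.
mul = min_timeout_multiplier(30, 90)
print(f"export LTP_TIMEOUT_MUL={mul}")
```

A multiplier that only just matches the slowest observed run leaves no headroom, so passing a margin above 1.0 (for example margin=1.5) may be the safer choice on this machine.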
[1] https://lists.ubuntu.com/archives/kernel-team/2023-December/147590.html
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2047694/+subscriptions