canonical-ubuntu-qa team mailing list archive
Message #04253
[Bug 2068024] Re: race_sched in ubuntu_stress_smoke_test will cause kernel panic on 6.8 with Azure Standard_A2_v2 instance
This issue was fixed by the following commit (upstream as of the 6.9
kernel):
1560d1f6eb6b398bddd80c16676776c0325fe5fe "sched/eevdf: Prevent vlag from
going out of bounds in reweight_eevdf()"
I've sent the patches to the mailing list for noble:linux
(https://lists.ubuntu.com/archives/kernel-team/2024-June/151360.html). I
left out other derivatives as they'll get them from noble:linux.
Oracular is tracking past the 6.9 kernel, so these patches should
already be applied there.
I've also attached the bisect logs for the break and fix commits, as
well as the script used to test (along with a patch to speed up
testing).
** Attachment added: "bisect log identifying break commit"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068024/+attachment/5787032/+files/break_commit_bisect.log
--
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2068024
Title:
race_sched in ubuntu_stress_smoke_test will cause kernel panic on 6.8
with Azure Standard_A2_v2 instance
Status in ubuntu-kernel-tests:
New
Status in linux package in Ubuntu:
New
Status in linux source package in Noble:
In Progress
Bug description:
This issue can be found on:
* N-Azure-6.8.0-1008.8
* N-generic-6.8.0-35.35
* J-Azure-6.8.0-1008.8~22.04.1
The issue reproduces 100% of the time on an Azure Standard_A2_v2
instance. It can also be reproduced on Standard_D2pds_v5, but at a
lower rate.
syslog output:
2024-06-04T12:21:29.655736+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test kernel: zswap: loaded using pool lzo/zbud
2024-06-04T12:21:29.727437+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: invoked with './stress-ng -v -t 5 --race-sched 4 --race-sched-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
2024-06-04T12:21:29.727600+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: system: 'n-laz-az-6-8-stda2v2-u-stress-smk-test' Linux 6.8.0-1001-azure #1-Ubuntu SMP Tue Feb 13 17:53:47 UTC 2024 x86_64
2024-06-04T12:21:29.727683+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: memory (MB): total 3918.72, free 3424.57, shared 4.08, buffer 36.20, swap 0.00, free swap 0.00
2024-06-04T12:21:29.727723+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: stress-ng: info: [1250] setting to a 5 secs run per stressor
2024-06-04T12:21:29.805799+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: stress-ng: info: [1250] dispatching hogs: 4 race-sched
Console output:
[ 1167.163045] I/O error, dev loop0, sector 256 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 1435.517597] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 1435.522651] #PF: supervisor read access in kernel mode
[ 1435.525407] #PF: error_code(0x0000) - not-present page
[ 1435.528122] PGD 0 P4D 0
[ 1435.529813] Oops: 0000 [#1] SMP PTI
[ 1435.531744] CPU: 0 PID: 121253 Comm: stress-ng-race- Tainted: P O 6.8.0-1008-azure #8-Ubuntu
[ 1435.536481] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 1435.543274] RIP: 0010:pick_next_task_fair+0x91/0x620
[ 1435.545480] Code: 91 00 00 00 49 81 bd b0 02 00 00 80 a8 89 92 75 60 4d 89 fe eb 27 4c 89 f7 e8 0b b7 ff ff 84 c0 75 3f 4c 89 f7 e8 5f 04 ff ff <4c> 8b b0 a0 00 00 00 48 89 c3 4d 85 f6 0f 84 f4 00 00 00 49 8b 46
[ 1435.554629] RSP: 0018:ffffb2b202e73cf8 EFLAGS: 00010096
[ 1435.558030] RAX: 0000000000000000 RBX: ffffb2b202e73dc8 RCX: fd78d84d198c4000
[ 1435.562226] RDX: 0000000000000c00 RSI: e411d03fda1d7382 RDI: 0000000000000c02
[ 1435.566496] RBP: ffffb2b202e73d38 R08: 0000000000000002 R09: 0000000000000002
[ 1435.570327] R10: 0000000000000000 R11: 0000000000000000 R12: ffff920dbbc33580
[ 1435.574620] R13: ffff920d05570000 R14: ffff920dbbc33680 R15: ffff920dbbc33680
[ 1435.579115] FS: 00007fb92ad12d00(0000) GS:ffff920dbbc00000(0000) knlGS:0000000000000000
[ 1435.583308] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1435.586094] CR2: 00000000000000a0 CR3: 0000000102364001 CR4: 00000000003706f0
[ 1435.590178] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1435.594054] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1435.597740] Call Trace:
[ 1435.599469] <TASK>
[ 1435.600605] ? show_regs+0x65/0x70
[ 1435.602396] ? __die+0x24/0x70
[ 1435.603999] ? page_fault_oops+0x99/0x1a0
[ 1435.605856] ? do_user_addr_fault+0x2ae/0x670
[ 1435.607915] ? exc_page_fault+0x7b/0x170
[ 1435.609976] ? asm_exc_page_fault+0x27/0x30
[ 1435.611989] ? pick_next_task_fair+0x91/0x620
[ 1435.614311] ? pick_next_task_fair+0x91/0x620
[ 1435.616811] ? wp_page_copy+0x2f7/0x690
[ 1435.618799] pick_next_task+0x5f/0xcd0
[ 1435.621060] ? do_wp_page+0x1d0/0x430
[ 1435.623596] __schedule+0x169/0x760
[ 1435.625947] ? __cgroup_account_cputime+0x28/0x30
[ 1435.628329] ? update_curr+0x15e/0x1e0
[ 1435.630179] schedule+0x2c/0xf0
[ 1435.633476] do_sched_yield+0x85/0xb0
[ 1435.635452] __do_sys_sched_yield+0xe/0x20
[ 1435.637356] x64_sys_call+0x3d9/0x2030
[ 1435.639400] do_syscall_64+0x7b/0x160
[ 1435.641857] ? handle_mm_fault+0xac/0x3a0
[ 1435.644956] ? irqentry_exit_to_user_mode+0x7b/0x220
[ 1435.647799] ? irqentry_exit+0x1d/0x30
[ 1435.650587] ? exc_page_fault+0x87/0x170
[ 1435.653213] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 1435.656728] RIP: 0033:0x7fb92ab0e7db
[ 1435.659593] Code: 73 01 c3 48 8b 0d 3d 46 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 18 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 46 0f 00 f7 d8 64 89 01 48
[ 1435.675388] RSP: 002b:00007fff7ca243d8 EFLAGS: 00000282 ORIG_RAX: 0000000000000018
[ 1435.680830] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb92ab0e7db
[ 1435.686046] RDX: 000055c47ee77db0 RSI: 0000000000000000 RDI: 0000000000000002
[ 1435.690268] RBP: 0000000000000791 R08: 0000000000000002 R09: 011d99605fac8414
[ 1435.694941] R10: 00007fb92ad12fd0 R11: 0000000000000282 R12: 00007fb92acfde18
[ 1435.698607] R13: 0000000000000002 R14: 000000000001d9a5 R15: 0000000000000008
[ 1435.703633] </TASK>
[ 1435.705016] Modules linked in: vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb zfs(PO) spl(O) dccp_ipv4 dccp atm sm3_generic sm3_avx_x86_64 sm3 poly1305_generic poly1305_x86_64 nhpoly1305_avx2 nhpoly1305_sse2 nhpoly1305 libpoly1305 michael_mic md4 streebog_generic rmd160 cmac algif_rng twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic fcrypt cast6_avx_x86_64 cast6_generic cast5_avx_x86_64 cast5_generic cast_common camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 blowfish_generic blowfish_x86_64 blowfish_common algif_skcipher algif_hash aria_aesni_avx2_x86_64 aria_aesni_avx_x86_64 aria_generic sm4_generic sm4_aesni_avx2_x86_64 sm4_aesni_avx_x86_64 sm4 ccm des3_ede_x86_64 des_generic libdes authenc aegis128 aegis128_aesni algif_aead af_alg tls 8021q garp mrp stp llc binfmt_misc nls_iso8859_1 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_owner xt_tcpudp
[ 1435.705128] nft_compat nf_tables serio_raw joydev dm_multipath msr nvme_fabrics efi_pstore nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic hid_hyperv crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 hid pata_acpi hyperv_keyboard hyperv_drm hv_netvsc aesni_intel crypto_simd cryptd
[ 1435.776455] CR2: 00000000000000a0
[ 1435.778976] ---[ end trace 0000000000000000 ]---
[ 1435.782217] RIP: 0010:pick_next_task_fair+0x91/0x620
[ 1435.785040] Code: 91 00 00 00 49 81 bd b0 02 00 00 80 a8 89 92 75 60 4d 89 fe eb 27 4c 89 f7 e8 0b b7 ff ff 84 c0 75 3f 4c 89 f7 e8 5f 04 ff ff <4c> 8b b0 a0 00 00 00 48 89 c3 4d 85 f6 0f 84 f4 00 00 00 49 8b 46
[ 1435.794724] RSP: 0018:ffffb2b202e73cf8 EFLAGS: 00010096
[ 1435.798116] RAX: 0000000000000000 RBX: ffffb2b202e73dc8 RCX: fd78d84d198c4000
[ 1435.802543] RDX: 0000000000000c00 RSI: e411d03fda1d7382 RDI: 0000000000000c02
[ 1435.807466] RBP: ffffb2b202e73d38 R08: 0000000000000002 R09: 0000000000000002
[ 1435.811823] R10: 0000000000000000 R11: 0000000000000000 R12: ffff920dbbc33580
[ 1435.815818] R13: ffff920d05570000 R14: ffff920dbbc33680 R15: ffff920dbbc33680
[ 1435.820778] FS: 00007fb92ad12d00(0000) GS:ffff920dbbc00000(0000) knlGS:0000000000000000
[ 1435.825269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1435.828468] CR2: 00000000000000a0 CR3: 0000000102364001 CR4: 00000000003706f0
[ 1435.832087] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1435.837461] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1435.841312] note: stress-ng-race-[121253] exited with irqs disabled
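When comparing console logs from several runs (for example during a bisect), it can help to reduce each oops to its faulting symbol, offset, and fault address. The following is a minimal sketch, not the script attached to this bug: the function name `summarize_oops` and the regular expressions are assumptions written against the log lines shown above. In this trace RAX is 0 and the fault address is 0xa0, which is consistent with a field read at offset 0xa0 from a NULL pointer inside pick_next_task_fair.

```python
import re

# Kernel-mode RIP lines look like "RIP: 0010:pick_next_task_fair+0x91/0x620".
# User-mode RIP lines ("RIP: 0033:0x7fb9...") are deliberately not matched.
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)
NULL_DEREF_RE = re.compile(
    r"BUG: kernel NULL pointer dereference, address: (?P<addr>[0-9a-f]+)"
)


def summarize_oops(console_text):
    """Return (symbol, offset, fault_address) for the first NULL dereference.

    Returns None when the text contains no NULL pointer dereference report
    or no kernel-mode RIP line after it.
    """
    fault = NULL_DEREF_RE.search(console_text)
    if fault is None:
        return None
    rip = RIP_RE.search(console_text, fault.end())
    if rip is None:
        return None
    return (
        rip.group("symbol"),
        int(rip.group("offset"), 16),
        int(fault.group("addr"), 16),
    )


# Three lines copied from the console output in this report.
SAMPLE = """\
[ 1435.517597] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 1435.543274] RIP: 0010:pick_next_task_fair+0x91/0x620
[ 1435.656728] RIP: 0033:0x7fb92ab0e7db
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

A log that yields the same symbol and offset as this report is very likely the same crash; a different symbol suggests an unrelated failure that should not be counted as a reproduction.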
I can also reproduce this with 6.8.0-1001-azure and the latest stress-ng (17bca4c329f8).
To reproduce, clone stress-ng from https://github.com/ColinIanKing/stress-ng, build it with make, and run "./stress-ng -v -t 5 --race-sched 4 --race-sched-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable".
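On instance types where the reproduction rate is below 100%, the command may need to be repeated. The sketch below wraps the exact command line from this report in a loop. The helper names (`race_sched_argv`, `run_repro`), the default of 10 iterations, and the `stress-ng` checkout directory are assumptions for illustration. On an affected kernel a successful reproduction panics the machine, so the loop only finishes on kernels that survive.

```python
import shlex
import subprocess


def race_sched_argv(binary="./stress-ng", workers=4, ops=3000, timeout_s=5):
    """Build the stress-ng argument list used in this bug report."""
    return [
        binary, "-v",
        "-t", str(timeout_s),
        "--race-sched", str(workers),
        "--race-sched-ops", str(ops),
        "--ignite-cpu", "--syslog", "--verbose", "--verify", "--oomable",
    ]


def run_repro(iterations=10, cwd="stress-ng"):
    """Run the stressor repeatedly from the stress-ng build directory.

    `iterations` and `cwd` are hypothetical defaults; adjust them to the
    local checkout. The report ran the command as root.
    """
    for i in range(iterations):
        result = subprocess.run(race_sched_argv(), cwd=cwd)
        print(f"iteration {i + 1}: exit code {result.returncode}")


if __name__ == "__main__":
    # Only print the command here; call run_repro() explicitly on a test VM.
    print(shlex.join(race_sched_argv()))
```

Calling `run_repro()` on a disposable test instance, with the serial console captured, keeps the panic output available after the machine goes down.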
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2068024/+subscriptions