canonical-ubuntu-qa team mailing list archive

[Bug 2068024] Re: race_sched in ubuntu_stress_smoke_test will cause kernel panic on 6.8 with Azure Standard_A2_v2 instance

 

This bug is awaiting verification that the
linux-nvidia-6.8/6.8.0-1009.9~22.04.1 kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results.
If the problem is solved, change the tag
'verification-needed-jammy-linux-nvidia-6.8' to
'verification-done-jammy-linux-nvidia-6.8'. If the problem still exists,
change the tag 'verification-needed-jammy-linux-nvidia-6.8' to
'verification-failed-jammy-linux-nvidia-6.8'.


If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-nvidia-6.8-v2 verification-needed-jammy-linux-nvidia-6.8

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2068024

Title:
  race_sched in ubuntu_stress_smoke_test will cause kernel panic on 6.8
  with Azure Standard_A2_v2 instance

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  Fix Released

Bug description:
  This issue can be found on:
    * N-Azure-6.8.0-1008.8
    * N-generic-6.8.0-35.35
    * J-Azure-6.8.0-1008.8~22.04.1

  It reproduces 100% of the time on an Azure Standard_A2_v2 instance. It
  can also be found on Standard_D2pds_v5, but with a lower reproduction
  rate.

  syslog output:
  2024-06-04T12:21:29.655736+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test kernel: zswap: loaded using pool lzo/zbud
  2024-06-04T12:21:29.727437+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: invoked with './stress-ng -v -t 5 --race-sched 4 --race-sched-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
  2024-06-04T12:21:29.727600+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: system: 'n-laz-az-6-8-stda2v2-u-stress-smk-test' Linux 6.8.0-1001-azure #1-Ubuntu SMP Tue Feb 13 17:53:47 UTC 2024 x86_64
  2024-06-04T12:21:29.727683+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: memory (MB): total 3918.72, free 3424.57, shared 4.08, buffer 36.20, swap 0.00, free swap 0.00
  2024-06-04T12:21:29.727723+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: stress-ng: info:  [1250] setting to a 5 secs run per stressor
  2024-06-04T12:21:29.805799+00:00 n-laz-az-6-8-stda2v2-u-stress-smk-test stress-ng: stress-ng: info:  [1250] dispatching hogs: 4 race-sched

  Console output:
  [ 1167.163045] I/O error, dev loop0, sector 256 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
  [ 1435.517597] BUG: kernel NULL pointer dereference, address: 00000000000000a0
  [ 1435.522651] #PF: supervisor read access in kernel mode
  [ 1435.525407] #PF: error_code(0x0000) - not-present page
  [ 1435.528122] PGD 0 P4D 0
  [ 1435.529813] Oops: 0000 [#1] SMP PTI
  [ 1435.531744] CPU: 0 PID: 121253 Comm: stress-ng-race- Tainted: P           O       6.8.0-1008-azure #8-Ubuntu
  [ 1435.536481] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
  [ 1435.543274] RIP: 0010:pick_next_task_fair+0x91/0x620
  [ 1435.545480] Code: 91 00 00 00 49 81 bd b0 02 00 00 80 a8 89 92 75 60 4d 89 fe eb 27 4c 89 f7 e8 0b b7 ff ff 84 c0 75 3f 4c 89 f7 e8 5f 04 ff ff <4c> 8b b0 a0 00 00 00 48 89 c3 4d 85 f6 0f 84 f4 00 00 00 49 8b 46
  [ 1435.554629] RSP: 0018:ffffb2b202e73cf8 EFLAGS: 00010096
  [ 1435.558030] RAX: 0000000000000000 RBX: ffffb2b202e73dc8 RCX: fd78d84d198c4000
  [ 1435.562226] RDX: 0000000000000c00 RSI: e411d03fda1d7382 RDI: 0000000000000c02
  [ 1435.566496] RBP: ffffb2b202e73d38 R08: 0000000000000002 R09: 0000000000000002
  [ 1435.570327] R10: 0000000000000000 R11: 0000000000000000 R12: ffff920dbbc33580
  [ 1435.574620] R13: ffff920d05570000 R14: ffff920dbbc33680 R15: ffff920dbbc33680
  [ 1435.579115] FS:  00007fb92ad12d00(0000) GS:ffff920dbbc00000(0000) knlGS:0000000000000000
  [ 1435.583308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1435.586094] CR2: 00000000000000a0 CR3: 0000000102364001 CR4: 00000000003706f0
  [ 1435.590178] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1435.594054] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1435.597740] Call Trace:
  [ 1435.599469]  <TASK>
  [ 1435.600605]  ? show_regs+0x65/0x70
  [ 1435.602396]  ? __die+0x24/0x70
  [ 1435.603999]  ? page_fault_oops+0x99/0x1a0
  [ 1435.605856]  ? do_user_addr_fault+0x2ae/0x670
  [ 1435.607915]  ? exc_page_fault+0x7b/0x170
  [ 1435.609976]  ? asm_exc_page_fault+0x27/0x30
  [ 1435.611989]  ? pick_next_task_fair+0x91/0x620
  [ 1435.614311]  ? pick_next_task_fair+0x91/0x620
  [ 1435.616811]  ? wp_page_copy+0x2f7/0x690
  [ 1435.618799]  pick_next_task+0x5f/0xcd0
  [ 1435.621060]  ? do_wp_page+0x1d0/0x430
  [ 1435.623596]  __schedule+0x169/0x760
  [ 1435.625947]  ? __cgroup_account_cputime+0x28/0x30
  [ 1435.628329]  ? update_curr+0x15e/0x1e0
  [ 1435.630179]  schedule+0x2c/0xf0
  [ 1435.633476]  do_sched_yield+0x85/0xb0
  [ 1435.635452]  __do_sys_sched_yield+0xe/0x20
  [ 1435.637356]  x64_sys_call+0x3d9/0x2030
  [ 1435.639400]  do_syscall_64+0x7b/0x160
  [ 1435.641857]  ? handle_mm_fault+0xac/0x3a0
  [ 1435.644956]  ? irqentry_exit_to_user_mode+0x7b/0x220
  [ 1435.647799]  ? irqentry_exit+0x1d/0x30
  [ 1435.650587]  ? exc_page_fault+0x87/0x170
  [ 1435.653213]  entry_SYSCALL_64_after_hwframe+0x78/0x80
  [ 1435.656728] RIP: 0033:0x7fb92ab0e7db
  [ 1435.659593] Code: 73 01 c3 48 8b 0d 3d 46 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 18 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 46 0f 00 f7 d8 64 89 01 48
  [ 1435.675388] RSP: 002b:00007fff7ca243d8 EFLAGS: 00000282 ORIG_RAX: 0000000000000018
  [ 1435.680830] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb92ab0e7db
  [ 1435.686046] RDX: 000055c47ee77db0 RSI: 0000000000000000 RDI: 0000000000000002
  [ 1435.690268] RBP: 0000000000000791 R08: 0000000000000002 R09: 011d99605fac8414
  [ 1435.694941] R10: 00007fb92ad12fd0 R11: 0000000000000282 R12: 00007fb92acfde18
  [ 1435.698607] R13: 0000000000000002 R14: 000000000001d9a5 R15: 0000000000000008
  [ 1435.703633]  </TASK>
  [ 1435.705016] Modules linked in: vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb zfs(PO) spl(O) dccp_ipv4 dccp atm sm3_generic sm3_avx_x86_64 sm3 poly1305_generic poly1305_x86_64 nhpoly1305_avx2 nhpoly1305_sse2 nhpoly1305 libpoly1305 michael_mic md4 streebog_generic rmd160 cmac algif_rng twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic fcrypt cast6_avx_x86_64 cast6_generic cast5_avx_x86_64 cast5_generic cast_common camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 blowfish_generic blowfish_x86_64 blowfish_common algif_skcipher algif_hash aria_aesni_avx2_x86_64 aria_aesni_avx_x86_64 aria_generic sm4_generic sm4_aesni_avx2_x86_64 sm4_aesni_avx_x86_64 sm4 ccm des3_ede_x86_64 des_generic libdes authenc aegis128 aegis128_aesni algif_aead af_alg tls 8021q garp mrp stp llc binfmt_misc nls_iso8859_1 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_owner xt_tcpudp
  [ 1435.705128]  nft_compat nf_tables serio_raw joydev dm_multipath msr nvme_fabrics efi_pstore nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic hid_hyperv crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 hid pata_acpi hyperv_keyboard hyperv_drm hv_netvsc aesni_intel crypto_simd cryptd
  [ 1435.776455] CR2: 00000000000000a0
  [ 1435.778976] ---[ end trace 0000000000000000 ]---
  [ 1435.782217] RIP: 0010:pick_next_task_fair+0x91/0x620
  [ 1435.785040] Code: 91 00 00 00 49 81 bd b0 02 00 00 80 a8 89 92 75 60 4d 89 fe eb 27 4c 89 f7 e8 0b b7 ff ff 84 c0 75 3f 4c 89 f7 e8 5f 04 ff ff <4c> 8b b0 a0 00 00 00 48 89 c3 4d 85 f6 0f 84 f4 00 00 00 49 8b 46
  [ 1435.794724] RSP: 0018:ffffb2b202e73cf8 EFLAGS: 00010096
  [ 1435.798116] RAX: 0000000000000000 RBX: ffffb2b202e73dc8 RCX: fd78d84d198c4000
  [ 1435.802543] RDX: 0000000000000c00 RSI: e411d03fda1d7382 RDI: 0000000000000c02
  [ 1435.807466] RBP: ffffb2b202e73d38 R08: 0000000000000002 R09: 0000000000000002
  [ 1435.811823] R10: 0000000000000000 R11: 0000000000000000 R12: ffff920dbbc33580
  [ 1435.815818] R13: ffff920d05570000 R14: ffff920dbbc33680 R15: ffff920dbbc33680
  [ 1435.820778] FS:  00007fb92ad12d00(0000) GS:ffff920dbbc00000(0000) knlGS:0000000000000000
  [ 1435.825269] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1435.828468] CR2: 00000000000000a0 CR3: 0000000102364001 CR4: 00000000003706f0
  [ 1435.832087] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1435.837461] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1435.841312] note: stress-ng-race-[121253] exited with irqs disabled
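
For triage, the two headline facts in the console output above are the fault address (the BUG: line) and the faulting kernel symbol (the kernel-mode RIP: line). The following is a minimal sketch of how they could be pulled out of a saved console log. It is not part of the original report: the function name summarize_oops and the sample file are hypothetical, and it assumes the log keeps the "[ seconds.micros]" prefix shown above.

```shell
#!/bin/sh
# Sketch: print the first "BUG:" line and the first kernel-mode "RIP:" line
# from a saved console log. summarize_oops is an illustrative name.
summarize_oops() {
    # $1: path to the saved console log
    awk '
        { sub(/^[ \t]*\[[ 0-9.]*\][ \t]*/, "") }   # drop the "[ 1435.517597] " prefix
        /^BUG: / && !bug      { print; bug = 1 }   # first fault description
        /^RIP: 0010:/ && !rip { print; rip = 1 }   # first kernel-mode RIP (CS 0010)
    ' "$1"
}

# Usage with three lines copied from the console output in this report.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  [ 1435.517597] BUG: kernel NULL pointer dereference, address: 00000000000000a0
  [ 1435.543274] RIP: 0010:pick_next_task_fair+0x91/0x620
  [ 1435.656728] RIP: 0033:0x7fb92ab0e7db
EOF
summarize_oops "$sample"
rm -f "$sample"
```

The user-space RIP (CS 0033) is skipped on purpose, since only the kernel-side address identifies the faulting function.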

  I can reproduce this with 6.8.0-1001-azure and the latest stress-ng (17bca4c329f8) as well.
  To reproduce, clone stress-ng from https://github.com/ColinIanKing/stress-ng, build it with make, and run:
    ./stress-ng -v -t 5 --race-sched 4 --race-sched-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable
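
The reproduction steps can be sketched as a small script. This is a sketch under stated assumptions, not the reporter's own tooling: DRY_RUN, WORKDIR, run and repro are hypothetical names, and 17bca4c329f8 is assumed to be the stress-ng commit the reporter built. The script only prints the commands by default, because a real run is expected to panic an affected kernel.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report.
# DRY_RUN, WORKDIR, run() and repro() are illustrative names, not from the report.
# DRY_RUN defaults to 1 (print the commands only); set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
WORKDIR="${WORKDIR:-stress-ng}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone, pin to the commit named in the report (an assumption), build with
# make, then run the stressor as root, as the syslog excerpt shows.
repro() {
    run git clone https://github.com/ColinIanKing/stress-ng "$WORKDIR" &&
    run git -C "$WORKDIR" checkout 17bca4c329f8 &&
    run make -C "$WORKDIR" &&
    run sudo "$WORKDIR/stress-ng" -v -t 5 --race-sched 4 --race-sched-ops 3000 \
        --ignite-cpu --syslog --verbose --verify --oomable
}

repro
```

Run it with DRY_RUN=0 only on a disposable instance, for example a Standard_A2_v2 VM like the one in the report.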

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2068024/+subscriptions


