
canonical-ubuntu-qa team mailing list archive

[Bug 2069147] Re: read_all_sys test in ubuntu_ltp triggers "BUG: kernel NULL pointer dereference" with 5.15.0-1058-nvidia on node hidon

 

*** This bug is a duplicate of bug 2069081 ***
    https://bugs.launchpad.net/bugs/2069081

** This bug has been marked a duplicate of bug 2069081
   idxd: NULL pointer dereference reading wq op_config attribute

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2069147

Title:
  read_all_sys test in ubuntu_ltp triggers "BUG: kernel NULL pointer
  dereference" with 5.15.0-1058-nvidia on node hidon

Status in ubuntu-kernel-tests:
  New

Bug description:
  Issue found with Jammy 5.15.0-1058-nvidia on node hidon (DGXH100);
  other NVIDIA nodes are not affected.

  Steps:
  sudo apt install -y automake bison build-essential byacc flex git keyutils libacl1-dev libaio-dev libcap-dev libmm-dev libnuma-dev libsctp-dev libselinux1-dev libssl-dev libtirpc-dev pkg-config quota xfslibs-dev xfsprogs
  git clone https://github.com/linux-test-project/ltp.git
  cd ltp
  git reset --hard 998df1a5aa5026c5c9b91b0caa3b1188146aa678
  make autotools
  ./configure
  make
  sudo make install
  # Start watching dmesg output here
  sudo /opt/ltp/testcases/bin/read_all -d /sys/devices/pci0000\:e7/0000\:e7\:01.0/dsa2/
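
One possible way to watch the kernel log while the read_all step runs is to follow dmesg in a second terminal and keep only the oops-related lines. The helper below is a minimal sketch, not part of the original report: the function name watch_oops is hypothetical, and the patterns are taken from the trace in this bug.

```shell
# Hypothetical helper (not from the original report): filter a kernel log
# stream read on stdin down to the lines that mark an oops like the one here.
watch_oops() {
    # --line-buffered makes GNU grep emit matches immediately when the
    # input is a live stream.
    grep --line-buffered -E 'BUG: kernel NULL pointer dereference|Oops:|RIP:'
}
```

Usage, assuming a util-linux dmesg that supports --follow: run `sudo dmesg --follow | watch_oops` in a second terminal before starting read_all.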

  dmesg output:
  [  206.893706] BUG: kernel NULL pointer dereference, address: 0000000000000018
  [  206.901552] #PF: supervisor read access in kernel mode
  [  206.907341] #PF: error_code(0x0000) - not-present page
  [  206.913128] PGD 1660ef067 P4D 0
  [  206.916775] Oops: 0000 [#1] SMP NOPTI
  [  206.920909] CPU: 31 PID: 4238 Comm: read_all Tainted: G           OE     5.15.0-1058-nvidia #59-Ubuntu
  [  206.931379] Hardware name: NVIDIA DGXH100/DGXH100, BIOS 1.1.3 10/30/2023
  [  206.938925] RIP: 0010:op_cap_show_common+0x33/0x110 [idxd]
  [  206.945114] Code: 41 57 49 89 f7 41 56 4c 8d 72 10 41 55 49 89 d5 41 54 53 31 db 48 83 ec 18 48 89 7d c0 65 48 8b 04 25 28 00 00 00 48 89 45 d0 <48> 8b 42 18 48 89 45 c8 89 de 4c 8d 45 c8 b9 40 00 00 00 4c 89 ff
  [  206.966209] RSP: 0018:ff3645a473cbfc50 EFLAGS: 00010286
  [  206.972099] RAX: bf9049f9dfedf700 RBX: 0000000000000000 RCX: 0000000000000000
  [  206.980130] RDX: 0000000000000000 RSI: ff3332a81111e000 RDI: ff3333a809ae6040
  [  206.988158] RBP: ff3645a473cbfc90 R08: ff3333a809ae6040 R09: ff3332a81111e000
  [  206.996183] R10: 000000000000000b R11: 0000000000000000 R12: ffffffffa3cd1fe0
  [  207.004215] R13: 0000000000000000 R14: 0000000000000010 R15: ff3332a81111e000
  [  207.012248] FS:  00007fb56afe3740(0000) GS:ff3333a37ebc0000(0000) knlGS:0000000000000000
  [  207.021356] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  207.027826] CR2: 0000000000000018 CR3: 000000011086a006 CR4: 0000000000771ee0
  [  207.035858] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [  207.043890] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
  [  207.051921] PKRU: 55555554
  [  207.054978] Call Trace:
  [  207.057748]  <TASK>
  [  207.060125]  ? show_trace_log_lvl+0x1d6/0x2ea
  [  207.065043]  ? show_trace_log_lvl+0x1d6/0x2ea
  [  207.069952]  ? wq_op_config_show+0x18/0x20 [idxd]
  [  207.075253]  ? show_regs.part.0+0x23/0x29
  [  207.079774]  ? __die_body.cold+0x8/0xd
  [  207.084004]  ? __die+0x2b/0x37
  [  207.087454]  ? page_fault_oops+0x13b/0x170
  [  207.092085]  ? memcg_slab_post_alloc_hook+0x19e/0x210
  [  207.097783]  ? __wake_up+0x13/0x20
  [  207.101632]  ? do_user_addr_fault+0x321/0x670
  [  207.106543]  ? exc_page_fault+0x77/0x170
  [  207.110977]  ? asm_exc_page_fault+0x27/0x30
  [  207.115700]  ? op_cap_show_common+0x33/0x110 [idxd]
  [  207.121197]  wq_op_config_show+0x18/0x20 [idxd]
  [  207.126303]  dev_attr_show+0x1a/0x50
  [  207.130333]  sysfs_kf_seq_show+0xa2/0x100
  [  207.134857]  kernfs_seq_show+0x24/0x30
  [  207.139087]  seq_read_iter+0x121/0x4b0
  [  207.143321]  kernfs_fop_read_iter+0x30/0x40
  [  207.148035]  new_sync_read+0x10a/0x190
  [  207.152269]  vfs_read+0x103/0x1a0
  [  207.156008]  ksys_read+0x67/0xf0
  [  207.159653]  __x64_sys_read+0x19/0x20
  [  207.163781]  x64_sys_call+0x1dba/0x1fa0
  [  207.168110]  do_syscall_64+0x56/0xb0
  [  207.172142]  ? exit_to_user_mode_prepare+0x37/0xb0
  [  207.177545]  ? syscall_exit_to_user_mode+0x35/0x50
  [  207.182941]  ? x64_sys_call+0x1dba/0x1fa0
  [  207.187449]  ? do_syscall_64+0x63/0xb0
  [  207.191676]  ? exit_to_user_mode_prepare+0x96/0xb0
  [  207.197077]  ? syscall_exit_to_user_mode+0x35/0x50
  [  207.202477]  ? x64_sys_call+0x1a55/0x1fa0
  [  207.206998]  ? do_syscall_64+0x63/0xb0
  [  207.211226]  ? do_syscall_64+0x63/0xb0
  [  207.215455]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
  [  207.221142] RIP: 0033:0x7fb56b0fa7e2
  [  207.225176] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a b4 0c 00 e8 a5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
  [  207.246269] RSP: 002b:00007ffdb852f498 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
  [  207.254787] RAX: ffffffffffffffda RBX: 00007fb56afab028 RCX: 00007fb56b0fa7e2
  [  207.262819] RDX: 00000000000003ff RSI: 00007ffdb852f570 RDI: 0000000000000003
  [  207.270850] RBP: 0000563101392168 R08: 0000000000000000 R09: 00007ffdb852ec30
  [  207.278881] R10: 00007ffdb85ca170 R11: 0000000000000246 R12: 000056310137f012
  [  207.286912] R13: 000056310137f06f R14: 000056310222bab0 R15: 00007fb56afa7000
  [  207.294944]  </TASK>
  [  207.297412] Modules linked in: nvidia_uvm(O) nvidia_drm(O) nvidia_modeset(O) intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm binfmt_misc nls_iso8859_1 ipmi_ssif rapl mlx5_ib(OE) isst_if_mbox_pci nvidia(O) intel_th_gth qat_4xxx ib_uverbs(OE) joydev input_leds intel_qat idxd pmt_crashlog pmt_telemetry isst_if_mmio mei_me intel_th_pci pmt_class isst_if_common authenc idxd_bus ib_core(OE) mei intel_th switchtec acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_lenovo hid_generic usbhid hid mlx5_core(OE) ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea crct10dif_pclmul mlxdevm(OE) crc32_pclmul sysfillrect ghash_clmulni_intel
  [  207.297467]  sha256_ssse3 mlxfw(OE) sha1_ssse3 sysimgblt psample fb_sys_fops aesni_intel cec tls ixgbe crypto_simd nvme xhci_pci cryptd i2c_i801 rc_core mlx_compat(OE) xfrm_algo dca intel_pmt i2c_ismt i2c_smbus pci_hyperv_intf drm xhci_pci_renesas mdio nvme_core wmi pinctrl_emmitsburg
  [  207.423040] CR2: 0000000000000018
  [  207.426783] ---[ end trace 7e35f51fec2ac5d9 ]---

  5.15.0-1054-nvidia looks OK on this system.
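
The call trace goes through wq_op_config_show, and the duplicate bug names the wq op_config attribute, so the reproducer can probably be narrowed from a full read_all run to reading only those files. The sketch below is an assumption-laden illustration: the function name read_attr is hypothetical, and it assumes the attribute files are named op_config somewhere under the given directory. On an affected kernel, running it is expected to trigger the same oops.

```shell
# Hypothetical helper (not from the original report): read every file named
# "$2" (default: op_config) under directory "$1", one at a time.
read_attr() {
    dir=$1
    name=${2:-op_config}
    find "$dir" -type f -name "$name" | sort | while IFS= read -r f; do
        # Print the path before reading it, so the last path printed
        # identifies the attribute whose read triggered the oops.
        printf 'reading %s\n' "$f"
        cat "$f" >/dev/null 2>&1 || printf 'read failed: %s\n' "$f"
    done
}
```

Usage, run as root with the device directory from the steps above as the first argument: `read_attr /sys/devices/pci0000:e7/0000:e7:01.0/dsa2/`.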

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2069147/+subscriptions


