
canonical-ubuntu-qa team mailing list archive

[Bug 2020607] Re: ftracetest from selftests in linux ADT test failure with jammy/linux-intel-iotg (kernel NULL pointer dereference)

 

After splitting ubuntu_kselftests_ftrace out and running the test cases
one by one, we can see that it fails on the second test case,
ftrace:test.d--00basic--basic2.tc, on J-intel-iotg-5.15.0-1048.54 with node rizzo.

However, I was unable to reproduce this manually on rizzo. All of the following passed:
  * Running just ftrace:test.d--00basic--basic2.tc, with "./ftracetest -vvv test.d/00basic/basic2.tc"
  * Running basic2.tc multiple times.
  * Running the 1st test case and the offending basic2.tc test case.
  * Running the whole test suite.

But if you try to run this remotely from our build server:
  SRU_CYCLE="2024.01.08-1" INSTANCE_TYPE="rizzo" timeout 180m $KT/sut-test --nc --region kernel $DEBUG metal $SUT jammy ubuntu_kselftests_ftrace $HOME

It panics as soon as it hits the second test case. This appears to be related to CPU hotplug:
[ 5990.967618] mmiotrace: Disabling non-boot CPUs...
[ 5991.032796] smpboot: CPU 1 is now offline
[ 5991.052877] mmiotrace: CPU1 is down.
[ 5991.124833] smpboot: CPU 2 is now offline
[ 5991.140709] mmiotrace: CPU2 is down.
[ 5991.196717] smpboot: CPU 3 is now offline
[ 5991.216486] mmiotrace: CPU3 is down.
[ 5991.233400] smpboot: CPU 4 is now offline
[ 5991.272507] mmiotrace: CPU4 is down.
[ 5991.313356] smpboot: CPU 5 is now offline
[ 5991.328204] mmiotrace: CPU5 is down.
[ 5991.353591] smpboot: CPU 6 is now offline
[ 5991.376155] mmiotrace: CPU6 is down.
[ 5991.393484] smpboot: CPU 7 is now offline
[ 5991.394580] mmiotrace: CPU7 is down.
[ 5991.394586] mmiotrace: enabled.
[ 5991.394693] mmiotrace: Re-enabling CPUs...
[ 5991.394761] x86: Booting SMP configuration:
[ 5991.394763] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 5991.432595] mmiotrace: enabled CPU1.
[ 5991.479537] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 5991.508524] mmiotrace: enabled CPU2.
[ 5991.547586] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 5991.576690] mmiotrace: enabled CPU3.
[ 5991.619582] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 5991.639516] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 5991.646618] #PF: supervisor instruction fetch in kernel mode
[ 5991.652336] #PF: error_code(0x0010) - not-present page
[ 5991.657530] PGD 0 P4D 0 
[ 5991.660096] Oops: 0010 [#1] SMP PTI
[ 5991.663626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-1048-intel-iotg #54-Ubuntu
[ 5991.671709] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.12.0 09/06/2013
[ 5991.679350] RIP: 0010:0x0
[ 5991.682010] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 5991.688955] RSP: 0018:ffffb92a40003e90 EFLAGS: 00010097
[ 5991.694233] RAX: 0000000000000000 RBX: 00000000000231f0 RCX: 0000000000000004
[ 5991.701435] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff9c20c007b990
[ 5991.708639] RBP: ffffb92a40003eb8 R08: ffff9c20c007b990 R09: 0000000000000001
[ 5991.715842] R10: 0000000000000020 R11: ffffffffffffffff R12: ffff9c20c007b990
[ 5991.723050] R13: 00000572e77e8500 R14: 0000000000000004 R15: 0000000000000000
[ 5991.730252] FS:  0000000000000000(0000) GS:ffff9c21f7600000(0000) knlGS:0000000000000000
[ 5991.738417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5991.744218] CR2: ffffffffffffffd6 CR3: 0000000010c10000 CR4: 00000000000006f0
[ 5991.751425] Call Trace:
[ 5991.753903]  <IRQ>
[ 5991.755945]  ? show_trace_log_lvl+0x1d6/0x2ea
[ 5991.760362]  ? show_trace_log_lvl+0x1d6/0x2ea
[ 5991.764773]  ? tick_do_broadcast+0xa1/0xd0
[ 5991.768922]  ? show_regs.part.0+0x23/0x29
[ 5991.773026]  ? __die_body.cold+0x8/0xd
[ 5991.776821]  ? __die+0x2b/0x37
[ 5991.779918]  ? page_fault_oops+0x13b/0x170
[ 5991.784063]  ? do_user_addr_fault+0x321/0x670
[ 5991.788476]  ? obj_cgroup_uncharge_pages+0x68/0xf0
[ 5991.793324]  ? exc_page_fault+0x77/0x170
[ 5991.797293]  ? asm_exc_page_fault+0x27/0x30
[ 5991.801529]  tick_do_broadcast+0xa1/0xd0
[ 5991.805501]  tick_handle_oneshot_broadcast+0x14d/0x200
[ 5991.810694]  timer_interrupt+0x18/0x30
[ 5991.814495]  __handle_irq_event_percpu+0x42/0x170
[ 5991.819255]  handle_irq_event+0x59/0xb0
[ 5991.823136]  handle_edge_irq+0x8c/0x230
[ 5991.827019]  __common_interrupt+0x52/0xe0
[ 5991.831078]  common_interrupt+0x89/0xa0
[ 5991.834966]  </IRQ>
[ 5991.837098]  <TASK>
[ 5991.839247]  asm_common_interrupt+0x27/0x40
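
As a rough aid for reading the log above, here is a minimal Python sketch that reports how far the mmiotrace CPU offline/online cycle had progressed when the oops was printed. It is not part of the test suite; the function name hotplug_summary and the returned dictionary keys are made up for illustration, and the patterns only cover the message formats seen in this log.

```python
import re


def hotplug_summary(dmesg_text):
    """Summarise mmiotrace CPU hotplug progress up to the first NULL-deref BUG line.

    Hypothetical helper: it only understands the message formats that
    appear in the log quoted in this report.
    """
    offlined = []    # CPUs that mmiotrace reported as down
    reenabled = []   # CPUs that mmiotrace reported as enabled again
    in_flight = None  # CPU whose boot was announced but not yet confirmed

    for line in dmesg_text.splitlines():
        if "BUG: kernel NULL pointer dereference" in line:
            break  # stop at the oops; later lines are not hotplug progress

        m = re.search(r"mmiotrace: CPU(\d+) is down", line)
        if m:
            offlined.append(int(m.group(1)))
            continue

        m = re.search(r"smpboot: Booting Node \d+ Processor (\d+)", line)
        if m:
            in_flight = int(m.group(1))
            continue

        # "mmiotrace: enabled." (no CPU number) is deliberately not matched.
        m = re.search(r"mmiotrace: enabled CPU(\d+)", line)
        if m:
            cpu = int(m.group(1))
            reenabled.append(cpu)
            if in_flight == cpu:
                in_flight = None

    return {"offlined": offlined, "reenabled": reenabled, "in_flight": in_flight}
```

Fed the log above, this reports CPUs 1 to 7 as offlined, CPUs 1 to 3 as re-enabled, and CPU 4 as in flight, i.e. the oops was printed while CPU 4 was being brought back online.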

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2020607

Title:
  ftracetest from selftests in linux ADT test failure with jammy/linux-
  intel-iotg (kernel NULL pointer dereference)

Status in ubuntu-kernel-tests:
  New
Status in linux-intel-iotg package in Ubuntu:
  Invalid
Status in linux-intel-iotg source package in Jammy:
  New

Bug description:
  The failure is seen only on the machine rizzo.

  How to reproduce:
  1. Run the net selftest in the kernel.
  2. Run ftracetest in the kernel; there is a high chance that this causes a kernel oops.

  The issue could be seen on kernels 5.15.112-0515112 (mainline), 5.15.0-1030-intel-iotg, and 5.15.0-74-generic on the same machine.
  The issue could not be reproduced on kernels 5.19.0-42-generic, 5.17.15-051715 (mainline), and 5.16.20-051620 (mainline).
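
The kernel lists above split cleanly along release series. The following Python sketch makes that explicit by reducing each version string to its (major, minor) pair. The helper names series and series_set are hypothetical, and the result only summarises the versions tested here; it does not identify a fixing commit.

```python
import re

# Version strings copied from the two lists above.
AFFECTED = ["5.15.112-0515112", "5.15.0-1030-intel-iotg", "5.15.0-74-generic"]
NOT_REPRODUCED = ["5.19.0-42-generic", "5.17.15-051715", "5.16.20-051620"]


def series(version):
    """Return (major, minor) for a kernel version string such as '5.15.0-74-generic'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def series_set(versions):
    """Return the sorted, de-duplicated list of (major, minor) series."""
    return sorted({series(v) for v in versions})
```

Every affected kernel is in the 5.15 series, and every kernel on which the issue could not be reproduced is 5.16 or later, so among the versions tested the newest affected series is older than the oldest unaffected one.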

  [13279.176639] BUG: kernel NULL pointer dereference, address: 0000000000000000
  [13279.183712] #PF: supervisor instruction fetch in kernel mode
  [13279.189446] #PF: error_code(0x0010) - not-present page
  [13279.194654] PGD 0 P4D 0
  [13279.197230] Oops: 0010 [#1] SMP PTI
  [13279.200767] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-74-generic #81-Ubuntu
  [13279.208431] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.12.0 09/06/2013
  [13279.216100] RIP: 0010:0x0
  [13279.218767] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
  [13279.225721] RSP: 0018:ffff9a1f80003e90 EFLAGS: 00010097
  [13279.231013] RAX: 0000000000000000 RBX: 00000000000231f0 RCX: 0000000000000004
  [13279.238229] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff8bfdc0074280
  [13279.245449] RBP: ffff9a1f80003eb8 R08: ffff8bfdc0074280 R09: 0000000000000001
  [13279.252673] R10: 0000000000000020 R11: ffffffffffffffff R12: ffff8bfdc0074280
  [13279.259900] R13: 00000c13a1cab100 R14: 0000000000000004 R15: 0000000000000000
  [13279.267124] FS:  0000000000000000(0000) GS:ffff8bfef7600000(0000) knlGS:0000000000000000
  [13279.275317] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [13279.281140] CR2: ffffffffffffffd6 CR3: 000000004a010000 CR4: 00000000000006f0
  [13279.288366] Call Trace:
  [13279.290851]  <IRQ>
  [13279.292902]  tick_do_broadcast+0xa1/0xd0
  [13279.296894]  tick_handle_oneshot_broadcast+0x14d/0x200
  [13279.302107]  timer_interrupt+0x18/0x30
  [13279.305914]  __handle_irq_event_percpu+0x42/0x170
  [13279.310689]  ? timekeeping_advance+0x32a/0x470
  [13279.315194]  handle_irq_event+0x59/0xb0
  [13279.319086]  handle_edge_irq+0x8c/0x230
  [13279.322976]  __common_interrupt+0x52/0xe0
  [13279.327045]  common_interrupt+0x89/0xa0
  [13279.330941]  </IRQ>
  [13279.333079]  <TASK>
  [13279.335211]  asm_common_interrupt+0x27/0x40
  [13279.339461] RIP: 0010:cpuidle_enter_state+0xd9/0x620
  [13279.344501] Code: 3d e4 e1 d8 4a e8 77 cb 67 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 b8 d8 67 ff 80 7d d0 00 0f 85 61 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6d 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e7 03 00 00
  [13279.363476] RSP: 0018:ffffffffb6603db8 EFLAGS: 00000246
  [13279.368768] RAX: 0000000000000000 RBX: ffff8bfef763b900 RCX: 0000000000000000
  [13279.375984] RDX: ffff8bfdc01863c0 RSI: 0000000000000002 RDI: 0000000000000000
  [13279.383203] RBP: ffffffffb6603e08 R08: 00000c13cc9b2925 R09: 00000000000c3500
  [13279.390423] R10: 0000000000000005 R11: 071c71c71c71c71c R12: ffffffffb68d4b20
  [13279.397646] R13: 0000000000000004 R14: 0000000000000004 R15: 00000c13cc9b2925
  [13279.404883]  ? cpuidle_enter_state+0x24a/0x620
  [13279.409389]  cpuidle_enter+0x2e/0x50
  [13279.413019]  cpuidle_idle_call+0x142/0x1e0
  [13279.417176]  do_idle+0x83/0xf0
  [13279.422843]  cpu_startup_entry+0x20/0x30
  [13279.429360]  rest_init+0xd3/0x100
  [13279.435396]  ? acpi_enable_subsystem+0x21d/0x229
  [13279.442594]  arch_call_rest_init+0xe/0x23
  [13279.449114]  start_kernel+0x4a9/0x4ca
  [13279.455389]  x86_64_start_reservations+0x24/0x2a
  [13279.462619]  x86_64_start_kernel+0xfb/0x106
  [13279.469328]  secondary_startup_64_no_verify+0xc2/0xcb
  [13279.476844]  </TASK>
  [13279.481553] Modules linked in: br_netfilter tls act_mirred cls_matchall ip6_gre gre ip6_tunnel tunnel6 sch_ingress dummy ip6t_rpfilter mpls_gso mpls_iptunnel mpls_router ip_tunnel esp6 esp4 xfrm_user xfrm_algo l2tp_ip6 l2tp_eth l2tp_ip l2tp_netlink l2tp_core 8021q garp mrp ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp sch_etf sch_fq dccp_ipv6 dccp_ipv4 dccp vxlan ip6_udp_tunnel udp_tunnel bridge stp llc vrf nft_counter nft_chain_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink algif_hash af_alg veth ipmi_ssif intel_powerclamp coretemp kvm_intel kvm ipmi_si ipmi_devintf binfmt_misc intel_cstate ipmi_msghandler dcdbas acpi_power_meter i7core_edac mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c
  [13279.481743]  raid1 raid0 multipath linear mgag200 i2c_algo_bit drm_kms_helper gpio_ich syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas cec rc_core raid_class drm bnx2 lpc_ich pata_acpi scsi_transport_sas wmi [last unloaded: br_netfilter]
  [13279.619551] CR2: 0000000000000000
  [13279.626035] ---[ end trace f8201db10668ab38 ]---
  [13279.665976] RIP: 0010:0x0
  [13279.671856] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
  [13279.681978] RSP: 0018:ffff9a1f80003e90 EFLAGS: 00010097
  [13279.690380] RAX: 0000000000000000 RBX: 00000000000231f0 RCX: 0000000000000004
  [13279.700720] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff8bfdc0074280
  [13279.711050] RBP: ffff9a1f80003eb8 R08: ffff8bfdc0074280 R09: 0000000000000001
  [13279.721394] R10: 0000000000000020 R11: ffffffffffffffff R12: ffff8bfdc0074280
  [13279.731930] R13: 00000c13a1cab100 R14: 0000000000000004 R15: 0000000000000000
  [13279.742503] FS:  0000000000000000(0000) GS:ffff8bfef7600000(0000) knlGS:0000000000000000
  [13279.753902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [13279.763030] CR2: ffffffffffffffd6 CR3: 000000004a010000 CR4: 00000000000006f0
  [13279.773701] Kernel panic - not syncing: Fatal exception in interrupt
  [13279.783722] Kernel Offset: 0x33800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
  [13279.819653] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
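
When comparing the two oopses in this report, it helps to drop the call trace entries prefixed with "?", which the kernel prints for addresses it found on the stack but could not confirm as part of the call chain. Below is a small Python sketch of that filtering. The name reliable_frames and the regular expression are assumptions written for the trace format shown here, not an existing tool.

```python
import re

# Matches lines such as:
#   [13279.292902]  tick_do_broadcast+0xa1/0xd0
#   [ 5991.755945]  ? show_trace_log_lvl+0x1d6/0x2ea
# Group 1 is set when the frame carries the "?" (unreliable) marker,
# group 2 is the symbol name.
FRAME = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\])?\s+(\?\s+)?"
    r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\s*$"
)


def reliable_frames(dmesg_text):
    """Return symbol names of call trace frames that are not marked with '?'."""
    frames = []
    in_trace = False
    for line in dmesg_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME.match(line)
        if m and not m.group(1):
            frames.append(m.group(2))
    return frames
```

For both traces in this report the first frame returned is tick_do_broadcast, followed by tick_handle_oneshot_broadcast and timer_interrupt, which shows that the two crashes share the same call path.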

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2020607/+subscriptions