canonical-ubuntu-qa team mailing list archive
Message #02717
[Bug 2020607] Re: ftracetest from selftests in linux ADT test failure with jammy/linux-intel-iotg (kernel NULL pointer dereference)
After splitting ubuntu_kselftests_ftrace out and running the test cases
one by one, we can see that it fails on the second test case,
ftrace:test.d--00basic--basic2.tc, on J-intel-iotg-5.15.0-1048.54 with
node rizzo.
However, I was unable to reproduce this manually on rizzo:
* Passed when running only ftrace:test.d--00basic--basic2.tc, with "./ftracetest -vvv test.d/00basic/basic2.tc"
* Passed when running basic2.tc multiple times.
* Passed when running the 1st test case followed by the offending basic2.tc test case.
* Passed when running the whole test suite.
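The repeated manual runs in the list above could be scripted roughly as follows. This is only a sketch: run_repeatedly is a hypothetical helper, not part of the test tooling, and it assumes ftracetest signals failure through a non-zero exit status.

```python
import subprocess

# Command quoted in the list above; it has to be run from the ftrace
# selftest directory of the kernel tree under test.
FTRACETEST_CMD = ["./ftracetest", "-vvv", "test.d/00basic/basic2.tc"]


def run_repeatedly(cmd, times, cwd=None):
    """Run cmd up to `times` times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run exits with status 0.
    """
    for attempt in range(1, times + 1):
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            return attempt
    return None
```

It would be called as run_repeatedly(FTRACETEST_CMD, 20, cwd=<ftrace selftest directory>). Note that a kernel panic takes the machine down before the helper can return, so a return value only ever reports non-fatal test failures.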
But if you try to run this remotely from our build server:
SRU_CYCLE="2024.01.08-1" INSTANCE_TYPE="rizzo" timeout 180m $KT/sut-test --nc --region kernel $DEBUG metal $SUT jammy ubuntu_kselftests_ftrace $HOME
It will panic right away when hitting the second test case. It looks like it has something to do with CPU hotplug:
[ 5990.967618] mmiotrace: Disabling non-boot CPUs...
[ 5991.032796] smpboot: CPU 1 is now offline
[ 5991.052877] mmiotrace: CPU1 is down.
[ 5991.124833] smpboot: CPU 2 is now offline
[ 5991.140709] mmiotrace: CPU2 is down.
[ 5991.196717] smpboot: CPU 3 is now offline
[ 5991.216486] mmiotrace: CPU3 is down.
[ 5991.233400] smpboot: CPU 4 is now offline
[ 5991.272507] mmiotrace: CPU4 is down.
[ 5991.313356] smpboot: CPU 5 is now offline
[ 5991.328204] mmiotrace: CPU5 is down.
[ 5991.353591] smpboot: CPU 6 is now offline
[ 5991.376155] mmiotrace: CPU6 is down.
[ 5991.393484] smpboot: CPU 7 is now offline
[ 5991.394580] mmiotrace: CPU7 is down.
[ 5991.394586] mmiotrace: enabled.
[ 5991.394693] mmiotrace: Re-enabling CPUs...
[ 5991.394761] x86: Booting SMP configuration:
[ 5991.394763] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 5991.432595] mmiotrace: enabled CPU1.
[ 5991.479537] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 5991.508524] mmiotrace: enabled CPU2.
[ 5991.547586] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 5991.576690] mmiotrace: enabled CPU3.
[ 5991.619582] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 5991.639516] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 5991.646618] #PF: supervisor instruction fetch in kernel mode
[ 5991.652336] #PF: error_code(0x0010) - not-present page
[ 5991.657530] PGD 0 P4D 0
[ 5991.660096] Oops: 0010 [#1] SMP PTI
[ 5991.663626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-1048-intel-iotg #54-Ubuntu
[ 5991.671709] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.12.0 09/06/2013
[ 5991.679350] RIP: 0010:0x0
[ 5991.682010] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 5991.688955] RSP: 0018:ffffb92a40003e90 EFLAGS: 00010097
[ 5991.694233] RAX: 0000000000000000 RBX: 00000000000231f0 RCX: 0000000000000004
[ 5991.701435] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff9c20c007b990
[ 5991.708639] RBP: ffffb92a40003eb8 R08: ffff9c20c007b990 R09: 0000000000000001
[ 5991.715842] R10: 0000000000000020 R11: ffffffffffffffff R12: ffff9c20c007b990
[ 5991.723050] R13: 00000572e77e8500 R14: 0000000000000004 R15: 0000000000000000
[ 5991.730252] FS: 0000000000000000(0000) GS:ffff9c21f7600000(0000) knlGS:0000000000000000
[ 5991.738417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5991.744218] CR2: ffffffffffffffd6 CR3: 0000000010c10000 CR4: 00000000000006f0
[ 5991.751425] Call Trace:
[ 5991.753903] <IRQ>
[ 5991.755945] ? show_trace_log_lvl+0x1d6/0x2ea
[ 5991.760362] ? show_trace_log_lvl+0x1d6/0x2ea
[ 5991.764773] ? tick_do_broadcast+0xa1/0xd0
[ 5991.768922] ? show_regs.part.0+0x23/0x29
[ 5991.773026] ? __die_body.cold+0x8/0xd
[ 5991.776821] ? __die+0x2b/0x37
[ 5991.779918] ? page_fault_oops+0x13b/0x170
[ 5991.784063] ? do_user_addr_fault+0x321/0x670
[ 5991.788476] ? obj_cgroup_uncharge_pages+0x68/0xf0
[ 5991.793324] ? exc_page_fault+0x77/0x170
[ 5991.797293] ? asm_exc_page_fault+0x27/0x30
[ 5991.801529] tick_do_broadcast+0xa1/0xd0
[ 5991.805501] tick_handle_oneshot_broadcast+0x14d/0x200
[ 5991.810694] timer_interrupt+0x18/0x30
[ 5991.814495] __handle_irq_event_percpu+0x42/0x170
[ 5991.819255] handle_irq_event+0x59/0xb0
[ 5991.823136] handle_edge_irq+0x8c/0x230
[ 5991.827019] __common_interrupt+0x52/0xe0
[ 5991.831078] common_interrupt+0x89/0xa0
[ 5991.834966] </IRQ>
[ 5991.837098] <TASK>
[ 5991.839247] asm_common_interrupt+0x27/0x40
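
To make the timing relative to the CPU hotplug messages easier to see, the log above can be scanned for the last CPU bring-up that precedes the BUG line. This is a minimal sketch; the helper name and regular expressions are my own, and the embedded excerpt is copied from the log above with most lines omitted.

```python
import re

# Excerpt copied from the log above (most lines omitted).
LOG = """\
[ 5991.394693] mmiotrace: Re-enabling CPUs...
[ 5991.394763] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 5991.432595] mmiotrace: enabled CPU1.
[ 5991.479537] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 5991.508524] mmiotrace: enabled CPU2.
[ 5991.547586] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 5991.576690] mmiotrace: enabled CPU3.
[ 5991.619582] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 5991.639516] BUG: kernel NULL pointer dereference, address: 0000000000000000
"""

LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
BOOT = re.compile(r"smpboot: Booting Node \d+ Processor (\d+) APIC (0x[0-9a-f]+)")


def last_cpu_before_bug(log):
    """Return (cpu, apic_id, seconds_before_bug) for the last CPU bring-up
    logged before the first BUG line, or None if there is no such pair."""
    last = None
    for raw in log.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        boot = BOOT.search(msg)
        if boot:
            last = (int(boot.group(1)), boot.group(2), ts)
        elif msg.startswith("BUG:") and last is not None:
            cpu, apic, boot_ts = last
            return cpu, apic, round(ts - boot_ts, 6)
    return None


# CPU 4 (APIC 0x1), roughly 20 ms before the BUG line
print(last_cpu_before_bug(LOG))
```

The oops is logged while mmiotrace is still re-enabling CPUs, right after the bring-up of processor 4 starts. That ordering fits the CPU hotplug suspicion, but on its own it does not identify the root cause.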
--
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2020607
Title:
ftracetest from selftests in linux ADT test failure with jammy/linux-
intel-iotg (kernel NULL pointer dereference)
Status in ubuntu-kernel-tests:
New
Status in linux-intel-iotg package in Ubuntu:
Invalid
Status in linux-intel-iotg source package in Jammy:
New
Bug description:
The failure is only seen on the machine rizzo.
How to reproduce:
1. Run the net selftests in the kernel.
2. Run ftracetest in the kernel; there is then a high chance that this causes a kernel oops.
The issue could be seen on kernels 5.15.112-0515112 (mainline), 5.15.0-1030-intel-iotg, and 5.15.0-74-generic on the same machine.
The issue could not be reproduced on kernels 5.19.0-42-generic, 5.17.15-051715 (mainline), and 5.16.20-051620 (mainline).
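
The two kernel lists above can be grouped by release series to show where the boundary lies. This is only a sketch; the series helper is a hypothetical name and only looks at the leading major.minor part of each version string.

```python
import re

# Version strings as given in the description above.
AFFECTED = ["5.15.112-0515112", "5.15.0-1030-intel-iotg", "5.15.0-74-generic"]
NOT_REPRODUCED = ["5.19.0-42-generic", "5.17.15-051715", "5.16.20-051620"]


def series(version):
    """Return the (major, minor) kernel series of a version string."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return int(m.group(1)), int(m.group(2))


affected = sorted({series(v) for v in AFFECTED})
clean = sorted({series(v) for v in NOT_REPRODUCED})

print(affected)                    # [(5, 15)]
print(clean)                       # [(5, 16), (5, 17), (5, 19)]
print(max(affected) < min(clean))  # True
```

Every affected kernel is in the 5.15 series and every kernel where the issue was not reproduced is 5.16 or newer. This suggests, but does not prove, that the behaviour changed between 5.15 and 5.16.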
[13279.176639] BUG: kernel NULL pointer dereference, address: 0000000000000000
[13279.183712] #PF: supervisor instruction fetch in kernel mode
[13279.189446] #PF: error_code(0x0010) - not-present page
[13279.194654] PGD 0 P4D 0
[13279.197230] Oops: 0010 [#1] SMP PTI
[13279.200767] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-74-generic #81-Ubuntu
[13279.208431] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.12.0 09/06/2013
[13279.216100] RIP: 0010:0x0
[13279.218767] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[13279.225721] RSP: 0018:ffff9a1f80003e90 EFLAGS: 00010097
[13279.231013] RAX: 0000000000000000 RBX: 00000000000231f0 RCX: 0000000000000004
[13279.238229] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff8bfdc0074280
[13279.245449] RBP: ffff9a1f80003eb8 R08: ffff8bfdc0074280 R09: 0000000000000001
[13279.252673] R10: 0000000000000020 R11: ffffffffffffffff R12: ffff8bfdc0074280
[13279.259900] R13: 00000c13a1cab100 R14: 0000000000000004 R15: 0000000000000000
[13279.267124] FS: 0000000000000000(0000) GS:ffff8bfef7600000(0000) knlGS:0000000000000000
[13279.275317] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13279.281140] CR2: ffffffffffffffd6 CR3: 000000004a010000 CR4: 00000000000006f0
[13279.288366] Call Trace:
[13279.290851] <IRQ>
[13279.292902] tick_do_broadcast+0xa1/0xd0
[13279.296894] tick_handle_oneshot_broadcast+0x14d/0x200
[13279.302107] timer_interrupt+0x18/0x30
[13279.305914] __handle_irq_event_percpu+0x42/0x170
[13279.310689] ? timekeeping_advance+0x32a/0x470
[13279.315194] handle_irq_event+0x59/0xb0
[13279.319086] handle_edge_irq+0x8c/0x230
[13279.322976] __common_interrupt+0x52/0xe0
[13279.327045] common_interrupt+0x89/0xa0
[13279.330941] </IRQ>
[13279.333079] <TASK>
[13279.335211] asm_common_interrupt+0x27/0x40
[13279.339461] RIP: 0010:cpuidle_enter_state+0xd9/0x620
[13279.344501] Code: 3d e4 e1 d8 4a e8 77 cb 67 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 b8 d8 67 ff 80 7d d0 00 0f 85 61 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6d 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e7 03 00 00
[13279.363476] RSP: 0018:ffffffffb6603db8 EFLAGS: 00000246
[13279.368768] RAX: 0000000000000000 RBX: ffff8bfef763b900 RCX: 0000000000000000
[13279.375984] RDX: ffff8bfdc01863c0 RSI: 0000000000000002 RDI: 0000000000000000
[13279.383203] RBP: ffffffffb6603e08 R08: 00000c13cc9b2925 R09: 00000000000c3500
[13279.390423] R10: 0000000000000005 R11: 071c71c71c71c71c R12: ffffffffb68d4b20
[13279.397646] R13: 0000000000000004 R14: 0000000000000004 R15: 00000c13cc9b2925
[13279.404883] ? cpuidle_enter_state+0x24a/0x620
[13279.409389] cpuidle_enter+0x2e/0x50
[13279.413019] cpuidle_idle_call+0x142/0x1e0
[13279.417176] do_idle+0x83/0xf0
[13279.422843] cpu_startup_entry+0x20/0x30
[13279.429360] rest_init+0xd3/0x100
[13279.435396] ? acpi_enable_subsystem+0x21d/0x229
[13279.442594] arch_call_rest_init+0xe/0x23
[13279.449114] start_kernel+0x4a9/0x4ca
[13279.455389] x86_64_start_reservations+0x24/0x2a
[13279.462619] x86_64_start_kernel+0xfb/0x106
[13279.469328] secondary_startup_64_no_verify+0xc2/0xcb
[13279.476844] </TASK>
[13279.481553] Modules linked in: br_netfilter tls act_mirred cls_matchall ip6_gre gre ip6_tunnel tunnel6 sch_ingress dummy ip6t_rpfilter mpls_gso mpls_iptunnel mpls_router ip_tunnel esp6 esp4 xfrm_user xfrm_algo l2tp_ip6 l2tp_eth l2tp_ip l2tp_netlink l2tp_core 8021q garp mrp ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp sch_etf sch_fq dccp_ipv6 dccp_ipv4 dccp vxlan ip6_udp_tunnel udp_tunnel bridge stp llc vrf nft_counter nft_chain_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink algif_hash af_alg veth ipmi_ssif intel_powerclamp coretemp kvm_intel kvm ipmi_si ipmi_devintf binfmt_misc intel_cstate ipmi_msghandler dcdbas acpi_power_meter i7core_edac mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c
[13279.481743] raid1 raid0 multipath linear mgag200 i2c_algo_bit drm_kms_helper gpio_ich syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas cec rc_core raid_class drm bnx2 lpc_ich pata_acpi scsi_transport_sas wmi [last unloaded: br_netfilter]
[13279.619551] CR2: 0000000000000000
[13279.626035] ---[ end trace f8201db10668ab38 ]---
[13279.665976] RIP: 0010:0x0
[13279.671856] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[13279.681978] RSP: 0018:ffff9a1f80003e90 EFLAGS: 00010097
[13279.690380] RAX: 0000000000000000 RBX: 00000000000231f0 RCX: 0000000000000004
[13279.700720] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff8bfdc0074280
[13279.711050] RBP: ffff9a1f80003eb8 R08: ffff8bfdc0074280 R09: 0000000000000001
[13279.721394] R10: 0000000000000020 R11: ffffffffffffffff R12: ffff8bfdc0074280
[13279.731930] R13: 00000c13a1cab100 R14: 0000000000000004 R15: 0000000000000000
[13279.742503] FS: 0000000000000000(0000) GS:ffff8bfef7600000(0000) knlGS:0000000000000000
[13279.753902] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13279.763030] CR2: ffffffffffffffd6 CR3: 000000004a010000 CR4: 00000000000006f0
[13279.773701] Kernel panic - not syncing: Fatal exception in interrupt
[13279.783722] Kernel Offset: 0x33800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[13279.819653] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
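
For reference, the "#PF: error_code(0x0010)" value and the odd-looking "RIP 0xffffffffffffffd6" in the trace above can be decoded as follows. This is a sketch: the bit meanings are the architectural x86 page-fault error code bits, the labels are my own, and the 42-byte prologue window is an assumption about how the x86 oops code prints the "Code:" line.

```python
# x86 page-fault error-code bits: (meaning if set, meaning if clear).
PF_BITS = {
    0: ("protection violation", "not-present page"),
    1: ("write access", "non-write access"),
    2: ("user mode", "supervisor mode"),
    3: ("reserved bit set in page table", None),
    4: ("instruction fetch", None),
}

# Assumption: the oops "Code:" dump starts this many bytes before RIP.
PROLOGUE_SIZE = 42


def decode_pf_error_code(code):
    """Return human-readable labels for the low bits of a #PF error code."""
    labels = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        text = if_set if code & (1 << bit) else if_clear
        if text:
            labels.append(text)
    return labels


def code_dump_start(rip):
    """Address where the code dump would start, wrapped to 64 bits."""
    return (rip - PROLOGUE_SIZE) & 0xFFFFFFFFFFFFFFFF


print(decode_pf_error_code(0x0010))
print(hex(code_dump_start(0x0)))  # 0xffffffffffffffd6
```

An instruction fetch from a not-present page in supervisor mode, with RIP at 0x0, is what one would expect from an indirect call through a NULL function pointer, here reached from tick_do_broadcast. Which pointer that is cannot be determined from the log alone.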
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2020607/+subscriptions