canonical-ubuntu-qa team mailing list archive
Message #06466
[Bug 2106022] [NEW] log_check/kernel_tainted failed with kernel warnings at kernel/time/timer_migration.c:543 on Oracular
Public bug reported:
Found during boot testing of Noble linux-lowlatency-hwe-6.11
(6.11.0-1012.13~24.04.1) on TF amd-server.
Sample kernel warning message:
WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:543 tmigr_requires_handle_remote+0x123/0x130
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-1012-lowlatency #13~24.04.1-Ubuntu
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/07/2018
RIP: 0010:tmigr_requires_handle_remote+0x123/0x130
Code: 65 48 2b 14 25 28 00 00 00 75 23 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff e9 c1 84 07 01 0f 0b eb ba <0f> 0b eb a9 e8 44 5d 06 01 0f 1f 40 00 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffa6f9c0003f30 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8c899f026200 RCX: 7fffffffffffffff
RDX: ffff8c8240100e00 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffffa6f9c0003f68 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8c899f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8cc1bfdff000 CR3: 0000002dcb83e000 CR4: 00000000003506f0
Call Trace:
? show_regs+0x6c/0x80
? __warn+0x88/0x140
? tmigr_requires_handle_remote+0x123/0x130
? report_bug+0x182/0x1b0
? handle_bug+0x6e/0xb0
? exc_invalid_op+0x18/0x80
? asm_exc_invalid_op+0x1b/0x20
? tmigr_requires_handle_remote+0x123/0x130
update_process_times+0x63/0xb0
tick_periodic+0x2d/0x90
tick_handle_periodic+0x25/0x80
__sysvec_apic_timer_interrupt+0x59/0x130
sysvec_apic_timer_interrupt+0x9b/0xc0
asm_sysvec_apic_timer_interrupt+0x1b/0x20
RIP: 0010:delay_halt_mwaitx+0x3c/0x50
Code: 05 91 3f 60 64 48 05 00 60 00 00 0f 01 fa b8 ff ff ff ff b9 02 00 00 00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <48> 8b 5d f8 c9 31 c0 31 d2 31 c9 31 f6 e9 22 53 09 00 66 90 90 90
RSP: 0018:ffffa6f9c007bbf8 EFLAGS: 00000293
RAX: 00000000000000f0 RBX: 0000000000005d93 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000005d93 RDI: 00000035e3527498
RBP: ffffa6f9c007bc00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005d93
R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000020
? srso_return_thunk+0x5/0x5f
delay_halt.part.0+0x3e/0x70
delay_halt+0x13/0x30
__const_udelay+0x3d/0x50
wakeup_secondary_cpu_via_init+0xed/0x2e0
do_boot_cpu+0x1d1/0x200
native_kick_ap+0x111/0x1d0
arch_cpuhp_kick_ap_alive+0x15/0x20
cpuhp_kick_ap_alive+0x55/0x90
? __pfx_cpuhp_kick_ap_alive+0x10/0x10
cpuhp_invoke_callback+0x340/0x520
__cpuhp_invoke_callback_range+0x80/0x100
_cpu_up+0x10b/0x280
cpu_up+0xe3/0x120
cpuhp_bringup_mask+0x71/0xd0
cpuhp_bringup_cpus_parallel+0x116/0x150
? __pfx_kernel_init+0x10/0x10
bringup_nonboot_cpus+0x22/0x50
smp_init+0x2a/0x90
kernel_init_freeable+0x10b/0x210
kernel_init+0x1b/0x200
ret_from_fork+0x47/0x70
? __pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1a/0x30
---[ end trace 0000000000000000 ]---
This issue can be reproduced with oracular/linux, at least with the same
tmigr_group hierarchy, so it is likely to be observed on any Oracular
derivative or backport. The kernel logs showing the topology of TF
amd-server, where the issue was observed, and the resulting group
hierarchy are as follows:
CPU topo: Max. logical packages: 2
CPU topo: Max. logical dies: 2
CPU topo: Max. dies per package: 1
CPU topo: Max. threads per core: 2
CPU topo: Num. cores per package: 16
CPU topo: Num. threads per package: 32
CPU topo: Allowing 64 present CPUs plus 0 hotplug CPUs
smpboot: x86: Booting SMP configuration:
.... node #0, CPUs: #1 #2 #3
.... node #1, CPUs: #4 #5 #6 #7
.... node #2, CPUs: #8 #9 #10 #11
.... node #3, CPUs: #12 #13 #14 #15
.... node #4, CPUs: #16 #17 #18 #19
.... node #5, CPUs: #20 #21 #22 #23
.... node #6, CPUs: #24 #25 #26 #27
.... node #7, CPUs: #28 #29 #30 #31
.... node #0, CPUs: #32 #33 #34 #35
.... node #1, CPUs: #36 #37 #38 #39
.... node #2, CPUs: #40 #41 #42 #43
.... node #3, CPUs: #44 #45 #46 #47
.... node #4, CPUs: #48 #49 #50 #51
.... node #5, CPUs: #52 #53 #54 #55
.... node #6, CPUs: #56 #57 #58 #59
.... node #7, CPUs: #60 #61 #62 #63
Timer migration: 2 hierarchy levels; 8 children per group; 1 crossnode level
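For reference, the "Timer migration" line above can be reproduced from the
topology with a little arithmetic. The sketch below is my reading of how
tmigr_init() in kernel/time/timer_migration.c sizes the hierarchy; it is an
assumption about the upstream logic, not a copy of it, and the Python helper
names (order_base_2, div_round_up, tmigr_levels) are made up for
illustration. It assumes 64 possible CPUs spread over 8 NUMA nodes, as in the
smpboot output, and 8 children per group.

```python
def order_base_2(n: int) -> int:
    """Smallest k such that 2**k >= n (0 for n <= 1)."""
    return (n - 1).bit_length() if n > 1 else 0


def div_round_up(a: int, b: int) -> int:
    """Integer division rounding up, for non-negative a and positive b."""
    return -(-a // b)


def tmigr_levels(ncpus: int, nnodes: int, children_per_group: int = 8):
    """Return (hierarchy_levels, crossnode_level) for a given topology.

    Assumed model: CPUs are first grouped within a node, then the node
    groups are grouped across nodes; each group holds up to
    children_per_group children (a power of two).
    """
    bits_per_level = children_per_group.bit_length() - 1  # log2(8) == 3
    cpus_per_node = div_round_up(ncpus, nnodes)
    # Levels needed to cover all CPUs of one node.
    cpu_levels = div_round_up(order_base_2(cpus_per_node), bits_per_level)
    # Levels needed on top of that to connect all nodes.
    node_levels = div_round_up(order_base_2(nnodes), bits_per_level)
    return cpu_levels + node_levels, cpu_levels


# TF amd-server: 64 CPUs, NUMA nodes #0..#7
print(tmigr_levels(64, 8))  # (2, 1)
```

Under these assumptions the result matches the logged "2 hierarchy levels"
and "1 crossnode level": one level of per-node groups of 8 CPUs, and one
cross-node level connecting the 8 node groups.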
The 2025.03.17 Oracular kernels (including derivatives and backports)
include commit b729cc1ec21a ("timers/migration: Fix another race between
hotplug and idle entry/exit") via the upstream stable patchset LP:
#2100328, but not the follow-up commit 868c9037df62 ("timers/migration:
Fix off-by-one root mis-connection"). I've verified locally that the
issue disappears once the fix-the-fix commit 868c9037df62 is applied.
** Affects: ubuntu-kernel-tests
Importance: Undecided
Status: New
** Tags: oracular sru-20250317 ubuntu-boot
** Tags added: sru-20250317
--
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2106022
Title:
log_check/kernel_tainted failed with kernel warnings at
kernel/time/timer_migration.c:543 on Oracular
Status in ubuntu-kernel-tests:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2106022/+subscriptions