canonical-ubuntu-qa team mailing list archive

[Bug 2106022] [NEW] log_check/kernel_tainted failed with kernel warnings at kernel/time/timer_migration.c:543 on Oracular

 

Public bug reported:

Found during boot testing of Noble linux-lowlatency-hwe-6.11
(6.11.0-1012.13~24.04.1) on TF amd-server.

Sample kernel warning message:

    WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:543 tmigr_requires_handle_remote+0x123/0x130
    Modules linked in:
    CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-1012-lowlatency #13~24.04.1-Ubuntu
    Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/07/2018
    RIP: 0010:tmigr_requires_handle_remote+0x123/0x130
    Code: 65 48 2b 14 25 28 00 00 00 75 23 48 83 c4 10 5b 41 5c 41 5d 41 5e 41
    5f 5d 31 d2 31 c9 31 f6 31 ff e9 c1 84 07 01 0f 0b eb ba <0f> 0b eb a9 e8 44
    5d 06 01 0f 1f 40 00 90 90 90 90 90 90 90 90 90
    RSP: 0018:ffffa6f9c0003f30 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: ffff8c899f026200 RCX: 7fffffffffffffff
    RDX: ffff8c8240100e00 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffffa6f9c0003f68 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
    FS:  0000000000000000(0000) GS:ffff8c899f000000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff8cc1bfdff000 CR3: 0000002dcb83e000 CR4: 00000000003506f0
    Call Trace:

     ? show_regs+0x6c/0x80
     ? __warn+0x88/0x140
     ? tmigr_requires_handle_remote+0x123/0x130
     ? report_bug+0x182/0x1b0
     ? handle_bug+0x6e/0xb0
     ? exc_invalid_op+0x18/0x80
     ? asm_exc_invalid_op+0x1b/0x20
     ? tmigr_requires_handle_remote+0x123/0x130
     update_process_times+0x63/0xb0
     tick_periodic+0x2d/0x90
     tick_handle_periodic+0x25/0x80
     __sysvec_apic_timer_interrupt+0x59/0x130
     sysvec_apic_timer_interrupt+0x9b/0xc0
     asm_sysvec_apic_timer_interrupt+0x1b/0x20
    RIP: 0010:delay_halt_mwaitx+0x3c/0x50
    Code: 05 91 3f 60 64 48 05 00 60 00 00 0f 01 fa b8 ff ff ff ff b9 02 00 00
    00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <48> 8b 5d f8 c9 31
    c0 31 d2 31 c9 31 f6 e9 22 53 09 00 66 90 90 90
    RSP: 0018:ffffa6f9c007bbf8 EFLAGS: 00000293
    RAX: 00000000000000f0 RBX: 0000000000005d93 RCX: 0000000000000002
    RDX: 0000000000000000 RSI: 0000000000005d93 RDI: 00000035e3527498
    RBP: ffffa6f9c007bc00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005d93
    R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000020
     ? srso_return_thunk+0x5/0x5f
     delay_halt.part.0+0x3e/0x70
     delay_halt+0x13/0x30
     __const_udelay+0x3d/0x50
     wakeup_secondary_cpu_via_init+0xed/0x2e0
     do_boot_cpu+0x1d1/0x200
     native_kick_ap+0x111/0x1d0
     arch_cpuhp_kick_ap_alive+0x15/0x20
     cpuhp_kick_ap_alive+0x55/0x90
     ? __pfx_cpuhp_kick_ap_alive+0x10/0x10
     cpuhp_invoke_callback+0x340/0x520
     __cpuhp_invoke_callback_range+0x80/0x100
     _cpu_up+0x10b/0x280
     cpu_up+0xe3/0x120
     cpuhp_bringup_mask+0x71/0xd0
     cpuhp_bringup_cpus_parallel+0x116/0x150
     ? __pfx_kernel_init+0x10/0x10
     bringup_nonboot_cpus+0x22/0x50
     smp_init+0x2a/0x90
     kernel_init_freeable+0x10b/0x210
     kernel_init+0x1b/0x200
     ret_from_fork+0x47/0x70
     ? __pfx_kernel_init+0x10/0x10
     ret_from_fork_asm+0x1a/0x30

    ---[ end trace 0000000000000000 ]---
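
For triage, the header line of a splat like the one above can be matched mechanically. The following is only a minimal sketch of that idea, not the actual log_check/kernel_tainted implementation; the names WARNING_RE and find_warnings are hypothetical, and the pattern assumes the "WARNING: CPU: N PID: N at file:line symbol+offset" layout shown in the sample.

```python
import re

# Matches the header line of a kernel WARNING splat, e.g.
# "WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:543 func+0x123/0x130"
# (hypothetical helper; layout assumed from the sample message in this report)
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[^\s+]+)\+"
)


def find_warnings(log_text):
    """Return (file, line, symbol) for every WARNING header found in log_text."""
    return [
        (m.group("file"), int(m.group("line")), m.group("symbol"))
        for m in WARNING_RE.finditer(log_text)
    ]


sample = (
    "WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:543 "
    "tmigr_requires_handle_remote+0x123/0x130\n"
    "Modules linked in:\n"
)
print(find_warnings(sample))
```

Keying on the (file, line, symbol) tuple rather than the raw text makes it easier to group repeated occurrences of the same warning across machines.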

This issue can be reproduced with oracular/linux, at least with the same
tmigr_group hierarchy, so it is likely to affect any Oracular derivative
or backport. The kernel logs describing the topology of TF amd-server,
where the issue was observed, and the resulting group hierarchy are as
follows:

    CPU topo: Max. logical packages:   2
    CPU topo: Max. logical dies:       2
    CPU topo: Max. dies per package:   1
    CPU topo: Max. threads per core:   2
    CPU topo: Num. cores per package:    16
    CPU topo: Num. threads per package:  32
    CPU topo: Allowing 64 present CPUs plus 0 hotplug CPUs

    smpboot: x86: Booting SMP configuration:
    .... node  #0, CPUs:        #1  #2  #3
    .... node  #1, CPUs:    #4  #5  #6  #7
    .... node  #2, CPUs:    #8  #9 #10 #11
    .... node  #3, CPUs:   #12 #13 #14 #15
    .... node  #4, CPUs:   #16 #17 #18 #19
    .... node  #5, CPUs:   #20 #21 #22 #23
    .... node  #6, CPUs:   #24 #25 #26 #27
    .... node  #7, CPUs:   #28 #29 #30 #31
    .... node  #0, CPUs:   #32 #33 #34 #35
    .... node  #1, CPUs:   #36 #37 #38 #39
    .... node  #2, CPUs:   #40 #41 #42 #43
    .... node  #3, CPUs:   #44 #45 #46 #47
    .... node  #4, CPUs:   #48 #49 #50 #51
    .... node  #5, CPUs:   #52 #53 #54 #55
    .... node  #6, CPUs:   #56 #57 #58 #59
    .... node  #7, CPUs:   #60 #61 #62 #63

    Timer migration: 2 hierarchy levels; 8 children per group; 1 crossnode level
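
The "Timer migration" line is consistent with the topology above. As a rough sketch, assuming the level estimate in tmigr_init() works as I understand it (CPUs spread evenly across nodes, each level's depth rounded up in units of log2 of the children per group), the numbers can be reproduced as follows. The function names are hypothetical, and the node count of 8 is an assumption read off the smpboot output (node #0 to #7).

```python
def order_base_2(n):
    """Smallest k such that 2**k >= n (same meaning as the kernel helper)."""
    return (n - 1).bit_length()


def div_round_up(a, b):
    """Integer division rounding up, for non-negative a and positive b."""
    return -(-a // b)


def tmigr_levels(ncpus, nnodes, children_per_group=8):
    """Estimate (hierarchy_levels, crossnode_level); a sketch, not kernel code."""
    cpus_per_node = div_round_up(ncpus, nnodes)
    log_children = children_per_group.bit_length() - 1  # ilog2 for a power of two
    # Levels needed to hold the CPUs of one node
    cpulvl = div_round_up(order_base_2(cpus_per_node), log_children)
    # Extra levels needed to connect all nodes
    nodelvl = div_round_up(order_base_2(nnodes), log_children)
    return cpulvl + nodelvl, cpulvl


# 64 present CPUs; 8 nodes assumed from the smpboot log
print(tmigr_levels(64, 8))
```

With 64 CPUs and 8 nodes this gives 2 hierarchy levels with the crossnode level at 1, matching the logged values.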

The 2025.03.17 Oracular kernels (including derivatives and backports)
include commit b729cc1ec21a ("timers/migration: Fix another race between
hotplug and idle entry/exit") via the upstream stable patchset LP:
#2100328, while commit 868c9037df62 ("timers/migration: Fix off-by-one
root mis-connection") is not included. I've verified locally that with
the fix-the-fix commit 868c9037df62, the issue disappears.
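
A check of this kind (fix applied, follow-up fix missing) can be approximated by searching commit messages for upstream references. The sketch below assumes backported commits mention the upstream hash in their message as "commit <sha>", which is a convention rather than a guarantee; the helper names and the sample log text are hypothetical and only illustrate the idea.

```python
import re

# Upstream references are assumed to look like "commit <12-40 hex digits>"
UPSTREAM_RE = re.compile(r"\bcommit ([0-9a-f]{12,40})\b")


def referenced_upstream_commits(log_text):
    """Return the 12-character prefixes of all upstream hashes mentioned."""
    return {sha[:12] for sha in UPSTREAM_RE.findall(log_text)}


def missing_commits(log_text, wanted):
    """Return the hashes in wanted that log_text never references."""
    present = referenced_upstream_commits(log_text)
    return [sha for sha in wanted if sha[:12] not in present]


# Hypothetical excerpt of `git log` output for a tree carrying only the first fix
sample_log = (
    "timers/migration: Fix another race between hotplug and idle entry/exit\n"
    "\n"
    "commit b729cc1ec21a upstream.\n"
)
print(missing_commits(sample_log, ["b729cc1ec21a", "868c9037df62"]))
```

In practice the input would be the output of git log for kernel/time/timer_migration.c in the tree under test.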

** Affects: ubuntu-kernel-tests
     Importance: Undecided
         Status: New


** Tags: oracular sru-20250317 ubuntu-boot

** Description changed:

  Found during boot testing of Noble linux-lowlatency-hwe-6.11
  (6.11.0-1012.13~24.04.1) on TF amd-server.
  
  Sample kernel warning message:
  
-     WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:543 tmigr_requires_handle_remote+0x123/0x130                                                                                                                                                                                                                                                                                      
-     Modules linked in:
-     CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-1012-lowlatency #13~24.04.1-Ubuntu
-     Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/07/2018
-     RIP: 0010:tmigr_requires_handle_remote+0x123/0x130
-     Code: 65 48 2b 14 25 28 00 00 00 75 23 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff e9 c1 84 07 01 0f 0b eb ba <0f> 0b eb a9 e8 44 5d 06 01 0f 1f 40 00 90 90 90 90 90 90 90 90 90
-     RSP: 0018:ffffa6f9c0003f30 EFLAGS: 00010046
-     RAX: 0000000000000000 RBX: ffff8c899f026200 RCX: 7fffffffffffffff
-     RDX: ffff8c8240100e00 RSI: 0000000000000002 RDI: 0000000000000000
-     RBP: ffffa6f9c0003f68 R08: 0000000000000000 R09: 0000000000000000
-     R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
-     R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
-     FS:  0000000000000000(0000) GS:ffff8c899f000000(0000) knlGS:0000000000000000
-     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
-     CR2: ffff8cc1bfdff000 CR3: 0000002dcb83e000 CR4: 00000000003506f0
-     Call Trace:
+     WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:543 tmigr_requires_handle_remote+0x123/0x130
+     Modules linked in:
+     CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-1012-lowlatency #13~24.04.1-Ubuntu
+     Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/07/2018
+     RIP: 0010:tmigr_requires_handle_remote+0x123/0x130
+     Code: 65 48 2b 14 25 28 00 00 00 75 23 48 83 c4 10 5b 41 5c 41 5d 41 5e 41
+     5f 5d 31 d2 31 c9 31 f6 31 ff e9 c1 84 07 01 0f 0b eb ba <0f> 0b eb a9 e8 44
+     5d 06 01 0f 1f 40 00 90 90 90 90 90 90 90 90 90
+     RSP: 0018:ffffa6f9c0003f30 EFLAGS: 00010046
+     RAX: 0000000000000000 RBX: ffff8c899f026200 RCX: 7fffffffffffffff
+     RDX: ffff8c8240100e00 RSI: 0000000000000002 RDI: 0000000000000000
+     RBP: ffffa6f9c0003f68 R08: 0000000000000000 R09: 0000000000000000
+     R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
+     R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
+     FS:  0000000000000000(0000) GS:ffff8c899f000000(0000) knlGS:0000000000000000
+     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+     CR2: ffff8cc1bfdff000 CR3: 0000002dcb83e000 CR4: 00000000003506f0
+     Call Trace:
  
-      ? show_regs+0x6c/0x80
-      ? __warn+0x88/0x140
-      ? tmigr_requires_handle_remote+0x123/0x130
-      ? report_bug+0x182/0x1b0
-      ? handle_bug+0x6e/0xb0
-      ? exc_invalid_op+0x18/0x80
-      ? asm_exc_invalid_op+0x1b/0x20
-      ? tmigr_requires_handle_remote+0x123/0x130
-      update_process_times+0x63/0xb0
-      tick_periodic+0x2d/0x90
-      tick_handle_periodic+0x25/0x80
-      __sysvec_apic_timer_interrupt+0x59/0x130
-      sysvec_apic_timer_interrupt+0x9b/0xc0
-      asm_sysvec_apic_timer_interrupt+0x1b/0x20
-     RIP: 0010:delay_halt_mwaitx+0x3c/0x50
-     Code: 05 91 3f 60 64 48 05 00 60 00 00 0f 01 fa b8 ff ff ff ff b9 02 00 00 00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <48> 8b 5d f8 c9 31 c0 31 d2 31 c9 31 f6 e9 22 53 09 00 66 90 90 90
-     RSP: 0018:ffffa6f9c007bbf8 EFLAGS: 00000293
-     RAX: 00000000000000f0 RBX: 0000000000005d93 RCX: 0000000000000002
-     RDX: 0000000000000000 RSI: 0000000000005d93 RDI: 00000035e3527498
-     RBP: ffffa6f9c007bc00 R08: 0000000000000000 R09: 0000000000000000
-     R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005d93
-     R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000020
-      ? srso_return_thunk+0x5/0x5f
-      delay_halt.part.0+0x3e/0x70
-      delay_halt+0x13/0x30
-      __const_udelay+0x3d/0x50
-      wakeup_secondary_cpu_via_init+0xed/0x2e0
-      do_boot_cpu+0x1d1/0x200
-      native_kick_ap+0x111/0x1d0
-      arch_cpuhp_kick_ap_alive+0x15/0x20
-      cpuhp_kick_ap_alive+0x55/0x90
-      ? __pfx_cpuhp_kick_ap_alive+0x10/0x10
-      cpuhp_invoke_callback+0x340/0x520
-      __cpuhp_invoke_callback_range+0x80/0x100
-      _cpu_up+0x10b/0x280
-      cpu_up+0xe3/0x120
-      cpuhp_bringup_mask+0x71/0xd0
-      cpuhp_bringup_cpus_parallel+0x116/0x150
-      ? __pfx_kernel_init+0x10/0x10
-      bringup_nonboot_cpus+0x22/0x50
-      smp_init+0x2a/0x90
-      kernel_init_freeable+0x10b/0x210
-      kernel_init+0x1b/0x200
-      ret_from_fork+0x47/0x70
-      ? __pfx_kernel_init+0x10/0x10
-      ret_from_fork_asm+0x1a/0x30
+      ? show_regs+0x6c/0x80
+      ? __warn+0x88/0x140
+      ? tmigr_requires_handle_remote+0x123/0x130
+      ? report_bug+0x182/0x1b0
+      ? handle_bug+0x6e/0xb0
+      ? exc_invalid_op+0x18/0x80
+      ? asm_exc_invalid_op+0x1b/0x20
+      ? tmigr_requires_handle_remote+0x123/0x130
+      update_process_times+0x63/0xb0
+      tick_periodic+0x2d/0x90
+      tick_handle_periodic+0x25/0x80
+      __sysvec_apic_timer_interrupt+0x59/0x130
+      sysvec_apic_timer_interrupt+0x9b/0xc0
+      asm_sysvec_apic_timer_interrupt+0x1b/0x20
+     RIP: 0010:delay_halt_mwaitx+0x3c/0x50
+     Code: 05 91 3f 60 64 48 05 00 60 00 00 0f 01 fa b8 ff ff ff ff b9 02 00 00
+     00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <48> 8b 5d f8 c9 31
+     c0 31 d2 31 c9 31 f6 e9 22 53 09 00 66 90 90 90
+     RSP: 0018:ffffa6f9c007bbf8 EFLAGS: 00000293
+     RAX: 00000000000000f0 RBX: 0000000000005d93 RCX: 0000000000000002
+     RDX: 0000000000000000 RSI: 0000000000005d93 RDI: 00000035e3527498
+     RBP: ffffa6f9c007bc00 R08: 0000000000000000 R09: 0000000000000000
+     R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005d93
+     R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000020
+      ? srso_return_thunk+0x5/0x5f
+      delay_halt.part.0+0x3e/0x70
+      delay_halt+0x13/0x30
+      __const_udelay+0x3d/0x50
+      wakeup_secondary_cpu_via_init+0xed/0x2e0
+      do_boot_cpu+0x1d1/0x200
+      native_kick_ap+0x111/0x1d0
+      arch_cpuhp_kick_ap_alive+0x15/0x20
+      cpuhp_kick_ap_alive+0x55/0x90
+      ? __pfx_cpuhp_kick_ap_alive+0x10/0x10
+      cpuhp_invoke_callback+0x340/0x520
+      __cpuhp_invoke_callback_range+0x80/0x100
+      _cpu_up+0x10b/0x280
+      cpu_up+0xe3/0x120
+      cpuhp_bringup_mask+0x71/0xd0
+      cpuhp_bringup_cpus_parallel+0x116/0x150
+      ? __pfx_kernel_init+0x10/0x10
+      bringup_nonboot_cpus+0x22/0x50
+      smp_init+0x2a/0x90
+      kernel_init_freeable+0x10b/0x210
+      kernel_init+0x1b/0x200
+      ret_from_fork+0x47/0x70
+      ? __pfx_kernel_init+0x10/0x10
+      ret_from_fork_asm+0x1a/0x30
  
-     ---[ end trace 0000000000000000 ]---
- 
+     ---[ end trace 0000000000000000 ]---
  
  This issue can be reproduced with oracular/linux, at least with the same
  tmigr_group hierarchy, so it is likely to be observed on any Oracular
  derivatives or backports. The kernel logs related to the topology of TF
  amd-server (and eventual group hierarchy), where the issue was observed,
  are as follows:
  
-     CPU topo: Max. logical packages:   2   
-     CPU topo: Max. logical dies:       2   
-     CPU topo: Max. dies per package:   1   
-     CPU topo: Max. threads per core:   2   
-     CPU topo: Num. cores per package:    16  
-     CPU topo: Num. threads per package:  32  
-     CPU topo: Allowing 64 present CPUs plus 0 hotplug CPUs
+     CPU topo: Max. logical packages:   2
+     CPU topo: Max. logical dies:       2
+     CPU topo: Max. dies per package:   1
+     CPU topo: Max. threads per core:   2
+     CPU topo: Num. cores per package:    16
+     CPU topo: Num. threads per package:  32
+     CPU topo: Allowing 64 present CPUs plus 0 hotplug CPUs
  
-     smpboot: x86: Booting SMP configuration:
-     .... node  #0, CPUs:        #1  #2  #3                                                                              
-     .... node  #1, CPUs:    #4  #5  #6  #7
-     .... node  #2, CPUs:    #8  #9 #10 #11
-     .... node  #3, CPUs:   #12 #13 #14 #15
-     .... node  #4, CPUs:   #16 #17 #18 #19
-     .... node  #5, CPUs:   #20 #21 #22 #23
-     .... node  #6, CPUs:   #24 #25 #26 #27
-     .... node  #7, CPUs:   #28 #29 #30 #31
-     .... node  #0, CPUs:   #32 #33 #34 #35
-     .... node  #1, CPUs:   #36 #37 #38 #39
-     .... node  #2, CPUs:   #40 #41 #42 #43
-     .... node  #3, CPUs:   #44 #45 #46 #47
-     .... node  #4, CPUs:   #48 #49 #50 #51
-     .... node  #5, CPUs:   #52 #53 #54 #55
-     .... node  #6, CPUs:   #56 #57 #58 #59
-     .... node  #7, CPUs:   #60 #61 #62 #63
+     smpboot: x86: Booting SMP configuration:
+     .... node  #0, CPUs:        #1  #2  #3
+     .... node  #1, CPUs:    #4  #5  #6  #7
+     .... node  #2, CPUs:    #8  #9 #10 #11
+     .... node  #3, CPUs:   #12 #13 #14 #15
+     .... node  #4, CPUs:   #16 #17 #18 #19
+     .... node  #5, CPUs:   #20 #21 #22 #23
+     .... node  #6, CPUs:   #24 #25 #26 #27
+     .... node  #7, CPUs:   #28 #29 #30 #31
+     .... node  #0, CPUs:   #32 #33 #34 #35
+     .... node  #1, CPUs:   #36 #37 #38 #39
+     .... node  #2, CPUs:   #40 #41 #42 #43
+     .... node  #3, CPUs:   #44 #45 #46 #47
+     .... node  #4, CPUs:   #48 #49 #50 #51
+     .... node  #5, CPUs:   #52 #53 #54 #55
+     .... node  #6, CPUs:   #56 #57 #58 #59
+     .... node  #7, CPUs:   #60 #61 #62 #63
  
-     Timer migration: 2 hierarchy levels; 8 children per group; 1
+     Timer migration: 2 hierarchy levels; 8 children per group; 1
  crossnode level
  
- 
- The 2025.03.17 Oracular kernels (including derivatives and backports) include commit b729cc1ec21a ('timers/migration: Fix another race between hotplug and idle entry/exit') via the upstream stable patchset LP: #2100328, while commit 868c9037df62 ('timers/migration: Fix off-by-one root mis-connection') is not included. I've verified locally that with the fix-the-fix commit 868c9037df62, the issue disappears.
+ The 2025.03.17 Oracular kernels (including derivatives and backports)
+ include commit b729cc1ec21a ("timers/migration: Fix another race between
+ hotplug and idle entry/exit") via the upstream stable patchset LP:
+ #2100328, while commit 868c9037df62 ("timers/migration: Fix off-by-one
+ root mis-connection") is not included. I've verified locally that with
+ the fix-the-fix commit 868c9037df62, the issue disappears.

** Tags added: sru-20250317

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2106022

Title:
  log_check/kernel_tainted failed with kernel warnings at
  kernel/time/timer_migration.c:543 on Oracular

Status in ubuntu-kernel-tests:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2106022/+subscriptions


