
canonical-ubuntu-qa team mailing list archive

[Bug 2106022] Re: log_check/kernel_tainted failed with kernel warnings at kernel/time/timer_migration.c:543 on Oracular

 

This bug is awaiting verification that the linux/6.11.0-25.25 kernel in
-proposed solves the problem. Please test the kernel and update this bug
with the results. If the problem is solved, change the tag
'verification-needed-oracular-linux' to
'verification-done-oracular-linux'. If the problem still exists, change
the tag 'verification-needed-oracular-linux' to
'verification-failed-oracular-linux'.


If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-oracular-linux-v2 verification-needed-oracular-linux

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2106022

Title:
  log_check/kernel_tainted failed with kernel warnings at
  kernel/time/timer_migration.c:543 on Oracular

Status in ubuntu-kernel-tests:
  New

Bug description:
  Found during boot testing of Noble linux-lowlatency-hwe-6.11
  (6.11.0-1012.13~24.04.1) on TF amd-server.

  Sample kernel warning message:

      WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:543 tmigr_requires_handle_remote+0x123/0x130
      Modules linked in:
      CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-1012-lowlatency #13~24.04.1-Ubuntu
      Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/07/2018
      RIP: 0010:tmigr_requires_handle_remote+0x123/0x130
      Code: 65 48 2b 14 25 28 00 00 00 75 23 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff e9 c1 84 07 01 0f 0b eb ba <0f> 0b eb a9 e8 44 5d 06 01 0f 1f 40 00 90 90 90 90 90 90 90 90 90
      RSP: 0018:ffffa6f9c0003f30 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: ffff8c899f026200 RCX: 7fffffffffffffff
      RDX: ffff8c8240100e00 RSI: 0000000000000002 RDI: 0000000000000000
      RBP: ffffa6f9c0003f68 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff8c899f000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8cc1bfdff000 CR3: 0000002dcb83e000 CR4: 00000000003506f0
      Call Trace:

       ? show_regs+0x6c/0x80
       ? __warn+0x88/0x140
       ? tmigr_requires_handle_remote+0x123/0x130
       ? report_bug+0x182/0x1b0
       ? handle_bug+0x6e/0xb0
       ? exc_invalid_op+0x18/0x80
       ? asm_exc_invalid_op+0x1b/0x20
       ? tmigr_requires_handle_remote+0x123/0x130
       update_process_times+0x63/0xb0
       tick_periodic+0x2d/0x90
       tick_handle_periodic+0x25/0x80
       __sysvec_apic_timer_interrupt+0x59/0x130
       sysvec_apic_timer_interrupt+0x9b/0xc0
       asm_sysvec_apic_timer_interrupt+0x1b/0x20
      RIP: 0010:delay_halt_mwaitx+0x3c/0x50
      Code: 05 91 3f 60 64 48 05 00 60 00 00 0f 01 fa b8 ff ff ff ff b9 02 00 00 00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <48> 8b 5d f8 c9 31 c0 31 d2 31 c9 31 f6 e9 22 53 09 00 66 90 90 90
      RSP: 0018:ffffa6f9c007bbf8 EFLAGS: 00000293
      RAX: 00000000000000f0 RBX: 0000000000005d93 RCX: 0000000000000002
      RDX: 0000000000000000 RSI: 0000000000005d93 RDI: 00000035e3527498
      RBP: ffffa6f9c007bc00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005d93
      R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000020
       ? srso_return_thunk+0x5/0x5f
       delay_halt.part.0+0x3e/0x70
       delay_halt+0x13/0x30
       __const_udelay+0x3d/0x50
       wakeup_secondary_cpu_via_init+0xed/0x2e0
       do_boot_cpu+0x1d1/0x200
       native_kick_ap+0x111/0x1d0
       arch_cpuhp_kick_ap_alive+0x15/0x20
       cpuhp_kick_ap_alive+0x55/0x90
       ? __pfx_cpuhp_kick_ap_alive+0x10/0x10
       cpuhp_invoke_callback+0x340/0x520
       __cpuhp_invoke_callback_range+0x80/0x100
       _cpu_up+0x10b/0x280
       cpu_up+0xe3/0x120
       cpuhp_bringup_mask+0x71/0xd0
       cpuhp_bringup_cpus_parallel+0x116/0x150
       ? __pfx_kernel_init+0x10/0x10
       bringup_nonboot_cpus+0x22/0x50
       smp_init+0x2a/0x90
       kernel_init_freeable+0x10b/0x210
       kernel_init+0x1b/0x200
       ret_from_fork+0x47/0x70
       ? __pfx_kernel_init+0x10/0x10
       ret_from_fork_asm+0x1a/0x30

      ---[ end trace 0000000000000000 ]---
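
  For triage, the header line of a splat like the one above can be
  picked out of a saved kernel log mechanically. The following is a
  minimal sketch only: the pattern and the function name
  find_kernel_warnings are hypothetical, and this is not the actual
  log_check implementation in ubuntu-kernel-tests.

```python
import re

# Hypothetical pattern for the "WARNING:" header line of a kernel WARN splat.
# It captures the CPU, PID, source file, source line, and symbol name.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[\w.]+)\+"
)


def find_kernel_warnings(log_text):
    """Return one dict per WARNING header line found in log_text."""
    hits = []
    for m in WARN_RE.finditer(log_text):
        hits.append({
            "cpu": int(m["cpu"]),
            "pid": int(m["pid"]),
            "file": m["file"],
            "line": int(m["line"]),
            "symbol": m["symbol"],
        })
    return hits


# Header line copied from the sample warning in this report.
sample = (
    "WARNING: CPU: 0 PID: 1 at kernel/time/timer_migration.c:543 "
    "tmigr_requires_handle_remote+0x123/0x130"
)
for hit in find_kernel_warnings(sample):
    print(hit["file"], hit["line"], hit["symbol"])
```

  Matching on file, line, and symbol rather than on the full trace keeps
  the check stable across boots, since register values and addresses
  differ from run to run.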

  This issue can be reproduced with oracular/linux, at least with the
  same tmigr_group hierarchy, so it is likely to affect any Oracular
  derivative or backport. The kernel logs describing the topology of TF
  amd-server, where the issue was observed, and the resulting group
  hierarchy are as follows:

      CPU topo: Max. logical packages:   2
      CPU topo: Max. logical dies:       2
      CPU topo: Max. dies per package:   1
      CPU topo: Max. threads per core:   2
      CPU topo: Num. cores per package:    16
      CPU topo: Num. threads per package:  32
      CPU topo: Allowing 64 present CPUs plus 0 hotplug CPUs

      smpboot: x86: Booting SMP configuration:
      .... node  #0, CPUs:        #1  #2  #3
      .... node  #1, CPUs:    #4  #5  #6  #7
      .... node  #2, CPUs:    #8  #9 #10 #11
      .... node  #3, CPUs:   #12 #13 #14 #15
      .... node  #4, CPUs:   #16 #17 #18 #19
      .... node  #5, CPUs:   #20 #21 #22 #23
      .... node  #6, CPUs:   #24 #25 #26 #27
      .... node  #7, CPUs:   #28 #29 #30 #31
      .... node  #0, CPUs:   #32 #33 #34 #35
      .... node  #1, CPUs:   #36 #37 #38 #39
      .... node  #2, CPUs:   #40 #41 #42 #43
      .... node  #3, CPUs:   #44 #45 #46 #47
      .... node  #4, CPUs:   #48 #49 #50 #51
      .... node  #5, CPUs:   #52 #53 #54 #55
      .... node  #6, CPUs:   #56 #57 #58 #59
      .... node  #7, CPUs:   #60 #61 #62 #63

      Timer migration: 2 hierarchy levels; 8 children per group; 1 crossnode level
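
  The "Timer migration" line is consistent with the topology shown
  above. The sketch below redoes the arithmetic as I understand it from
  the upstream tmigr_init() code; treat the formula as an assumption
  rather than a quote of the kernel source. The inputs (64 CPUs, 8 NUMA
  nodes) are inferred from the "CPU topo" and "smpboot" lines, and the
  function name tmigr_levels is hypothetical.

```python
def order_base_2(n):
    """Smallest k such that 2**k >= n (named after the kernel helper)."""
    return (n - 1).bit_length() if n > 1 else 0


def div_round_up(a, b):
    """Integer division rounding up, for non-negative a and positive b."""
    return -(-a // b)


def tmigr_levels(ncpus, nnodes, children_per_group=8):
    """Return (hierarchy_levels, crossnode_level) for a given topology.

    Assumed formula: levels needed to cover the CPUs of one node, plus
    levels needed to connect all nodes. children_per_group must be a
    power of two for the ilog2 step below to be exact.
    """
    cpus_per_node = div_round_up(ncpus, nnodes)
    log_children = children_per_group.bit_length() - 1  # ilog2
    cpulvl = div_round_up(order_base_2(cpus_per_node), log_children)
    nodelvl = div_round_up(order_base_2(nnodes), log_children)
    return cpulvl + nodelvl, cpulvl


# 64 present CPUs spread over NUMA nodes #0..#7, 8 children per group.
print(tmigr_levels(64, 8))  # -> (2, 1)
```

  With these inputs each node's 8 CPUs fit in a single level-0 group,
  and one level-1 group connects the 8 nodes, which matches the 2
  hierarchy levels and crossnode level 1 reported in the log.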

  The 2025.03.17 Oracular kernels (including derivatives and backports)
  include commit b729cc1ec21a ("timers/migration: Fix another race
  between hotplug and idle entry/exit") via the upstream stable patchset
  LP: #2100328, but not commit 868c9037df62 ("timers/migration: Fix
  off-by-one root mis-connection"). I've verified locally that the issue
  disappears once the fix-the-fix commit 868c9037df62 is applied.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2106022/+subscriptions


