canonical-ubuntu-qa team mailing list archive
Message #05693
[Bug 2088047] Re: log_check / kernel_tainted test from ubuntu_boot failed on Oracular AWS a1.metal
** Description changed:
Found on Oracular/6.11.0-11.11 boot testing on AWS a1.metal instance.
The relevant console log excerpts:
-----(snip)-----
06:55:12 INFO | 2024-11-09T06:51:17.584884+00:00 ip-172-31-6-235 kernel: cpuinfo: failed to register hotplug callbacks.
-----(snip)-----
06:55:12 INFO | 2024-11-09T06:51:17.584978+00:00 ip-172-31-6-235 kernel: ------------[ cut here ]------------
06:55:12 INFO | 2024-11-09T06:51:17.584980+00:00 ip-172-31-6-235 kernel: WARNING: CPU: 7 PID: 1 at fs/sysfs/group.c:128 internal_create_group+0xc4/0x380
06:55:12 INFO | 2024-11-09T06:51:17.584981+00:00 ip-172-31-6-235 kernel: Modules linked in:
06:55:12 INFO | 2024-11-09T06:51:17.584983+00:00 ip-172-31-6-235 kernel: CPU: 7 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-11-generic #11-Ubuntu
06:55:12 INFO | 2024-11-09T06:51:17.584984+00:00 ip-172-31-6-235 kernel: Hardware name: Amazon EC2 a1.metal/Not Specified, BIOS 1.0 10/16/2017
06:55:12 INFO | 2024-11-09T06:51:17.584985+00:00 ip-172-31-6-235 kernel: pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
06:55:12 INFO | 2024-11-09T06:51:17.584987+00:00 ip-172-31-6-235 kernel: pc : internal_create_group+0xc4/0x380
06:55:12 INFO | 2024-11-09T06:51:17.584989+00:00 ip-172-31-6-235 kernel: lr : sysfs_create_group+0x24/0x50
06:55:12 INFO | 2024-11-09T06:51:17.584993+00:00 ip-172-31-6-235 kernel: sp : ffff80008009bb90
06:55:12 INFO | 2024-11-09T06:51:17.584995+00:00 ip-172-31-6-235 kernel: x29: ffff80008009bba0 x28: 0000000000000000 x27: ffff19093bd33ca8
06:55:12 INFO | 2024-11-09T06:51:17.584997+00:00 ip-172-31-6-235 kernel: x26: 0000000000000000 x25: ffff436d28704000 x24: ffffd59c11b04a88
06:55:12 INFO | 2024-11-09T06:51:17.584998+00:00 ip-172-31-6-235 kernel: x23: 0000000000000000 x22: ffffd59c14046768 x21: ffffd59c1362fca8
06:55:12 INFO | 2024-11-09T06:51:17.585000+00:00 ip-172-31-6-235 kernel: x20: 0000000000000036 x19: 0000000000000004 x18: ffff800080095060
06:55:12 INFO | 2024-11-09T06:51:17.585001+00:00 ip-172-31-6-235 kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
06:55:12 INFO | 2024-11-09T06:51:17.585003+00:00 ip-172-31-6-235 kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
06:55:12 INFO | 2024-11-09T06:51:17.585006+00:00 ip-172-31-6-235 kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : ffffd59c1128fc4c
06:55:12 INFO | 2024-11-09T06:51:17.585008+00:00 ip-172-31-6-235 kernel: x8 : 0101010101010101 x7 : 0000000000000000 x6 : 0000000000000000
06:55:12 INFO | 2024-11-09T06:51:17.585010+00:00 ip-172-31-6-235 kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff1902003fa280
06:55:12 INFO | 2024-11-09T06:51:17.585011+00:00 ip-172-31-6-235 kernel: x2 : ffffd59c12648f88 x1 : 0000000000000000 x0 : 0000000000000000
06:55:12 INFO | 2024-11-09T06:51:17.585012+00:00 ip-172-31-6-235 kernel: Call trace:
06:55:12 INFO | 2024-11-09T06:51:17.585013+00:00 ip-172-31-6-235 kernel: internal_create_group+0xc4/0x380
06:55:12 INFO | 2024-11-09T06:51:17.585014+00:00 ip-172-31-6-235 kernel: sysfs_create_group+0x24/0x50
06:55:12 INFO | 2024-11-09T06:51:17.585015+00:00 ip-172-31-6-235 kernel: topology_add_dev+0x28/0x50
06:55:12 INFO | 2024-11-09T06:51:17.585016+00:00 ip-172-31-6-235 kernel: cpuhp_invoke_callback+0x200/0x780
06:55:12 INFO | 2024-11-09T06:51:17.585021+00:00 ip-172-31-6-235 kernel: cpuhp_issue_call+0x100/0x198
06:55:12 INFO | 2024-11-09T06:51:17.585023+00:00 ip-172-31-6-235 kernel: __cpuhp_setup_state_cpuslocked+0x128/0x330
06:55:12 INFO | 2024-11-09T06:51:17.585024+00:00 ip-172-31-6-235 kernel: __cpuhp_setup_state+0x5c/0xa8
06:55:12 INFO | 2024-11-09T06:51:17.585025+00:00 ip-172-31-6-235 kernel: topology_sysfs_init+0x40/0x78
06:55:12 INFO | 2024-11-09T06:51:17.585026+00:00 ip-172-31-6-235 kernel: do_one_initcall+0x64/0x3a0
06:55:12 INFO | 2024-11-09T06:51:17.585027+00:00 ip-172-31-6-235 kernel: do_initcalls+0x19c/0x210
06:55:12 INFO | 2024-11-09T06:51:17.585028+00:00 ip-172-31-6-235 kernel: kernel_init_freeable+0x18c/0x1e8
06:55:12 INFO | 2024-11-09T06:51:17.585029+00:00 ip-172-31-6-235 kernel: kernel_init+0x3c/0x190
06:55:12 INFO | 2024-11-09T06:51:17.585031+00:00 ip-172-31-6-235 kernel: ret_from_fork+0x10/0x20
06:55:12 INFO | 2024-11-09T06:51:17.585035+00:00 ip-172-31-6-235 kernel: ---[ end trace 0000000000000000 ]---
06:55:12 INFO | 2024-11-09T06:51:17.585037+00:00 ip-172-31-6-235 kernel: sysfs: cannot create duplicate filename '/devices/cache'
06:55:12 INFO | 2024-11-09T06:51:17.585038+00:00 ip-172-31-6-235 kernel: CPU: 5 UID: 0 PID: 47 Comm: cpuhp/5 Tainted: G W 6.11.0-11-generic #11-Ubuntu
06:55:12 INFO | 2024-11-09T06:51:17.585039+00:00 ip-172-31-6-235 kernel: Tainted: [W]=WARN
06:55:12 INFO | 2024-11-09T06:51:17.585040+00:00 ip-172-31-6-235 kernel: Hardware name: Amazon EC2 a1.metal/Not Specified, BIOS 1.0 10/16/2017
06:55:12 INFO | 2024-11-09T06:51:17.585041+00:00 ip-172-31-6-235 kernel: Call trace:
06:55:12 INFO | 2024-11-09T06:51:17.585146+00:00 ip-172-31-6-235 kernel: dump_backtrace+0x104/0x160
06:55:12 INFO | 2024-11-09T06:51:17.585149+00:00 ip-172-31-6-235 kernel: show_stack+0x24/0x50
06:55:12 INFO | 2024-11-09T06:51:17.585150+00:00 ip-172-31-6-235 kernel: dump_stack_lvl+0x84/0xc0
06:55:12 INFO | 2024-11-09T06:51:17.585155+00:00 ip-172-31-6-235 kernel: dump_stack+0x1c/0x40
06:55:12 INFO | 2024-11-09T06:51:17.585191+00:00 ip-172-31-6-235 kernel: sysfs_warn_dup+0xa8/0xf0
06:55:12 INFO | 2024-11-09T06:51:17.585193+00:00 ip-172-31-6-235 kernel: sysfs_create_dir_ns+0x124/0x150
06:55:12 INFO | 2024-11-09T06:51:17.585194+00:00 ip-172-31-6-235 kernel: create_dir+0x30/0x120
06:55:12 INFO | 2024-11-09T06:51:17.585215+00:00 ip-172-31-6-235 kernel: kobject_add_internal+0x90/0x240
06:55:12 INFO | 2024-11-09T06:51:17.585218+00:00 ip-172-31-6-235 kernel: kobject_add+0xa0/0x140
06:55:12 INFO | 2024-11-09T06:51:17.585234+00:00 ip-172-31-6-235 kernel: device_add+0xd8/0x748
06:55:12 INFO | 2024-11-09T06:51:17.585236+00:00 ip-172-31-6-235 kernel: cpu_device_create+0x19c/0x1c0
06:55:12 INFO | 2024-11-09T06:51:17.585238+00:00 ip-172-31-6-235 kernel: cache_add_dev+0x84/0x428
06:55:12 INFO | 2024-11-09T06:51:17.585252+00:00 ip-172-31-6-235 kernel: cacheinfo_cpu_online+0x90/0x138
06:55:12 INFO | 2024-11-09T06:51:17.585254+00:00 ip-172-31-6-235 kernel: cpuhp_invoke_callback+0x200/0x780
06:55:12 INFO | 2024-11-09T06:51:17.585256+00:00 ip-172-31-6-235 kernel: cpuhp_thread_fun+0x140/0x358
06:55:12 INFO | 2024-11-09T06:51:17.585281+00:00 ip-172-31-6-235 kernel: smpboot_thread_fn+0x224/0x250
06:55:12 INFO | 2024-11-09T06:51:17.585287+00:00 ip-172-31-6-235 kernel: kthread+0xf4/0x108
06:55:12 INFO | 2024-11-09T06:51:17.585289+00:00 ip-172-31-6-235 kernel: ret_from_fork+0x10/0x20
06:55:12 INFO | 2024-11-09T06:51:17.585299+00:00 ip-172-31-6-235 kernel: kobject: kobject_add_internal failed for cache with -EEXIST, don't try to register things with the same name in the same directory.
This was also observed on 6.11.0-1004-aws and 6.11.0-1005-aws.
Note that Noble is not affected. See the [Affected Versions] section for more details.
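The second trace above reports "Tainted: G W" with "[W]=WARN". As a minimal sketch of how that flag could be checked on a running system, the snippet below tests the warning bit of the kernel taint value. It assumes the taint bitmask is read from /proc/sys/kernel/tainted and that 'W' corresponds to bit 9, as in the upstream taint documentation; it is not a description of how the kernel_tainted test in ubuntu_boot is actually implemented.

```python
# Assumption: bit 9 of the taint bitmask is 'W' (kernel issued a warning).
TAINT_WARN_BIT = 9


def has_warn_taint(taint_value):
    """Return True if the WARN ('W') taint bit is set in taint_value."""
    return bool(taint_value & (1 << TAINT_WARN_BIT))


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

On an affected instance, has_warn_taint(read_taint()) would be expected to return True after the WARNING at fs/sysfs/group.c:128 fires.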
-------------------------------------
[Summary]
- This is not a regression; it is caused by a problematic ACPI table on a1.metal.
- If the ACPI table is not fixed soon, one option is to carry a workaround, at least in our tree. See the [Solution] section for more details.
[Cause]
According to the warning messages, the following two registrations are failing:
* cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/cpuinfo:online",
cpuid_cpu_online, cpuid_cpu_offline)
* cpuhp_setup_state(CPUHP_AP_BASE_CACHEINFO_ONLINE, "base/cacheinfo:online",
cacheinfo_cpu_online, cacheinfo_cpu_pre_down)
Note that other cpuhp callbacks are failing as well, as boot-time
tracing of cpuhp:* events reveals:
4) | /* cpuhp_enter: cpu: 0004 target: 238 step: 199 (cpu_capacity_sysctl_add) */
4) | /* cpuhp_exit: cpu: 0004 state: 238 step: 199 ret: -2 */
4) | /* cpuhp_enter: cpu: 0004 target: 238 step: 199 (cpuid_cpu_online) */
4) | /* cpuhp_exit: cpu: 0004 state: 238 step: 199 ret: -19 */
5) | /* cpuhp_enter: cpu: 0004 target: 238 step: 54 (topology_add_dev) */
5) | /* cpuhp_exit: cpu: 0004 state: 238 step: 54 ret: -22 */
5) | /* cpuhp_enter: cpu: 0005 target: 238 step: 193 (cacheinfo_cpu_online) */
5) | /* cpuhp_exit: cpu: 0005 state: 238 step: 193 ret: -17 */
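The "ret:" values in this trace are negative errno codes. The following sketch pairs each cpuhp_enter line with the cpuhp_exit line that follows it and decodes the return value with Python's errno module. Pairing by adjacency is an assumption that holds for this excerpt; a full trace with interleaved CPUs would need to be matched per CPU. The helper name failed_callbacks is hypothetical.

```python
import errno
import re

# Trace lines copied from the excerpt above.
TRACE = """\
4) | /* cpuhp_enter: cpu: 0004 target: 238 step: 199 (cpu_capacity_sysctl_add) */
4) | /* cpuhp_exit: cpu: 0004 state: 238 step: 199 ret: -2 */
4) | /* cpuhp_enter: cpu: 0004 target: 238 step: 199 (cpuid_cpu_online) */
4) | /* cpuhp_exit: cpu: 0004 state: 238 step: 199 ret: -19 */
5) | /* cpuhp_enter: cpu: 0004 target: 238 step: 54 (topology_add_dev) */
5) | /* cpuhp_exit: cpu: 0004 state: 238 step: 54 ret: -22 */
5) | /* cpuhp_enter: cpu: 0005 target: 238 step: 193 (cacheinfo_cpu_online) */
5) | /* cpuhp_exit: cpu: 0005 state: 238 step: 193 ret: -17 */
"""

ENTER = re.compile(r"cpuhp_enter: cpu: (\d+) target: +\d+ step: +\d+ \((\w+)\)")
EXIT = re.compile(r"cpuhp_exit: cpu: (\d+) state: +\d+ step: +\d+ ret: (-?\d+)")


def failed_callbacks(trace):
    """Return (cpu, callback, ret, errno_name) for each failing callback."""
    failures = []
    pending = None  # (cpu, callback) from the most recent cpuhp_enter line
    for line in trace.splitlines():
        m = ENTER.search(line)
        if m:
            pending = (int(m.group(1)), m.group(2))
            continue
        m = EXIT.search(line)
        if m and pending is not None:
            ret = int(m.group(2))
            if ret < 0:
                name = errno.errorcode.get(-ret, "unknown")
                failures.append((pending[0], pending[1], ret, name))
            pending = None
    return failures


for cpu, callback, ret, name in failed_callbacks(TRACE):
    print(f"cpu {cpu}: {callback} returned {ret} ({name})")
```

The -17 for cacheinfo_cpu_online decodes to EEXIST, which agrees with the "kobject_add_internal failed for cache with -EEXIST" message in the console log.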
These failures occur because CPU#4-15 are not enabled, even though they are in cpu_possible_mask and online.
The underlying issue is that acpi_get_phys_id() fails to get the phys_id for the processor devices (CPU#4-15) because of
discrepancies in the ACPI table.
-> acpi_processor_get_info
-> acpi_get_phys_id
-> map_mat_entry
-> map_madt_entry
Processor Device _UIDs are sequential numbers starting from 0, while Processor UIDs in MADT/PPTT
are non-sequential (0x0, 0x1, 0x2, 0x3, 0x100, 0x101, 0x102, 0x103, 0x200, 0x201, ...).
As a result, map_madt_entry() fails for CPU#4-15.
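The mismatch can be illustrated with a deliberately simplified model: treat the lookup as a set-membership check of each Processor Device _UID against the Processor UIDs listed in MADT/PPTT. This is only a sketch, not the logic of map_madt_entry() itself. It uses just the ten UID values quoted above (the rest of the list is elided in the report) and assumes 16 processor devices, CPU#0-15.

```python
# Processor UIDs quoted above from MADT/PPTT. The report elides the remainder
# of the list ("..."), so only the quoted values are used here.
MADT_UIDS_QUOTED = [0x0, 0x1, 0x2, 0x3, 0x100, 0x101, 0x102, 0x103, 0x200, 0x201]

# Assumption: 16 processor devices (CPU#0-15) with sequential _UIDs from 0.
DEVICE_UIDS = list(range(16))


def unmatched_uids(device_uids, madt_uids):
    """Return the device _UIDs that have no entry with the same UID in MADT."""
    madt = set(madt_uids)
    return [uid for uid in device_uids if uid not in madt]


print(unmatched_uids(DEVICE_UIDS, MADT_UIDS_QUOTED))
```

Under these assumptions only _UIDs 0-3 find a matching entry, and 4-15 are left unmatched, which is the same set of CPUs reported as failing.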
[Affected Versions]
* All Oracular kernels are affected at the moment.
* No Noble kernels are affected at the moment.
This is because only Oracular sets CONFIG_ACPI_HOTPLUG_CPU=y, as a result of two upstream commits:
9d0873892f4d ("arm64: Kconfig: Enable hotplug CPU on arm64 if ACPI_PROCESSOR is enabled.")
46800e38ef0e ("arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU")
which were originally included in its master kernel.
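To check whether a given kernel is built with this option, the kernel config text can be inspected. The sketch below parses config text and reports the value of an option. The two sample fragments are hypothetical illustrations, not the actual Oracular or Noble configs, and the helper name config_value is an assumption. On Ubuntu the config of an installed kernel is normally available as /boot/config-<release>.

```python
def config_value(config_text, option):
    """Return the value of a Kconfig option in config_text, or None if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Hypothetical config fragments, for illustration only.
SAMPLE_ENABLED = "CONFIG_HOTPLUG_CPU=y\nCONFIG_ACPI_HOTPLUG_CPU=y\n"
SAMPLE_DISABLED = "CONFIG_HOTPLUG_CPU=y\n# CONFIG_ACPI_HOTPLUG_CPU is not set\n"

print(config_value(SAMPLE_ENABLED, "CONFIG_ACPI_HOTPLUG_CPU"))
print(config_value(SAMPLE_DISABLED, "CONFIG_ACPI_HOTPLUG_CPU"))
```

For a real check, the text would be read from the installed kernel's config file instead of the sample strings.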
[Solution]
+ There are some options:
+
(a). override ACPI table (while waiting for firmware update)
(b). apply a workaround patch for o:aws (and for other series that are supposed to run on a1.metal)
+ (c). set CONFIG_HOTPLUG_CPU=n, which leads to CONFIG_ACPI_HOTPLUG_CPU=n
[Experiment]
Regarding (b), I cooked up a workaround patch (a dirty hack) and confirmed that acpi_processor_get_info()
now succeeds for all of CPU#4-15 and that the warning messages disappear. See the attached patch.
--
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2088047
Title:
log_check / kernel_tainted test from ubuntu_boot failed on Oracular
AWS a1.metal
Status in ubuntu-kernel-tests:
New