kernel-packages team mailing list archive
Message #98118
[Bug 1402141] Re: Ubuntu 14.10 freezes when use smt-enabled=off as kernel argument
** Description changed:
SRU Justification:
- Impact: Booting a POWER8 machine with smt-enabled=off will cause a system to hang at "Freeing initrd memory"
+ Impact: Booting a POWER8 machine with smt-enabled=off will cause a system to hang at "Freeing initrd memory"; note that this only affects kernels with powernv split-core support.
Fix: commit d70a54e2d08510a99b1f10eceeae6f2f7086e226 upstream
Testcase: Boot with smt-enabled=off on a POWER8 machine
--
== Comment: #0 - Paulo Flabiano Smorigo <pfsmorigo@xxxxxxxxxx> - 2014-11-18 12:28:42 ==
Using Ubuntu as the host, if you add smt-enabled=off as a kernel argument, the system boots only as far as the "Freeing initrd memory" line and then hangs:
...
[ 1.371729] vgaarb: loaded
[ 1.372989] SCSI subsystem initialized
[ 1.373977] libata version 3.00 loaded.
[ 1.374158] usbcore: registered new interface driver usbfs
[ 1.374246] usbcore: registered new interface driver hub
[ 1.374382] usbcore: registered new device driver usb
[ 1.374505] pps_core: LinuxPPS API ver. 1 registered
[ 1.374563] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@xxxxxxxx>
[ 1.374671] PTP clock support registered
[ 1.377135] NetLabel: Initializing
[ 1.377218] NetLabel: domain hash size = 128
[ 1.377328] NetLabel: protocols = UNLABELED CIPSOv4
[ 1.377472] NetLabel: unlabeled traffic allowed by default
[ 1.377983] Switched to clocksource timebase
[ 1.395029] AppArmor: AppArmor Filesystem Enabled
[ 1.402044] NET: Registered protocol family 2
[ 1.403795] TCP established hash table entries: 524288 (order: 6, 4194304 bytes)
[ 1.408343] TCP bind hash table entries: 65536 (order: 4, 1048576 bytes)
[ 1.409301] TCP: Hash tables configured (established 524288 bind 65536)
[ 1.409490] TCP: reno registered
[ 1.409645] UDP hash table entries: 65536 (order: 5, 2097152 bytes)
[ 1.411943] UDP-Lite hash table entries: 65536 (order: 5, 2097152 bytes)
[ 1.415409] NET: Registered protocol family 1
[ 1.415753] PCI: CLS 128 bytes, default 128
[ 1.415962] Trying to unpack rootfs image as initramfs...
[ 2.250464] Freeing initrd memory: 21952K (c000000003820000 - c000000004d90000)
Machine Type = Power 8 (S822L)
== Comment: #1 - Thadeu Lima De Souza Cascardo <thadeul@xxxxxxxxxx> - 2014-11-18 13:42:37 ==
What is the firmware version?
Cascardo.
== Comment: #2 - Paulo Flabiano Smorigo <pfsmorigo@xxxxxxxxxx> - 2014-11-19 07:13:35 ==
It is currently FW810.02 (SV810_061). I will update it today.
Smorigo.
== Comment: #3 - Paulo Flabiano Smorigo <pfsmorigo@xxxxxxxxxx> - 2014-11-19 12:47:24 ==
Updated to FW810.20 (SV810_101). Nothing changed.
== Comment: #4 - Greg Kurz <KURZGREG@xxxxxxxxxx> - 2014-11-20 05:47:45 ==
I can reproduce it on an S824 with FW810.20 (TV810_101) firmware, running 14.04.2 "alpha" (kernel 3.16.0-25). The issue does not show up with kernel 3.13.0-39. I will try mainline and bisect.
== Comment: #5 - Greg Kurz <KURZGREG@xxxxxxxxxx> - 2014-11-20 13:31:03 ==
FYI issue is upstream.
== Comment: #6 - Breno Henrique Leitao <brenohl@xxxxxxxxxx> - 2014-11-24 11:23:04 ==
(In reply to comment #5)
> FYI issue is upstream.
Greg, are you working to solve this issue?
== Comment: #7 - Greg Kurz <KURZGREG@xxxxxxxxxx> - 2014-11-24 12:08:33 ==
(In reply to comment #6)
> (In reply to comment #5)
> > FYI issue is upstream.
>
> Greg, are you working to solve this issue?
Yes I am.
== Comment: #8 - Greg Kurz <KURZGREG@xxxxxxxxxx> - 2014-12-01 04:56:07 ==
The hang occurs because all running threads are looping in the split core code:
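static void wait_for_sync_step(int step)
{
    int i, cpu = smp_processor_id();

    /* Wait until every sibling thread in this core reaches 'step'.
     * This is where the running threads loop forever: */
    for (i = cpu + 1; i < cpu + threads_per_core; i++)
        while (per_cpu(split_state, i).step < step)
            barrier();
    /* ... */
}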
The problem is that the split-core code needs all possible threads to
participate. If the kernel is booted with smt-enabled set to anything
other than the maximum value, some threads are never started, so their
split_state step never advances and the sync loop above never terminates.
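To make the failure mode concrete, here is a minimal single-threaded C model of the sync step. It is not the kernel's split-core code: the names siblings_reached_step and core_can_sync are hypothetical, each thread's progress is assumed to be a plain int indexed by CPU number, threads_per_core is assumed to be 8 (as on POWER8), and the caller is assumed to be the first thread of its core, which is what the loop bounds (cpu + 1 to cpu + threads_per_core) suggest. Where the kernel spins on per_cpu(split_state, i).step with barrier(), this sketch checks once, so it can report a missing thread instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model value: POWER8 cores have 8 hardware threads. */
#define THREADS_PER_CORE 8

/*
 * Model of the condition wait_for_sync_step() waits for: every sibling
 * of 'cpu' has reached 'step'. The kernel spins until this is true;
 * here we only test it once. step_of[] holds each CPU's current step.
 */
bool siblings_reached_step(const int *step_of, int cpu, int step)
{
    assert(cpu % THREADS_PER_CORE == 0); /* assumed: caller is thread 0 */
    for (int i = cpu + 1; i < cpu + THREADS_PER_CORE; i++)
        if (step_of[i] < step)
            return false; /* the kernel would keep spinning here */
    return true;
}

/*
 * Simulate one core where only the first 'started' threads run.
 * Threads that were never started (e.g. smt-enabled=off) stay at step 0.
 */
bool core_can_sync(int started, int step)
{
    int step_of[THREADS_PER_CORE] = {0};
    assert(started >= 0 && started <= THREADS_PER_CORE);
    for (int i = 0; i < started; i++)
        step_of[i] = step; /* running threads advance to 'step' */
    return siblings_reached_step(step_of, 0, step);
}
```

With all 8 threads started, core_can_sync(8, 1) is true; with smt-enabled=off only thread 0 runs, so core_can_sync(1, 1) is false, which in the real kernel means the loop never exits.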
== Comment: #9 - Greg Kurz <KURZGREG@xxxxxxxxxx> - 2014-12-01 05:24:28 ==
The current implementation of smt-enabled= is a hack: it simply leaves hardware threads looping wherever they happen to be (probably in firmware). This isn't acceptable in a production environment.
An "acceptable" fix would be to start all threads anyway and then,
AFTER subcore initialization, offline the ones that must go offline to
honour the requested SMT mode. This requires a non-trivial patch.
Since changing the SMT mode from userspace once the system has booted
is really straightforward, Michael Ellerman suggests we simply drop the
smt-enabled= feature.
Smorigo,
Why were you using smt-enabled=? Is there a reason not to do it after the system has booted,
with ppc64_cpu --smt or by writing directly to /sys/devices/system/cpu/cpu*/online?
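The userspace alternative mentioned above (writing to /sys/devices/system/cpu/cpu*/online after boot) can be sketched in C. This is a hedged illustration, not what ppc64_cpu does internally: the helper names are hypothetical, and it assumes logical CPU numbers are contiguous within a core, so thread t of core c is CPU c * threads_per_core + t. It uses the standard sysfs CPU hotplug file, where writing "0" offlines a CPU and "1" onlines it; root privileges are required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: in SMT mode 'smt' (1 = SMT off, 8 = SMT8 on
 * POWER8), keep only the first 'smt' threads of each core online.
 * Assumes contiguous CPU numbering within a core.
 */
bool cpu_should_be_online(int cpu, int threads_per_core, int smt)
{
    assert(smt >= 1 && smt <= threads_per_core);
    return cpu % threads_per_core < smt;
}

/* Build the sysfs hotplug path for a logical CPU; returns snprintf's result. */
int cpu_online_path(char *buf, size_t len, int cpu)
{
    return snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
}

/* Write "1" or "0" to the CPU's online file; returns 0 on success. */
int set_cpu_online(int cpu, bool online)
{
    char path[64];
    cpu_online_path(path, sizeof path, cpu);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(online ? "1" : "0", f) >= 0;
    if (fclose(f) != 0) /* the kernel may report the error on close */
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Apply an SMT mode to CPUs [0, ncpus); returns the number of failures,
 * e.g. CPUs without an online file or missing root privileges.
 */
int apply_smt_mode(int ncpus, int threads_per_core, int smt)
{
    int failures = 0;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        bool want = cpu_should_be_online(cpu, threads_per_core, smt);
        if (set_cpu_online(cpu, want) != 0)
            failures++;
    }
    return failures;
}
```

Called from an init script or service as apply_smt_mode(ncpus, 8, 1), this would approximate smt-enabled=off after all threads have been started, which is the ordering the discussion argues for.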
== Comment: #10 - Paulo Flabiano Smorigo <pfsmorigo@xxxxxxxxxx> - 2014-12-01 06:23:34 ==
I used smt-enabled= because it was the easiest way for me to disable SMT: just add this parameter to GRUB_CMDLINE_LINUX and you're done. :)
I'll check whether dropping it causes any problem.
== Comment: #11 - Paulo Flabiano Smorigo <pfsmorigo@xxxxxxxxxx> - 2014-12-01 08:30:55 ==
Greg, are you suggesting we drop it for good? Maybe we can add that as a feature request for next year. Btw, I'm OK with dropping it for now.
== Comment: #12 - Greg Kurz <KURZGREG@xxxxxxxxxx> - 2014-12-01 09:30:00 ==
(In reply to comment #11)
> Greg, are you suggesting we drop it for good? Maybe we can add that as a
> feature request for next year. Btw, I'm OK with dropping it for now.
Yes, drop it for good as suggested by Michael Ellerman...
<mpe> groug: that smt-enabled stuff is just a hack. It leaves the cpu executing wherever it happens to be, possibly in firmware, possibly busy looping somewhere, it's really no good for use in production
<mpe> the only way we could make it usable I think is to have the cpu come up, and then we offline it
<mpe> but I'm really inclined to say that should just be done in userspace
<groug> mpe, yeah... I had thought of something similar (starting and then offlining) but I agree it should be handled from userspace
<mpe> I'll talk to benh and anton about it tomorrow, but I think we just rip it out
The point is that it is already extremely easy to change the SMT mode
from an init script, with the same result, compared to the hassle of
doing it in the kernel without breaking things. I would say it is not
even worth a feature request.
== Comment: #13 - Greg Kurz <KURZGREG@xxxxxxxxxx> - 2014-12-12 08:50:25 ==
I've sent a patch:
powerpc/powernv: force all CPUs to be bootable
http://patchwork.ozlabs.org/patch/420440/
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1402141
Title:
Ubuntu 14.10 freezes when use smt-enabled=off as kernel argument
Status in linux package in Ubuntu:
Triaged
Status in linux source package in Utopic:
In Progress
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1402141/+subscriptions