kernel-packages team mailing list archive
Message #133215
[Bug 1491797] Re: Shuts down when supposed to suspend as a reaction to self-caused overheat, session lost
Let's take this one point at a time:
* fan not running at full speed in disengaged mode in a thermal emergency
- as mentioned earlier, the default fan mode on the machine is to run under firmware control, in which case it runs in engaged mode with a feedback-loop controller, so it never exceeds a top speed of 3500 RPM. This matches the manufacturer's original thermal design. So either they made a mistake and all machines like yours overheat (in which case we would see lots of owners of your machine reporting this bug), or this issue is particular to your machine
* CPU not throttling in a thermal emergency (unless the frequency readings are wrong)
- that needs investigation as thermald should be doing that (but as I mentioned earlier, I will examine the thermald issues later)
* shutting down when supposed to suspend as a reaction to overheat, unnecessarily destroying session
- when a critical thermal event occurs, one has a very short time window to react. The silicon may be permanently damaged, so the kernel chooses to power down rather than try to suspend (since suspend can get stuck and exacerbate the issue). Without the handling of this thermal event, the next step is for the hardware to physically shut itself down, which is outside any form of operating system control. Either way, the machine is desperately trying to save itself from breaking.
* destroying session in a shut down/restart cycle (I heard rumours this may be fixed later in Snappy with containers)
- again, in a rush to save your silicon from becoming irreparably damaged, shutdown is the fastest mechanism. Snappy containers will not help.
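To see how close the machine runs to the critical trip point that triggers the shutdown, the kernel's thermal zones can be read from sysfs. The following is only a rough sketch, assuming the usual /sys/class/thermal layout (temperatures in millidegrees Celsius); the function name list_critical_trips is made up for illustration, and the zones and trip points exposed will differ between machines and kernel versions.

```shell
#!/bin/sh
# Print the current temperature and critical trip point of each thermal zone.
# Assumption: the standard sysfs thermal layout; values are millidegrees C.
# list_critical_trips is a hypothetical helper name, not an existing tool.
list_critical_trips() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones whose temperature cannot be read
        cur_raw=$(cat "$zone/temp" 2>/dev/null) || continue
        [ -n "$cur_raw" ] || continue
        cur=$(( cur_raw / 1000 ))
        for type_file in "$zone"/trip_point_*_type; do
            [ -r "$type_file" ] || continue
            # Only report trip points of type "critical"
            [ "$(cat "$type_file")" = "critical" ] || continue
            temp_file="${type_file%_type}_temp"
            crit_raw=$(cat "$temp_file" 2>/dev/null) || continue
            [ -n "$crit_raw" ] || continue
            crit=$(( crit_raw / 1000 ))
            echo "${zone##*/}: current ${cur} C, critical trip ${crit} C"
        done
    done
}

list_critical_trips "$@"
```

Running it repeatedly under load (for example with watch) shows whether the temperature keeps climbing towards the critical value while the fan speed and CPU frequency stay unchanged.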
I'd recommend reading https://en.wikipedia.org/wiki/Thermal_design_power, there is a paragraph that states:
"Most modern processors will cause a therm-trip only upon a catastrophic cooling failure, such as a no longer operational fan or an incorrectly mounted heatsink."
So, the next step will be to see if we can see what thermald is doing.
1. Stop thermald so we can re-enable it with full debug on:
sudo systemctl stop thermald (if you are using systemd)
or
sudo service thermald stop (if you are using upstart)
2. Run thermald for a while from the command line and capture debug
output:
sudo thermald --no-daemon --dbus-enable --loglevel=debug | tee thermald.log
Run this for, say, 5-10 minutes while using your machine, then attach
thermald.log to the bug report.
3. Restart thermald:
sudo systemctl start thermald (if you are using systemd)
or
sudo service thermald start (if you are using upstart)
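If it is more convenient, the three steps can be wrapped in a small script. This is just a sketch under a few assumptions: the presence of /run/systemd/system is used to tell systemd from upstart, timeout(1) from coreutils is used to end the debug run after a fixed time, and the function names thermald_ctl_cmd and capture_thermald_log are invented here for illustration.

```shell
#!/bin/sh
# Print the command that stops or starts thermald on this machine.
# Assumption: /run/systemd/system exists only when systemd is the running
# init; otherwise fall back to the service(8) wrapper (upstart).
thermald_ctl_cmd() {
    action="$1"   # "stop" or "start"
    if [ -d /run/systemd/system ]; then
        echo "systemctl $action thermald"
    else
        echo "service thermald $action"
    fi
}

# Stop thermald, run it in the foreground with debug logging for a fixed
# time, then start the service again.
capture_thermald_log() {
    seconds="${1:-600}"        # default run time: 10 minutes
    log="${2:-thermald.log}"   # default log file name
    # Left unquoted on purpose so the command string splits into words
    sudo $(thermald_ctl_cmd stop)
    # timeout(1) terminates the foreground thermald after $seconds;
    # 2>&1 also sends anything written to stderr into the log.
    sudo timeout "$seconds" thermald --no-daemon --dbus-enable --loglevel=debug 2>&1 | tee "$log"
    sudo $(thermald_ctl_cmd start)
}
```

After sourcing the file, capture_thermald_log 600 records ten minutes of debug output into thermald.log in the current directory.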
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1491797
Title:
Shuts down when supposed to suspend as a reaction to self-caused
overheat, session lost
Status in linux package in Ubuntu:
In Progress
Bug description:
Error:
Kernel foolishly shuts down the computer when it overheats.
/var/log/kern.log
W500 kernel: [1448.648529] thermal thermal_zone1: critical temperature reached (100 C), shutting down
Consequence:
Shutting down destroys session in Ubuntu, Gnome, and all applications that can't remember their latest conscious state (most applications).
Attempted repair, failed:
Laptop has suspending ability, but I can't find the setting for the kernel to make the computer suspend instead of shutting down.
Repair suggestions:
1. Persistence of session, so that everything would reappear after the restart. (this would also make updating less disruptive)
2. Do not heat the machine like crazy; speed up fans or slow down processes. (problematic Lenovo Thinkpad W500 fan on low speed right up to the fiery end)
3. Put the computer to suspend when it's too hot.
(The problem has remained the same from at least Ubuntu 11.10 through
14.04)
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-62-generic 3.13.0-62.102
ProcVersionSignature: Ubuntu 3.13.0-62.102-generic 3.13.11-ckt24
Uname: Linux 3.13.0-62-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.12
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: user 2171 F.... pulseaudio
CurrentDesktop: Unity
Date: Thu Sep 3 13:42:28 2015
HibernationDevice: RESUME=UUID=991e1383-ff5b-46c1-84c4-c904e1d81256
InstallationDate: Installed on 2013-12-29 (612 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Release amd64 (20131016.1)
MachineType: LENOVO 4063B22
PccardctlIdent:
Socket 0:
no product info available
PccardctlStatus:
Socket 0:
no card
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-62-generic root=UUID=bd426989-b545-41b3-97b8-de9410f27aa6 ro persistent quiet splash vt.handoff=7
RelatedPackageVersions:
linux-restricted-modules-3.13.0-62-generic N/A
linux-backports-modules-3.13.0-62-generic N/A
linux-firmware 1.127.15
SourcePackage: linux
UpgradeStatus: Upgraded to trusty on 2014-04-27 (494 days ago)
dmi.bios.date: 12/14/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 6FET92WW (3.22 )
dmi.board.name: 4063B22
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6FET92WW(3.22):bd12/14/2011:svnLENOVO:pn4063B22:pvrThinkPadW500:rvnLENOVO:rn4063B22:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4063B22
dmi.product.version: ThinkPad W500
dmi.sys.vendor: LENOVO
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1491797/+subscriptions