kernel-packages team mailing list archive
Message #84250
[Bug 539467] Re: SATA link power management causes disk errors and corruption
This issue appears to be present again on kernels 3.13 (all), 3.16 (all),
and 3.17 (all).
When the SATA link power policy is switched from min_power to
max_performance, all of these kernels report some variant of this error:
[ 45.200582] ata3.00: exception Emask 0x10 SAct 0x8000 SErr 0x50000 action 0xe frozen
[ 45.200586] ata3.00: irq_stat 0x00400000, PHY RDY changed
[ 45.200589] ata3: SError: { PHYRdyChg CommWake }
[ 45.200592] ata3.00: failed command: WRITE FPDMA QUEUED
[ 45.200596] ata3.00: cmd 61/e8:78:00:3f:48/00:00:04:00:00/40 tag 15 ncq 118784 out
[ 45.200596] res 40/00:7c:00:3f:48/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
[ 45.200597] ata3.00: status: { DRDY }
[ 45.200601] ata3: hard resetting link
[ 45.925051] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 45.925911] ata3.00: configured for UDMA/133
[ 45.941016] ahci 0000:00:1f.2: port does not support device sleep
[ 45.941029] ata3: EH complete
The current 3.13 kernel reports the most severe errors, including block
write failures.
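The SErr and SStatus values in the log above can be decoded by hand. The following is a minimal sketch, not part of the original report, assuming the bit names libata prints in its "SError: { ... }" line and the usual SStatus field layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The function names are made up for illustration.

```python
import re

# Assumed mapping of SError bit positions to the labels libata prints.
SERR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}

# SPD field of SStatus: negotiated link speed.
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_serr(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items())
            if value & (1 << bit)]


def decode_sstatus(value):
    """Split an SStatus value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,                               # 3 = device present, PHY up
        "speed": SPEEDS.get((value >> 4) & 0xF, "unknown"),
        "ipm": (value >> 8) & 0xF,                        # 1 = interface active
    }


def parse_serr(line):
    """Extract the SErr value from an 'exception Emask ...' log line."""
    m = re.search(r"SErr (0x[0-9a-fA-F]+)", line)
    return int(m.group(1), 16) if m else None


def parse_sstatus(line):
    """Extract the SStatus value (printed in hex without 0x) from a log line."""
    m = re.search(r"SStatus ([0-9A-Fa-f]+)", line)
    return int(m.group(1), 16) if m else None


if __name__ == "__main__":
    exc = "ata3.00: exception Emask 0x10 SAct 0x8000 SErr 0x50000 action 0xe frozen"
    up = "ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"
    print(decode_serr(parse_serr(exc)))      # ['PHYRdyChg', 'CommWake']
    print(decode_sstatus(parse_sstatus(up)))
```

For the log above, 0x50000 has bits 16 and 18 set, which matches the "PHYRdyChg CommWake" line the kernel printed.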
The machine this is being tested on is a Dell XPS13 (9333) with BIOS A05:
[ 2.288104] ata3.00: ATA-8: LITEONIT LMT-256L9M-11 MSATA 256GB, HM8110B, max UDMA/133
[ 2.288554] scsi 2:0:0:0: Direct-Access ATA LITEONIT LMT-256 10B PQ: 0 ANSI: 5
As this machine is brand new, it's possible that the hardware is actually
failing; however, SMART doesn't indicate any problems with the block
device:
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.17.0-031700-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: LITEONIT LMT-256L9M-11 MSATA 256GB
Serial Number: TW0N42H75508548P1854
Firmware Version: HM8110B
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 4a
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Oct 10 13:39:25 2014 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 10) seconds.
Offline data collection
capabilities: (0x15) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0003   100   100   000    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       46
175 Program_Fail_Count_Chip 0x0003   100   100   000    Pre-fail  Always       -       0
176 Erase_Fail_Count_Chip   0x0003   100   100   000    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0003   100   100   000    Pre-fail  Always       -       1946
178 Used_Rsvd_Blk_Cnt_Chip  0x0003   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0003   100   100   000    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   000    Pre-fail  Always       -       1216
181 Program_Fail_Cnt_Total  0x0003   100   100   000    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0003   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0003   100   100   000    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x0003   100   100   000    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0003   100   100   000    Pre-fail  Always       -       8704
242 Total_LBAs_Read         0x0003   100   100   000    Pre-fail  Always       -       1385
SMART Error Log Version: 0
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 0 -
# 2 Short offline Completed without error 00% 0 -
Selective Self-tests/Logging not supported
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/539467
Title:
SATA link power management causes disk errors and corruption
Status in The Linux Kernel:
Expired
Status in “linux” package in Ubuntu:
Invalid
Status in “pm-utils” package in Ubuntu:
Fix Released
Status in “pm-utils-powersave-policy” package in Ubuntu:
Invalid
Status in “linux” source package in Lucid:
Won't Fix
Status in “pm-utils” source package in Lucid:
Invalid
Status in “pm-utils-powersave-policy” source package in Lucid:
Fix Released
Status in “linux” source package in Maverick:
Invalid
Status in “pm-utils” source package in Maverick:
Invalid
Status in “pm-utils-powersave-policy” source package in Maverick:
Invalid
Status in “linux” source package in Natty:
Invalid
Status in “pm-utils” source package in Natty:
Fix Released
Status in “pm-utils-powersave-policy” source package in Natty:
Invalid
Bug description:
SRU Justification for pm-utils-powersave-policy:
Impact: On certain hardware, enabling power saving for the SATA link
can cause data corruption.
How Addressed: The proposed branch removes the SATA link power policy
script. This will cause the link to be maintained at the normal power
usage instead of dropping to a lower power state when AC power is
removed from the machine.
Reproduction: On an affected machine, unplug and plug in the power a
few times. Data corruption will result.
Regression Potential: Removing the script will cause the SATA link to
stay fully powered at all times. This may cause an increase in the
battery usage for some machines. There should be no functionality
regressions or bugs introduced by this change.
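
To check which policy a machine is actually using, the per-host setting can be read from sysfs. This is a rough sketch, not taken from the pm-utils script itself: it assumes the kernel exposes link_power_management_policy under /sys/class/scsi_host/hostN, and the helper names are hypothetical.

```python
from pathlib import Path


def read_link_policies(base="/sys/class/scsi_host"):
    """Map each SCSI host name to its SATA link power management policy.

    Hosts without a link_power_management_policy attribute (for example
    non-AHCI hosts) are skipped.
    """
    policies = {}
    for host in sorted(Path(base).glob("host*")):
        policy_file = host / "link_power_management_policy"
        if policy_file.is_file():
            policies[host.name] = policy_file.read_text().strip()
    return policies


def hosts_not_at_max_performance(policies):
    """Return host names whose link may drop into a power-saving state."""
    return [host for host, policy in sorted(policies.items())
            if policy != "max_performance"]


if __name__ == "__main__":
    current = read_link_policies()
    for host, policy in current.items():
        print(f"{host}: {policy}")
    print("power saving enabled on:", hosts_not_at_max_performance(current))
```

On an affected machine, any host listed by the second helper is a candidate for the errors described in this bug.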
=====
Using Lucid on my laptop, I see errors like this in dmesg quite
frequently (every few hours):
Mar 14 23:00:09 chris-laptop kernel: [42987.460608] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x50000 action 0xe frozen
Mar 14 23:00:09 chris-laptop kernel: [42987.460618] ata1.00: irq_stat 0x00400000, PHY RDY changed
Mar 14 23:00:09 chris-laptop kernel: [42987.460627] ata1: SError: { PHYRdyChg CommWake }
Mar 14 23:00:09 chris-laptop kernel: [42987.460635] ata1.00: failed command: READ FPDMA QUEUED
Mar 14 23:00:09 chris-laptop kernel: [42987.460649] ata1.00: cmd 60/08:00:97:23:44/00:00:01:00:00/40 tag 0 ncq 4096 in
Mar 14 23:00:09 chris-laptop kernel: [42987.460652] res 40/00:04:97:23:44/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
Mar 14 23:00:09 chris-laptop kernel: [42987.460669] ata1.00: status: { DRDY }
Mar 14 23:00:09 chris-laptop kernel: [42987.460681] ata1: hard resetting link
Mar 14 23:00:09 chris-laptop kernel: [42987.523336] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Mar 14 23:00:09 chris-laptop kernel: [42987.523346] ata2: irq_stat 0x00400000, PHY RDY changed
Mar 14 23:00:09 chris-laptop kernel: [42987.523355] ata2: SError: { PHYRdyChg CommWake }
Mar 14 23:00:09 chris-laptop kernel: [42987.523368] ata2: hard resetting link
Mar 14 23:00:09 chris-laptop kernel: [42988.202586] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 14 23:00:09 chris-laptop kernel: [42988.205443] ata1.00: configured for UDMA/133
Mar 14 23:00:09 chris-laptop kernel: [42988.205459] ata1: EH complete
Mar 14 23:00:09 chris-laptop kernel: [42988.280089] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 14 23:00:09 chris-laptop kernel: [42988.285567] ata2.00: configured for UDMA/100
Mar 14 23:00:09 chris-laptop kernel: [42988.289370] ata2: EH complete
Every couple of days, this results in data corruption and my
filesystem being remounted read-only:
[ 6148.305806] Aborting journal on device sda1-8.
[ 6148.325011] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[ 6148.325018] EXT4-fs (sda1): Remounting filesystem read-only
[ 6148.326702] journal commit I/O error
[ 6148.330975] EXT4-fs error (device sda1) in ext4_reserve_inode_write: Journal has aborted
[ 6148.462572] __ratelimit: 15 callbacks suppressed
Those messages generally appear at the end of dmesg after the event,
just after the "hard resetting link" message. I then have to boot a
live CD and manually run fsck, as I can no longer boot the laptop.
This is happening every couple of days generally, although it happened
3 times in one day last Thursday.
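To see how often link resets are followed by a read-only remount, a saved dmesg or syslog file can be scanned for both messages. This is only a sketch, assuming the message formats shown in the excerpts above. The function name and the regular expressions are illustrative.

```python
import re

# Patterns assume the kernel timestamp format "[ 1234.567890]" seen above.
RESET_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+): hard resetting link")
READONLY_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+EXT4-fs \((\w+)\): Remounting filesystem read-only"
)


def summarize(log_text):
    """Return (resets, remounts) found in log_text.

    resets is a list of (timestamp, port) tuples and remounts is a list of
    (timestamp, device) tuples, both in the order they appear in the log.
    """
    resets, remounts = [], []
    for line in log_text.splitlines():
        m = RESET_RE.search(line)
        if m:
            resets.append((float(m.group(1)), m.group(2)))
            continue
        m = READONLY_RE.search(line)
        if m:
            remounts.append((float(m.group(1)), m.group(2)))
    return resets, remounts


if __name__ == "__main__":
    import sys
    resets, remounts = summarize(sys.stdin.read())
    print(f"{len(resets)} link resets, {len(remounts)} read-only remounts")
```

It can be run as, for example, `dmesg | python3 scan.py`, where scan.py is whatever name the sketch is saved under.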
I did contemplate it being a hardware issue, but I tried running the
kernel from Karmic for a couple of days, and that worked OK without a
single error message.
ProblemType: Bug
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: chr1s 4010 F.... pulseaudio
/dev/snd/controlC1: chr1s 4010 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
Card hw:0 'Intel'/'HDA Intel at 0xf6afc000 irq 21'
Mixer name : 'Intel G45 DEVCTG'
Components : 'HDA:111d76b2,10280263,00100302 HDA:80862802,80860101,00100000'
Controls : 22
Simple ctrls : 11
Card1.Amixer.info:
Card hw:1 'U0x46d0x9a4'/'USB Device 0x46d:0x9a4 at usb-0000:00:1a.7-3.3, high speed'
Mixer name : 'USB Mixer'
Components : 'USB046d:09a4'
Controls : 2
Simple ctrls : 1
Card1.Amixer.values:
Simple mixer control 'Mic',0
Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
Capture channels: Mono
Limits: Capture 0 - 14
Mono: Capture 0 [0%] [23.75dB] [on]
Date: Tue Mar 16 10:07:41 2010
DistroRelease: Ubuntu 10.04
Frequency: Once a day.
HibernationDevice: RESUME=UUID=762f3439-67ac-4828-aa94-caf2a2ba0f9a
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
LiveMediaBuild: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MachineType: Dell Inc. Latitude E5500
Package: linux-image-2.6.32-16-generic 2.6.32-16.25
PccardctlIdent:
Socket 0:
no product info available
PccardctlStatus:
Socket 0:
no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-16-generic root=UUID=4ce5e12b-6e82-4fa4-90ff-7d9859d7504e ro quiet splash
ProcEnviron:
LANG=en_GB.utf8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-16.25-generic
Regression: Yes
RelatedPackageVersions: linux-firmware 1.32
Reproducible: No
SourcePackage: linux
TestedUpstream: No
Uname: Linux 2.6.32-16-generic x86_64
dmi.bios.date: 11/05/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 0DW635
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd11/05/2009:svnDellInc.:pnLatitudeE5500:pvr:rvnDellInc.:rn0DW635:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude E5500
dmi.sys.vendor: Dell Inc.
To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/539467/+subscriptions