kernel-packages team mailing list archive

[Bug 539467] Re: SATA link power management causes disk errors and corruption

I think I see this with Ubuntu saucy (kernel 3.11.0-12-generic, 64-bit)
on a ThinkPad T520. It happens even when running on external power with
'link_power_management_policy' set to 'max_performance':

[23391.818508] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[23391.818513] ata4: irq_stat 0x00000040, connection status changed
[23391.818516] ata4: SError: { CommWake DevExch }
[23391.818523] ata4: limiting SATA link speed to 1.5 Gbps
[23391.818525] ata4: hard resetting link
[23392.538993] ata4: SATA link down (SStatus 0 SControl 310)
[23392.554893] ata4: EH complete
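
To confirm which policy is actually in effect, the sysfs attribute can be
read back per SCSI host. The following is a minimal sketch, assuming the
attribute is exposed at /sys/class/scsi_host/hostN/link_power_management_policy
(as it is for AHCI controllers). The function name and the 'base' parameter
are illustrative only, and the hostN numbers need not match the ataN
numbers in the log above.

```python
from pathlib import Path


def read_link_pm_policies(base="/sys/class/scsi_host"):
    """Return {host_name: policy} for each host exposing the attribute.

    'base' is a parameter (hypothetical, for illustration) so the function
    can also be pointed at a copy of the sysfs tree.
    """
    policies = {}
    for attr in sorted(Path(base).glob("host*/link_power_management_policy")):
        try:
            policies[attr.parent.name] = attr.read_text().strip()
        except OSError:
            # The attribute exists but could not be read; skip this host.
            continue
    return policies


if __name__ == "__main__":
    for host, policy in read_link_pm_policies().items():
        print(f"{host}: {policy}")
```

Hosts whose controller does not expose the attribute are simply absent
from the result.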

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/539467

Title:
  SATA link power management causes disk errors and corruption

Status in The Linux Kernel:
  Expired
Status in “linux” package in Ubuntu:
  Invalid
Status in “pm-utils” package in Ubuntu:
  Fix Released
Status in “pm-utils-powersave-policy” package in Ubuntu:
  Invalid
Status in “linux” source package in Lucid:
  Won't Fix
Status in “pm-utils” source package in Lucid:
  Invalid
Status in “pm-utils-powersave-policy” source package in Lucid:
  Fix Released
Status in “linux” source package in Maverick:
  Invalid
Status in “pm-utils” source package in Maverick:
  Invalid
Status in “pm-utils-powersave-policy” source package in Maverick:
  Invalid
Status in “linux” source package in Natty:
  Invalid
Status in “pm-utils” source package in Natty:
  Fix Released
Status in “pm-utils-powersave-policy” source package in Natty:
  Invalid

Bug description:
  SRU Justification for pm-utils-powersave-policy:

  Impact: On certain hardware, enabling power saving for the SATA link
  can cause data corruption.

  How Addressed: The proposed branch removes the SATA link power policy
  script. The link will then stay at normal power usage instead of
  dropping to a lower-power state when external power is removed from
  the machine.

  Reproduction: On an affected machine, unplug and plug in the power a
  few times. Data corruption will result.

  Regression Potential: Removing the script will cause the SATA link to
  stay fully powered at all times. This may cause an increase in the
  battery usage for some machines. There should be no functionality
  regressions or bugs introduced by this change.

  =====

  Using Lucid on my laptop, I see errors like this in dmesg quite
  frequently (every few hours):

  Mar 14 23:00:09 chris-laptop kernel: [42987.460608] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x50000 action 0xe frozen
  Mar 14 23:00:09 chris-laptop kernel: [42987.460618] ata1.00: irq_stat 0x00400000, PHY RDY changed
  Mar 14 23:00:09 chris-laptop kernel: [42987.460627] ata1: SError: { PHYRdyChg CommWake }
  Mar 14 23:00:09 chris-laptop kernel: [42987.460635] ata1.00: failed command: READ FPDMA QUEUED
  Mar 14 23:00:09 chris-laptop kernel: [42987.460649] ata1.00: cmd 60/08:00:97:23:44/00:00:01:00:00/40 tag 0 ncq 4096 in
  Mar 14 23:00:09 chris-laptop kernel: [42987.460652]          res 40/00:04:97:23:44/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
  Mar 14 23:00:09 chris-laptop kernel: [42987.460669] ata1.00: status: { DRDY }
  Mar 14 23:00:09 chris-laptop kernel: [42987.460681] ata1: hard resetting link
  Mar 14 23:00:09 chris-laptop kernel: [42987.523336] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
  Mar 14 23:00:09 chris-laptop kernel: [42987.523346] ata2: irq_stat 0x00400000, PHY RDY changed
  Mar 14 23:00:09 chris-laptop kernel: [42987.523355] ata2: SError: { PHYRdyChg CommWake }
  Mar 14 23:00:09 chris-laptop kernel: [42987.523368] ata2: hard resetting link
  Mar 14 23:00:09 chris-laptop kernel: [42988.202586] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  Mar 14 23:00:09 chris-laptop kernel: [42988.205443] ata1.00: configured for UDMA/133
  Mar 14 23:00:09 chris-laptop kernel: [42988.205459] ata1: EH complete
  Mar 14 23:00:09 chris-laptop kernel: [42988.280089] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  Mar 14 23:00:09 chris-laptop kernel: [42988.285567] ata2.00: configured for UDMA/100
  Mar 14 23:00:09 chris-laptop kernel: [42988.289370] ata2: EH complete

  Every couple of days, this results in data corruption and my
  filesystem being remounted read-only:

  [ 6148.305806] Aborting journal on device sda1-8.
  [ 6148.325011] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
  [ 6148.325018] EXT4-fs (sda1): Remounting filesystem read-only
  [ 6148.326702] journal commit I/O error
  [ 6148.330975] EXT4-fs error (device sda1) in ext4_reserve_inode_write: Journal has aborted
  [ 6148.462572] __ratelimit: 15 callbacks suppressed

  Those messages generally appear at the end of dmesg after the event,
  just after the "hard resetting link" message. I then have to boot a
  live CD and manually run fsck, as I can no longer boot the laptop.

  This is happening every couple of days generally, although it happened
  3 times in one day last Thursday.
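
  To put a number on how often the link is being reset, the kernel log can
  be scanned for the "hard resetting link" message shown above. This is a
  rough sketch, not part of the original report: the function name and the
  regular expression are assumptions based only on the log lines quoted in
  this bug.

```python
import re
import sys
from collections import Counter

# Matches e.g. "ata1: hard resetting link" or "ata1.00: hard resetting link"
# and captures the port name ("ata1") without any device suffix.
RESET_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: hard resetting link")


def count_link_resets(lines):
    """Count 'hard resetting link' events per ATA port in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Example: dmesg | python3 this_script.py
    for port, n in sorted(count_link_resets(sys.stdin).items()):
        print(f"{port}: {n} reset(s)")
```

  Run against the dmesg excerpt above, this would report one reset each
  for ata1 and ata2.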

  I did consider that it might be a hardware issue, but I tried running
  the kernel from Karmic for a couple of days, and that worked OK without
  a single error message.

  ProblemType: Bug
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  chr1s      4010 F.... pulseaudio
   /dev/snd/controlC1:  chr1s      4010 F.... pulseaudio
  CRDA: Error: [Errno 2] No such file or directory
  Card0.Amixer.info:
   Card hw:0 'Intel'/'HDA Intel at 0xf6afc000 irq 21'
     Mixer name	: 'Intel G45 DEVCTG'
     Components	: 'HDA:111d76b2,10280263,00100302 HDA:80862802,80860101,00100000'
     Controls      : 22
     Simple ctrls  : 11
  Card1.Amixer.info:
   Card hw:1 'U0x46d0x9a4'/'USB Device 0x46d:0x9a4 at usb-0000:00:1a.7-3.3, high speed'
     Mixer name	: 'USB Mixer'
     Components	: 'USB046d:09a4'
     Controls      : 2
     Simple ctrls  : 1
  Card1.Amixer.values:
   Simple mixer control 'Mic',0
     Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
     Capture channels: Mono
     Limits: Capture 0 - 14
     Mono: Capture 0 [0%] [23.75dB] [on]
  Date: Tue Mar 16 10:07:41 2010
  DistroRelease: Ubuntu 10.04
  Frequency: Once a day.
  HibernationDevice: RESUME=UUID=762f3439-67ac-4828-aa94-caf2a2ba0f9a
  InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
  LiveMediaBuild: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
  MachineType: Dell Inc. Latitude E5500
  Package: linux-image-2.6.32-16-generic 2.6.32-16.25
  PccardctlIdent:
   Socket 0:
     no product info available
  PccardctlStatus:
   Socket 0:
     no card
  ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-16-generic root=UUID=4ce5e12b-6e82-4fa4-90ff-7d9859d7504e ro quiet splash
  ProcEnviron:
   LANG=en_GB.utf8
   SHELL=/bin/bash
  ProcVersionSignature: Ubuntu 2.6.32-16.25-generic
  Regression: Yes
  RelatedPackageVersions: linux-firmware 1.32
  Reproducible: No
  SourcePackage: linux
  TestedUpstream: No
  Uname: Linux 2.6.32-16-generic x86_64
  dmi.bios.date: 11/05/2009
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: A15
  dmi.board.name: 0DW635
  dmi.board.vendor: Dell Inc.
  dmi.chassis.type: 8
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvrA15:bd11/05/2009:svnDellInc.:pnLatitudeE5500:pvr:rvnDellInc.:rn0DW635:rvr:cvnDellInc.:ct8:cvr:
  dmi.product.name: Latitude E5500
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/539467/+subscriptions