kernel-packages team mailing list archive

[Bug 593635] Re: HDD freezes caused by ata exception that results in soft resetting of link

 

[Expired for linux (Ubuntu) because there has been no activity for 60
days.]

** Changed in: linux (Ubuntu)
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/593635

Title:
  HDD freezes caused by ata exception that results in soft resetting of
  link

Status in “linux” package in Ubuntu:
  Expired

Bug description:
  Under even moderately heavy disk writes, I am seeing exceptions like the following in my kern.log:
  -----------------------------------------------
  Jun 13 13:33:03 cellar kernel: [66188.434868] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  Jun 13 13:33:03 cellar kernel: [66188.434874] ata4.01: BMDMA stat 0x46
  Jun 13 13:33:03 cellar kernel: [66188.434879] ata4.01: failed command: WRITE DMA EXT
  Jun 13 13:33:03 cellar kernel: [66188.434886] ata4.01: cmd 35/00:00:00:94:b2/00:04:13:00:00/f0 tag 0 dma 524288 out
  Jun 13 13:33:03 cellar kernel: [66188.434888]          res 51/84:01:ff:95:b2/84:02:13:00:00/f0 Emask 0x30 (host bus error)
  Jun 13 13:33:03 cellar kernel: [66188.434892] ata4.01: status: { DRDY ERR }
  Jun 13 13:33:03 cellar kernel: [66188.434895] ata4.01: error: { ICRC ABRT }
  Jun 13 13:33:03 cellar kernel: [66188.434907] ata4: soft resetting link
  Jun 13 13:33:03 cellar kernel: [66188.622000] ata4.01: configured for UDMA/100
  Jun 13 13:33:03 cellar kernel: [66188.622013] ata4: EH complete
  ----------------------------------------------
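
  The "res" line in the excerpt above begins with two hex bytes, which
  libata prints as the ATA status register followed by the error
  register. The sketch below decodes them by hand so the "status:" and
  "error:" lines can be cross-checked. The function names and the bit
  tables are illustrative assumptions that follow the standard ATA bit
  assignments; this is not code from the kernel.

```python
import re

# Standard ATA status/error register bits. The labels are chosen here to
# resemble what the kernel log prints; they are an assumption of this sketch.
STATUS_BITS = [(0x80, "BUSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]

# Matches the leading "res SS/EE:" pair, where SS is status and EE is error.
RES_RE = re.compile(r"res ([0-9a-f]{2})/([0-9a-f]{2}):")


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def decode_res_line(line):
    """Decode one libata 'res' line into (status flags, error flags).

    Returns None if the line does not contain a 'res' field.
    """
    match = RES_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)
```

  For the line "res 51/84:..." this yields DRDY and ERR for the status
  byte and ICRC and ABRT for the error byte, which agrees with the
  "status:" and "error:" lines logged by the kernel.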

  This is with the latest stable lucid kernel (2.6.32-22-generic
  #36-Ubuntu).

  I've also tried a mainline kernel (2.6.35-020635rc1) and still get the
  same errors, except that there is an additional stack trace:

  -----------------------------------------------

  Jun 14 18:55:40 cellar kernel: [  152.874172] irq 19: nobody cared (try booting with the "irqpoll" option)
  Jun 14 18:55:40 cellar kernel: [  152.874182] Pid: 0, comm: swapper Tainted: P            2.6.35-020635rc1-generic #020635rc1
  Jun 14 18:55:40 cellar kernel: [  152.874185] Call Trace:
  Jun 14 18:55:40 cellar kernel: [  152.874198]  [<c01a58cc>] __report_bad_irq+0x2c/0x90
  Jun 14 18:55:40 cellar kernel: [  152.874204]  [<c016fee3>] ? sched_clock_tick+0x73/0xa0
  Jun 14 18:55:40 cellar kernel: [  152.874209]  [<c01a5a44>] note_interrupt+0xe4/0x120
  Jun 14 18:55:40 cellar kernel: [  152.874214]  [<c0179da0>] ? tick_nohz_update_jiffies+0x60/0x70
  Jun 14 18:55:40 cellar kernel: [  152.874219]  [<c01a6364>] handle_fasteoi_irq+0x84/0xe0
  Jun 14 18:55:40 cellar kernel: [  152.874224]  [<c0104abf>] handle_irq+0x1f/0x30
  Jun 14 18:55:40 cellar kernel: [  152.874230]  [<c05afefb>] do_IRQ+0x4b/0xc0
  Jun 14 18:55:40 cellar kernel: [  152.874234]  [<c01032f0>] common_interrupt+0x30/0x40
  Jun 14 18:55:40 cellar kernel: [  152.874239]  [<c010a3a7>] ? mwait_idle+0x57/0xa0
  Jun 14 18:55:40 cellar kernel: [  152.874243]  [<c010189c>] cpu_idle+0x8c/0xc0
  Jun 14 18:55:40 cellar kernel: [  152.874249]  [<c05a4337>] start_secondary+0xf7/0x130
  Jun 14 18:55:40 cellar kernel: [  152.874252] handlers:
  Jun 14 18:55:40 cellar kernel: [  152.874254] [<c0431060>] (ata_bmdma_interrupt+0x0/0x190)
  Jun 14 18:55:40 cellar kernel: [  152.874261] [<c044fb10>] (usb_hcd_irq+0x0/0x90)
  Jun 14 18:55:40 cellar kernel: [  152.874268] Disabling IRQ #19
  Jun 14 18:56:09 cellar kernel: [  181.856015] ata4: lost interrupt (Status 0x51)
  Jun 14 18:56:09 cellar kernel: [  181.856034] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  Jun 14 18:56:09 cellar kernel: [  181.856039] ata4.01: BMDMA stat 0x46, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0
  Jun 14 18:56:09 cellar kernel: [  181.856045] ata4.01: failed command: WRITE DMA EXT
  Jun 14 18:56:09 cellar kernel: [  181.856053] ata4.01: cmd 35/00:00:00:84:08/00:04:3b:00:00/f0 tag 0 dma 524288 out
  Jun 14 18:56:09 cellar kernel: [  181.856054]          res 40/00:00:00:4f:c2/00:00:00:00:00/50 Emask 0x24 (host bus error)
  Jun 14 18:56:09 cellar kernel: [  181.856058] ata4.01: status: { DRDY }
  Jun 14 18:56:09 cellar kernel: [  181.856072] ata4: soft resetting link
  Jun 14 18:56:09 cellar kernel: [  182.160065] ata4.01: configured for UDMA/133
  Jun 14 18:56:09 cellar kernel: [  182.160072] ata4.01: device reported invalid CHS sector 0
  Jun 14 18:56:09 cellar kernel: [  182.160080] ata4: EH complete
  --------------------------------------------------------------------

  I've tried booting with "libata.force=noncq" on both kernels (lucid
  stable and 2.6.35 mainline), but it makes no difference.
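
  When testing a boot option such as this one, it can help to confirm
  that it actually reached the kernel. A minimal sketch, assuming the
  command line is read from /proc/cmdline; the helper name
  libata_force_values is hypothetical:

```python
def libata_force_values(cmdline):
    """Return every value passed via libata.force= on a kernel command line.

    libata.force accepts a comma-separated list, so each entry is
    returned separately. An empty list means the option was not given.
    """
    values = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "libata.force" and sep:
            values.extend(v for v in value.split(",") if v)
    return values
```

  Calling it on the contents of /proc/cmdline, for example
  libata_force_values(open("/proc/cmdline").read()), should list
  "noncq" if the option took effect for the running boot.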

  I didn't see these errors in Jaunty. I think they started sometime in
  Karmic. I upgraded to Lucid in the hope that the newer release had
  fixed it, but there is no difference.

  I think I've ruled out HDD failure. I get these errors on 2 old (3+
  years) Seagate 7200.10 disks as well as a brand new Seagate 7200.12
  disk.

  There are similar bug reports in Launchpad, but one difference that I
  noticed is that I consistently see the message "failed command: WRITE
  DMA EXT", while the other reports fail during a read or some other
  command.
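
  One way to check how consistent the failing command is would be to
  tally the "failed command:" lines per device. The sketch below
  assumes the kern.log line format shown in the excerpts above; the
  function name tally_failed_commands is hypothetical.

```python
import re
from collections import Counter

# Captures the libata device (e.g. "ata4.01") and the failed command name.
FAILED_RE = re.compile(r"(ata\d+(?:\.\d+)?): failed command: (.+)$")


def tally_failed_commands(lines):
    """Count 'failed command' log lines, keyed by (device, command)."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

  Passing an open log file works directly, for example
  tally_failed_commands(open("/var/log/kern.log", errors="replace")).
  If every key in the result names WRITE DMA EXT, that supports the
  observation that only writes are affected.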

  I can very reliably reproduce the errors by running an rdiff-backup
  'restore' operation from an external USB HDD.

  == Steps to reproduce ==
  1. Boot into GNOME and log in
  2. Run 'tail -f /var/log/kern.log' in one terminal window
  3. Run 'rdiff-backup --force -r now /media/freeagent/share /share/' in another terminal

  Within a few seconds, I can see the errors show up in the kernel logs.

  Running a fast torrent download will do the trick too.

  Since I can reproduce the problem so easily, I'll be very willing to
  try any special kernel builds to help solve this one.

  ProblemType: Bug
  DistroRelease: Ubuntu 10.04
  Package: linux-image-2.6.32-22-generic 2.6.32-22.36
  Regression: Yes
  Reproducible: Yes
  ProcVersionSignature: Ubuntu 2.6.32-22.36-generic 2.6.32.11+drm33.2
  Uname: Linux 2.6.32-22-generic i686
  NonfreeKernelModules: nvidia
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
  Architecture: i386
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  antrix     1387 F.... pulseaudio
  CRDA: Error: [Errno 2] No such file or directory
  Card0.Amixer.info:
   Card hw:0 'Intel'/'HDA Intel at 0xf9ffc000 irq 16'
     Mixer name	: 'Realtek ALC662 rev1'
     Components	: 'HDA:10ec0662,15650000,00100101'
     Controls      : 36
     Simple ctrls  : 19
  Date: Mon Jun 14 19:23:00 2010
  HibernationDevice: RESUME=UUID=c6dab799-13a8-443e-b2a3-4b93f3bbb42e
  IwConfig:
   lo        no wireless extensions.
   
   eth0      no wireless extensions.
  MachineType: BIOSTAR Group G31-M7 TE
  ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-22-generic root=UUID=466535ad-0b59-4fd0-b18b-ba486150f91a ro quiet splash
  ProcEnviron:
   PATH=(custom, user)
   LANG=en_SG.utf8
   SHELL=/bin/bash
  RelatedPackageVersions: linux-firmware 1.34
  RfKill:
   
  SourcePackage: linux
  dmi.bios.date: 04/10/2009
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 080014
  dmi.board.asset.tag: To Be Filled By O.E.M.
  dmi.board.name: G31-M7 TE
  dmi.board.vendor: BIOSTAR Group
  dmi.chassis.asset.tag: None
  dmi.chassis.type: 3
  dmi.chassis.vendor: BIOSTAR Group
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr080014:bd04/10/2009:svnBIOSTARGroup:pnG31-M7TE:pvr:rvnBIOSTARGroup:rnG31-M7TE:rvr:cvnBIOSTARGroup:ct3:cvr:
  dmi.product.name: G31-M7 TE
  dmi.sys.vendor: BIOSTAR Group

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/593635/+subscriptions