
kernel-packages team mailing list archive

[Bug 1063354] Re: [Dell Studio XPS 1640] Sudden Read-Only Filesystems

 

I have an old tower which I use to test multiple operating systems.
Each OS lives on a separate drive in a removable tray, so the drives can
be swapped as needed.  Once in a while the system would hang when the
BIOS was set to auto-detect the drives at every boot, or I would see an
occasional failure to mount the ATA boot device when Linux was started
in verbose mode, and Windows would simply freeze at random.  The problem
was traced to the power connector on one drive tray: the crimped
connections and corrosion made it unreliable, so I had to extract the
pins from the connector with a special tool, cut off the wires, soak the
pins in contact cleaner, and solder them back on.

http://en.wikipedia.org/wiki/Molex_connector#Disk_drive_connector_.28AMP_MATE-N-LOK_1-480424-0_Power_Connector.29

http://www.molex.com/molex/products/family?key=disk_drive_power_connector&channel=PRODUCTS&chanName=family&pageTitle=Introduction

I never had a problem with these connectors before, except for the ones
in the Enermax trays (which seem to be made of the cheapest materials
they could find).  Before I repaired the power connector, I encountered
the read-only bug in Ubuntu.  When it occurred, ALL physical volumes
attached to the machine became read-only, including the other hard
drives and all external USB storage devices.  Even USB devices attached
afterwards were not writable; the only thing I could write to was a
network share.  If this happens on all affected platforms, it might give
developers some idea of what to look for in the source code.  I also
wonder whether some power management feature could be involved:

GRUB_CMDLINE_LINUX="libata.dma=0 libata.noacpi=1"
http://ubuntuforums.org/showthread.php?t=1892483
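Changes to GRUB_CMDLINE_LINUX in /etc/default/grub only take effect
after running update-grub and rebooting.  To confirm that the libata
options actually reached the running kernel, you can read /proc/cmdline.
The Python sketch below does this.  The helper name libata_options is
made up for illustration and is not part of any tool.

```python
from pathlib import Path


def libata_options(cmdline: str) -> dict:
    """Return {parameter: value} for every 'libata.' option on a kernel command line."""
    options = {}
    for token in cmdline.split():
        if token.startswith("libata."):
            # 'libata.dma=0' -> ('libata.dma', '0'); a bare flag maps to ''.
            name, _, value = token.partition("=")
            options[name] = value
    return options


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")  # Linux-only
    if cmdline.exists():
        found = libata_options(cmdline.read_text())
        print(found or "no libata options on the running kernel's command line")
```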

I believe this bug can be triggered by other things too, such as a
system BIOS bug or AHCI setting, a drive firmware bug, defective
electrolytic capacitors on an old mainboard, bad solder joints just
about anywhere, or a defective (or overloaded) power supply.  In the
case of SSDs it could also be a latency issue:

Why Solid-State Drives Slow Down As You Fill Them Up (Ubuntu should warn about this)
 "When filling up an empty drive, they found high write performance very early in the process and a significant drop as the write operations continued to fill up the drive...  If you have a solid-state drive, you should try to avoid using more than 75% of its capacity."
http://www.howtogeek.com/165542/why-solid-state-drives-slow-down-as-you-fill-them-up/
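Python's shutil.disk_usage gives a quick way to check whether a
filesystem is past the 75% guideline quoted above.  This is only a
sketch.  It measures filesystem usage, which is not the same as the SSD
controller's own view of used blocks (for example, when deleted blocks
have not been TRIMmed).  The 75% figure is the article's rule of thumb,
not a specification.

```python
import shutil

# Rule-of-thumb threshold from the article quoted above; not a hard limit.
FILL_WARNING = 0.75


def fill_ratio(total_bytes: int, used_bytes: int) -> float:
    """Fraction of a filesystem's capacity that is in use."""
    return used_bytes / total_bytes if total_bytes else 0.0


def check_fill(path: str = "/") -> tuple:
    """Return (ratio, over_threshold) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    ratio = fill_ratio(usage.total, usage.used)
    return ratio, ratio > FILL_WARNING


if __name__ == "__main__":
    ratio, too_full = check_fill("/")
    note = " (above the 75% guideline)" if too_full else ""
    print(f"/ is {ratio:.0%} full{note}")
```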

(for general reference on dual-boot systems):
12 Things You Must Do When Running a Solid State Drive in Windows 7
http://www.maketecheasier.com/12-things-you-must-do-when-running-a-solid-state-drive-in-windows-7/

I suspect that people who experience read-only issues today were
experiencing silent write retries in previous kernel versions and simply
did not notice because the retries succeeded.  The common thread seems
to be that the drive was not ready to accept writes for some reason, and
the kernel did not detect this condition.  I tried to simulate this by
removing power to the drive momentarily.  During that time CPU usage was
very high, but it returned to normal when power was restored, and the
read-only bug was not triggered.
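One rough way to look for recovered errors like the ones suspected above
is to compare two counts in a saved dmesg log:

- how often libata's error handler reports an exception;
- how many I/O errors actually reached the block layer.

The sketch below counts both.  The regular expressions are based on the
log lines in the bug description and may not match every kernel version.
Mapping an ataN port to its sdX disk is left to the reader.  Exceptions
on a port with no matching I/O errors would suggest that recovery
succeeded, but that is an inference, not something the script can
prove.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# libata error-handler reports, e.g. "ata1.00: exception Emask 0x0 SAct 0x7 ..."
EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask")
# Block-layer failures that reached the caller, e.g.
# "end_request: I/O error, dev sda, sector 441675672"
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector \d+")


def summarize(log_text: str) -> tuple:
    """Count libata exceptions per port and unrecovered I/O errors per disk."""
    exceptions = Counter(m.group(1) for m in EXCEPTION_RE.finditer(log_text))
    io_errors = Counter(m.group(1) for m in IO_ERROR_RE.finditer(log_text))
    return exceptions, io_errors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dmesg > dmesg.txt && python3 ata_events.py dmesg.txt
    exc, err = summarize(Path(sys.argv[1]).read_text(errors="replace"))
    print("libata exceptions per port:", dict(exc))
    print("I/O errors per disk:", dict(err))
```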

On various other platforms I have seen S.M.A.R.T. drives that were NOT
defective log an "Interface CRC error" when a 'READ DMA EXT' command was
issued, because of a cable or connector fault.  When the drive was moved
to another system, the errors stopped.  So a drive is not necessarily
failing just because you see that error count going up.
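On most drives the interface CRC counter is S.M.A.R.T. attribute 199,
which smartctl (from smartmontools) labels UDMA_CRC_Error_Count.  Some
vendors number or name it differently.  The sketch below parses the
attribute table printed by `smartctl -A` and pulls out that raw value.
The script name and command-line handling are illustrative assumptions.

```python
import re
import subprocess
import sys


def parse_smart_attributes(text: str) -> dict:
    """Map attribute ID -> (name, raw value) from the table printed by `smartctl -A`."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have ten or more
        # columns; RAW_VALUE is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit():
            # Some raw values carry extra text, e.g. "34 (Min/Max 20/45)";
            # keep only the leading integer.
            leading = re.match(r"\d+", fields[9])
            if leading:
                attributes[int(fields[0])] = (fields[1], int(leading.group()))
    return attributes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (usually needs root and smartmontools): python3 crc_check.py /dev/sda
    out = subprocess.run(["smartctl", "-A", sys.argv[1]],
                         capture_output=True, text=True).stdout
    name, raw = parse_smart_attributes(out).get(199, ("UDMA_CRC_Error_Count", None))
    print(f"{name}: {raw if raw is not None else 'not reported'}")
```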

I think a S.M.A.R.T. status monitor should be included in the base
installation.  S.M.A.R.T. is not only useful for diagnosing faults
within the drive; it sometimes lets you infer something about the
quality of the power and data connections over time.  If you can
consistently correlate a particular S.M.A.R.T. error code with the
behavior that causes the volume to turn read-only, then you may have
found a way to distinguish a cable fault from a kernel or firmware bug,
and the OS could use it to generate more helpful error messages.  So it
would be useful to report which (if any) of the drive's S.M.A.R.T.
counters were incremented when you experience the read-only problem.
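Reporting which counters were incremented is easier if you do two
things.  First, save the raw values from a known-good moment.  Then
compare them with a reading taken after the filesystem goes read-only.
The sketch below diffs two such snapshots, each given as an
{attribute ID: raw value} dictionary (for instance, produced by a parser
of `smartctl -A` output).  The sample numbers in the __main__ block are
hypothetical, not taken from this bug.

```python
# Display names as smartctl prints them; vendors may label some IDs differently.
ATTRIBUTE_NAMES = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def incremented(before: dict, after: dict) -> dict:
    """Return {attribute_id: increase} for every raw counter that went up."""
    return {
        attr: after[attr] - before[attr]
        for attr in sorted(before.keys() & after.keys())
        if after[attr] > before[attr]
    }


def report(before: dict, after: dict) -> list:
    """Human-readable lines describing which counters changed."""
    return [
        f"{attr} {ATTRIBUTE_NAMES.get(attr, 'unknown')}: +{delta}"
        for attr, delta in incremented(before, after).items()
    ]


if __name__ == "__main__":
    # Hypothetical raw values: a baseline, and a reading taken after the
    # filesystem went read-only.
    baseline = {5: 0, 197: 0, 199: 3}
    after_failure = {5: 0, 197: 1, 199: 7}
    for line in report(baseline, after_failure):
        print(line)
```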

I am not very familiar with the specifications, but developers might
also want to investigate using the System Management Bus or Power
Management Bus to help characterize these failures, if the platform
collects any useful information.  For those who solved the problem by
disabling NCQ: the Linux kernel had an NCQ drive blacklist until (I
believe) 2.6.24, which implies incompatibilities with particular drive
models.
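For anyone checking or testing the NCQ workaround, libata exposes each
disk's command queue depth at /sys/block/<disk>/device/queue_depth.  A
value of 1 means only one command can be outstanding, so NCQ is
effectively off.  There are two common ways to disable it:

- writing 1 to that file as root, which lasts until the next boot;
- the libata.force=noncq boot parameter, which disables it persistently.

The sketch below only reads the current value.  The function names and
the example disk sda are assumptions.

```python
from pathlib import Path


def ncq_in_use(queue_depth: int) -> bool:
    """A depth of 1 allows a single outstanding command, i.e. no queuing."""
    return queue_depth > 1


def read_queue_depth(disk: str = "sda") -> int:
    """Read the current queue depth for a disk (sda is only an example)."""
    path = Path(f"/sys/block/{disk}/device/queue_depth")
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        depth = read_queue_depth("sda")
    except (OSError, ValueError) as exc:
        print(f"could not read queue depth: {exc}")
    else:
        state = "in use" if ncq_in_use(depth) else "effectively disabled"
        print(f"sda queue_depth={depth}: NCQ {state}")
```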

"there are drives with firmware bugs that deliberately lie about when data has been physically written."
http://serverfault.com/questions/460864/safety-of-write-cache-on-sata-drives-with-barriers
_____

"One little-known feature of NCQ is that the host can specify whether it
wants to be notified of completion when the data hits the disk's
platters or when it hits the disk's buffer (on-board cache)." (Does the
kernel do this correctly?)

"NCQ can negatively interfere with the operating system's I/O scheduler,
actually decreasing performance; this has been observed in practice on
Linux with RAID-5.  There is no mechanism in NCQ for the host to specify
any sort of deadlines for an I/O, like how many times a request can be
ignored in favor of others.  In theory, a NCQ-ed request can be delayed
by the drive an arbitrary amount of time while it is serving other
(possibly new) requests under I/O pressure.  Since the algorithms used
inside drive firmware for NCQ dispatch ordering are generally not
publicly known, this introduces another level of uncertainty for
hardware/firmware performance.  Tests at Google around 2008 have shown
that NCQ can delay an I/O for up to 1-2 seconds."

http://en.wikipedia.org/wiki/Native_Command_Queuing
_____

Test if NCQ is enabled: dmesg | grep -i ncq
Write-protect & cache status: dmesg | grep sda
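The two grep one-liners above can be turned into a small script that
pulls the relevant fields out of a saved dmesg log.  The sketch below is
based on message formats from kernels of this era, for example:

- "NCQ (depth 31/32)" or "NCQ (depth 32)"
- "[sda] Write Protect is off"
- "[sda] Write cache: enabled, ..., doesn't support DPO or FUA"

Other kernel versions may word these differently.  A drive on which the
kernel has disabled NCQ may print "NCQ (not used)" instead, which this
sketch ignores.

```python
import re
import sys
from pathlib import Path

# "ata1.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA";
# depending on host and drive limits the kernel prints one or two figures.
NCQ_RE = re.compile(r"\b(ata\d+\.\d+): .*NCQ \(depth (\d+)(?:/(\d+))?\)")
# "sd 0:0:0:0: [sda] Write Protect is off"
WP_RE = re.compile(r"\[(sd[a-z]+)\] Write Protect is (on|off)")
# "sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, ..."
CACHE_RE = re.compile(r"\[(sd[a-z]+)\] Write cache: (enabled|disabled)")


def parse_dmesg(text: str) -> dict:
    """Collect NCQ depth, write-protect, write-cache and FUA status from dmesg text."""
    info = {"ncq_depth": {}, "write_protect": {}, "write_cache": {}, "fua": {}}
    for line in text.splitlines():
        m = NCQ_RE.search(line)
        if m:
            # Keep whichever depth figures were printed, as integers.
            info["ncq_depth"][m.group(1)] = tuple(
                int(g) for g in m.groups()[1:] if g is not None)
        m = WP_RE.search(line)
        if m:
            info["write_protect"][m.group(1)] = m.group(2) == "on"
        m = CACHE_RE.search(line)
        if m:
            disk = m.group(1)
            info["write_cache"][disk] = m.group(2) == "enabled"
            info["fua"][disk] = "supports DPO and FUA" in line
    return info


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dmesg > dmesg.txt && python3 disk_status.py dmesg.txt
    for key, value in parse_dmesg(Path(sys.argv[1]).read_text(errors="replace")).items():
        print(f"{key}: {value}")
```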
_____

Operational theory / Educational resources:

Modern disk write caches and how they get dealt with
http://utcc.utoronto.ca/~cks/space/blog/tech/ModernDiskWriteCaches

How to force a disk write cache flush operation on Linux
http://utcc.utoronto.ca/~cks/space/blog/linux/ForceDiskFlushes

-- 
https://bugs.launchpad.net/bugs/1063354

Title:
  [Dell Studio XPS 1640] Sudden Read-Only Filesystems

Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  After upgrading to Ubuntu 12.10, I experience sudden locks of my
  filesystems (I have a root and a home partition with ext4), in which
  the filesystems suddenly become mounted read-only. /var/log/syslog
  shows the following entries:

  Oct  7 20:00:42 StudioXPS signond[3510]: signondaemon.cpp 345 init Failed to SUID root. Secure storage will not be available.
  Oct  7 20:02:12 StudioXPS kernel: [  249.193555] ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
  Oct  7 20:02:12 StudioXPS kernel: [  249.193561] ata1.00: irq_stat 0x40000001
  Oct  7 20:02:12 StudioXPS kernel: [  249.193565] ata1.00: failed command: READ FPDMA QUEUED
  Oct  7 20:02:12 StudioXPS kernel: [  249.193572] ata1.00: cmd 60/20:00:90:6f:53/00:00:1a:00:00/40 tag 0 ncq 16384 in
  Oct  7 20:02:12 StudioXPS kernel: [  249.193572]          res 41/40:20:98:6f:53/00:00:1a:00:00/40 Emask 0x409 (media error) <F>
  Oct  7 20:02:12 StudioXPS kernel: [  249.193575] ata1.00: status: { DRDY ERR }
  Oct  7 20:02:12 StudioXPS kernel: [  249.193578] ata1.00: error: { UNC }
  Oct  7 20:02:12 StudioXPS kernel: [  249.193581] ata1.00: failed command: WRITE FPDMA QUEUED
  Oct  7 20:02:12 StudioXPS kernel: [  249.193587] ata1.00: cmd 61/18:08:18:fb:0e/00:00:2b:00:00/40 tag 1 ncq 12288 out
  Oct  7 20:02:12 StudioXPS kernel: [  249.193587]          res 41/40:08:98:6f:53/00:00:1a:00:00/40 Emask 0x9 (media error)
  Oct  7 20:02:12 StudioXPS kernel: [  249.193590] ata1.00: status: { DRDY ERR }
  Oct  7 20:02:12 StudioXPS kernel: [  249.193593] ata1.00: error: { UNC }
  Oct  7 20:02:12 StudioXPS kernel: [  249.193596] ata1.00: failed command: WRITE FPDMA QUEUED
  Oct  7 20:02:12 StudioXPS kernel: [  249.193602] ata1.00: cmd 61/d8:10:a0:bd:8b/00:00:0d:00:00/40 tag 2 ncq 110592 out
  Oct  7 20:02:12 StudioXPS kernel: [  249.193602]          res 41/40:08:98:6f:53/00:00:1a:00:00/40 Emask 0x9 (media error)
  Oct  7 20:02:12 StudioXPS kernel: [  249.193605] ata1.00: status: { DRDY ERR }
  Oct  7 20:02:12 StudioXPS kernel: [  249.193607] ata1.00: error: { UNC }
  Oct  7 20:02:12 StudioXPS kernel: [  249.196606] ata1.00: configured for UDMA/100
  Oct  7 20:02:12 StudioXPS kernel: [  249.196622] sd 0:0:0:0: >[sda] Unhandled sense code
  Oct  7 20:02:12 StudioXPS kernel: [  249.196624] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196626] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
  Oct  7 20:02:12 StudioXPS kernel: [  249.196628] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196629] Sense Key : Medium Error [current] [descriptor]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196633] Descriptor sense data with sense descriptors (in hex):
  Oct  7 20:02:12 StudioXPS kernel: [  249.196634]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
  Oct  7 20:02:12 StudioXPS kernel: [  249.196642]         1a 53 6f 98
  Oct  7 20:02:12 StudioXPS kernel: [  249.196645] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196648] Add. Sense: Unrecovered read error - auto reallocate failed
  Oct  7 20:02:12 StudioXPS kernel: [  249.196650] sd 0:0:0:0: >[sda] CDB:
  Oct  7 20:02:12 StudioXPS kernel: [  249.196651] Read(10): 28 00 1a 53 6f 90 00 00 20 00
  Oct  7 20:02:12 StudioXPS kernel: [  249.196658] end_request: I/O error, dev sda, sector 441675672
  Oct  7 20:02:12 StudioXPS kernel: [  249.196674] sd 0:0:0:0: >[sda] Unhandled sense code
  Oct  7 20:02:12 StudioXPS kernel: [  249.196676] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196678] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
  Oct  7 20:02:12 StudioXPS kernel: [  249.196679] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196681] Sense Key : Medium Error [current] [descriptor]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196683] Descriptor sense data with sense descriptors (in hex):
  Oct  7 20:02:12 StudioXPS kernel: [  249.196684]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
  Oct  7 20:02:12 StudioXPS kernel: [  249.196692]         1a 53 6f 98
  Oct  7 20:02:12 StudioXPS kernel: [  249.196695] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196697] Add. Sense: Unrecovered read error - auto reallocate failed
  Oct  7 20:02:12 StudioXPS kernel: [  249.196699] sd 0:0:0:0: >[sda] CDB:
  Oct  7 20:02:12 StudioXPS kernel: [  249.196700] Write(10): 2a 00 2b 0e fb 18 00 00 18 00
  Oct  7 20:02:12 StudioXPS kernel: [  249.196706] end_request: I/O error, dev sda, sector 722402072
  Oct  7 20:02:12 StudioXPS kernel: [  249.196710] Buffer I/O error on device sda6, logical block 82899555
  Oct  7 20:02:12 StudioXPS kernel: [  249.196718] Buffer I/O error on device sda6, logical block 82899556
  Oct  7 20:02:12 StudioXPS kernel: [  249.196722] Buffer I/O error on device sda6, logical block 82899557
  Oct  7 20:02:12 StudioXPS kernel: [  249.196725] EXT4-fs warning (device sda6): ext4_end_bio:250: I/O error writing to inode 20709582 (offset 0 size 12288 starting block 90300262)
  Oct  7 20:02:12 StudioXPS kernel: [  249.196726] JBD2: Detected IO errors while flushing file data on sda6-8
  Oct  7 20:02:12 StudioXPS kernel: [  249.196737] sd 0:0:0:0: >[sda] Unhandled sense code
  Oct  7 20:02:12 StudioXPS kernel: [  249.196739] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196740] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
  Oct  7 20:02:12 StudioXPS kernel: [  249.196742] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196743] Sense Key : Medium Error [current] [descriptor]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196745] Descriptor sense data with sense descriptors (in hex):
  Oct  7 20:02:12 StudioXPS kernel: [  249.196746]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
  Oct  7 20:02:12 StudioXPS kernel: [  249.196754]         1a 53 6f 98
  Oct  7 20:02:12 StudioXPS kernel: [  249.196758] sd 0:0:0:0: >[sda]
  Oct  7 20:02:12 StudioXPS kernel: [  249.196759] Add. Sense: Unrecovered read error - auto reallocate failed
  Oct  7 20:02:12 StudioXPS kernel: [  249.196761] sd 0:0:0:0: >[sda] CDB:
  Oct  7 20:02:12 StudioXPS kernel: [  249.196762] Write(10): 2a 00 0d 8b bd a0 00 00 d8 00
  Oct  7 20:02:12 StudioXPS kernel: [  249.196768] end_request: I/O error, dev sda, sector 227261856
  Oct  7 20:02:12 StudioXPS kernel: [  249.196781] ata1: EH complete
  Oct  7 20:02:12 StudioXPS kernel: [  249.196810] Aborting journal on device sda6-8.
  Oct  7 20:02:12 StudioXPS kernel: [  249.197216] EXT4-fs error (device sda6): ext4_journal_start_sb:370: Detected aborted journal
  Oct  7 20:02:12 StudioXPS kernel: [  249.197219] EXT4-fs (sda6): Remounting filesystem read-only
  Oct  7 20:02:13 StudioXPS kernel: [  250.934678] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
  Oct  7 20:02:13 StudioXPS kernel: [  250.934691] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000078])
  Oct  7 20:02:13 StudioXPS kernel: [  250.938886] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
  Oct  7 20:02:13 StudioXPS kernel: [  250.938896] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000050])
  Oct  7 20:02:13 StudioXPS kernel: [  250.939062] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
  Oct  7 20:02:13 StudioXPS kernel: [  250.939068] ecryptfs_writepage: Error encrypting page (upper index [0x0000000000000000])
  Oct  7 20:02:21 StudioXPS kernel: [  259.082126] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
  Oct  7 20:02:21 StudioXPS kernel: [  259.082138] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000005])
  Oct  7 20:02:21 StudioXPS kernel: [  259.082257] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
  Oct  7 20:02:21 StudioXPS kernel: [  259.082262] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000003])
  Oct  7 20:02:21 StudioXPS kernel: [  259.082376] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
  Oct  7 20:02:21 StudioXPS kernel: [  259.082381] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000000])
  Oct  7 20:05:16 StudioXPS kernel: [  433.841434] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
  Oct  7 20:05:16 StudioXPS kernel: [  433.841448] ecryptfs_write_end: Error encrypting page (upper index [0x00000000000000c9])
  Oct  7 20:07:57 StudioXPS sudo: pam_ecryptfs: pam_sm_authenticate: /home/lars is already mounted

  The hard drive is one month old and has no defects (AFAIK). The
  problem arises anywhere between directly after boot and 3h into work.
  A remount with mount -o remount,rw is not possible and aborts with an
  error. Since I will almost certainly lose data during work, this
  renders my system unusable for the moment. The problem did not occur
  when running 12.04.

  ProblemType: Bug
  DistroRelease: Ubuntu 12.10
  Package: linux-image-3.5.0-17-generic 3.5.0-17.27
  ProcVersionSignature: Ubuntu 3.5.0-17.27-generic 3.5.5
  Uname: Linux 3.5.0-17-generic x86_64
  ApportVersion: 2.6.1-0ubuntu1
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC1:  lars       2341 F.... pulseaudio
   /dev/snd/controlC0:  lars       2341 F.... pulseaudio
  Date: Sun Oct  7 20:00:11 2012
  EcryptfsInUse: Yes
  InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Beta amd64 (20120926)
  MachineType: Dell Inc. Studio XPS 1640
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.5.0-17-generic root=UUID=68856248-4726-45a0-84b2-670a468cce31 ro quiet splash
  RelatedPackageVersions:
   linux-restricted-modules-3.5.0-17-generic N/A
   linux-backports-modules-3.5.0-17-generic  N/A
   linux-firmware                            1.94
  RfKill:
   0: phy0: Wireless LAN
    Soft blocked: no
    Hard blocked: yes
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 11/19/2009
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: A12
  dmi.board.name: 0W497D
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A12
  dmi.chassis.type: 8
  dmi.chassis.vendor: Dell Inc.
  dmi.chassis.version: A12
  dmi.modalias: dmi:bvnDellInc.:bvrA12:bd11/19/2009:svnDellInc.:pnStudioXPS1640:pvrA123:rvnDellInc.:rn0W497D:rvrA12:cvnDellInc.:ct8:cvrA12:
  dmi.product.name: Studio XPS 1640
  dmi.product.version: A123
  dmi.sys.vendor: Dell Inc.
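The "CDB:" and "Descriptor sense data" lines in the syslog excerpt above
can be decoded by hand.  This uses the SCSI layouts for READ(10) and
WRITE(10) commands and for descriptor-format sense data.  The Python
sketch below is one way to do it.  The function names are hypothetical,
and it only handles the Information descriptor, which is the one
present in this log.

```python
def decode_cdb10(hex_text: str) -> tuple:
    """Decode a 10-byte READ(10)/WRITE(10) CDB into (command, LBA, block count)."""
    cdb = bytes.fromhex(" ".join(hex_text.split()))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    command = {0x28: "Read(10)", 0x2A: "Write(10)"}.get(cdb[0], f"opcode 0x{cdb[0]:02x}")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return command, lba, blocks


def decode_descriptor_sense(hex_text: str) -> dict:
    """Decode descriptor-format sense data (response code 0x72/0x73)."""
    sense = bytes.fromhex(" ".join(hex_text.split()))
    if len(sense) < 8 or (sense[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {
        "sense_key": sense[1] & 0x0F,  # 0x3 = MEDIUM ERROR
        "asc": sense[2],
        "ascq": sense[3],
        "information": None,
    }
    end = min(len(sense), 8 + sense[7])  # byte 7: additional sense length
    pos = 8
    while pos + 2 <= end:
        dtype, dlen = sense[pos], sense[pos + 1]
        # Information descriptor: type 0x00, length 0x0A, VALID bit in byte 2.
        if dtype == 0x00 and dlen == 0x0A and pos + 12 <= end and sense[pos + 2] & 0x80:
            result["information"] = int.from_bytes(sense[pos + 4:pos + 12], "big")
        pos += 2 + dlen
    return result


if __name__ == "__main__":
    print(decode_cdb10("28 00 1a 53 6f 90 00 00 20 00"))  # ('Read(10)', 441675664, 32)
    print(decode_descriptor_sense(
        "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 1a 53 6f 98"))
    # {'sense_key': 3, 'asc': 17, 'ascq': 4, 'information': 441675672}
```

ASC/ASCQ 0x11/0x04 is "unrecovered read error - auto reallocate failed",
which matches the kernel's "Add. Sense" text.  For the two writes, the
decoded LBAs (722402072 and 227261856) equal the sectors in the matching
end_request lines.  For the read, the end_request sector (441675672)
equals the sense Information field, which is 8 blocks into the 32-block
read.  All three sense blocks in the log carry that same Information
value.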
