kernel-packages team mailing list archive
Message #72944
[Bug 1063354] Re: [Dell Studio XPS 1640] Sudden Read-Only Filesystems
I have an old tower which I use to test multiple operating systems.
Each OS lives on a separate drive in a removable tray, so the drives can
be swapped as needed. Once in a while the system would hang when the
BIOS was set to auto-detect the drives at every boot, or I would see an
occasional failure to mount the ATA boot device when Linux was started
in verbose mode--and Windows would simply freeze randomly. The problem
was traced to the power connector on a drive tray: I had to extract the
pins from the connector with a special tool, cut off the wires, soak the
pins in contact cleaner, and solder them back on, because the crimped
connection and the corrosion made it unreliable.
http://en.wikipedia.org/wiki/Molex_connector#Disk_drive_connector_.28AMP_MATE-N-LOK_1-480424-0_Power_Connector.29
http://www.molex.com/molex/products/family?key=disk_drive_power_connector&channel=PRODUCTS&chanName=family&pageTitle=Introduction
I never had a problem with these connectors before, except for the ones
in the Enermax trays (which seem to be made of the cheapest materials
they could find). Before I repaired the power connector, I encountered
that read-only bug in Ubuntu. When this occurred, ALL physical volumes
attached to the machine became read-only, including other hard drives
and all external USB storage devices. Even new USB devices attached
later were not writable. The only thing I could write to was a network
share. If this happens on all affected platforms, it might give
developers some idea of what to look for in the source code. I also
wonder if some power management feature could be involved:
GRUB_CMDLINE_LINUX="libata.dma=0 libata.noacpi=1"
http://ubuntuforums.org/showthread.php?t=1892483
I believe this bug can be triggered by other things too, such as a
system BIOS bug or AHCI preference, a drive firmware bug, defective
electrolytic capacitors on an old mainboard, bad solder joints just
about anywhere, or a defective (or overloaded) power supply. But in the
case of SSDs it could also be a latency issue:
Why Solid-State Drives Slow Down As You Fill Them Up (Ubuntu should warn about this)
"When filling up an empty drive, they found high write performance very early in the process and a significant drop as the write operations continued to fill up the drive... If you have a solid-state drive, you should try to avoid using more than 75% of its capacity."
http://www.howtogeek.com/165542/why-solid-state-drives-slow-down-as-you-fill-them-up/
(for general reference on dual-boot systems):
12 Things You Must Do When Running a Solid State Drive in Windows 7
http://www.maketecheasier.com/12-things-you-must-do-when-running-a-solid-state-drive-in-windows-7/
I suspect that people who experience read-only issues today were
experiencing silent write retries in previous kernel versions and simply
did not notice because the retry was successful. It seems like the
common thread is that the drive was not ready to accept writes for some
reason, and the kernel did not detect this condition. I tried to
simulate this by removing power to the drive momentarily. During this
time, CPU usage was very high, but it returned to normal when power was
applied, and the read-only bug was not triggered.
On various other platforms I have seen S.M.A.R.T. drives which are NOT
defective logging an "Interface CRC error" when a 'READ DMA EXT' command
was issued, due to a cable or connector fault. When the drive was moved
to another system, the errors stopped. So the drive is not necessarily
failing just because you see the error count going up.
I think that a S.M.A.R.T. status monitor should be included with the
base installation: the S.M.A.R.T. feature is not only useful to diagnose
faults within the drive, it sometimes permits you to infer something
about the quality of the power & data connection over time. If you can
consistently correlate some particular S.M.A.R.T. error code with the
behavior that causes the volume to turn read-only, then you may have
found a way to distinguish a cable fault from a kernel or firmware bug,
and the OS could use it to generate more helpful error messages. So it
might be good to report which (if any) of the drive's S.M.A.R.T. counters
were incremented when you experience that read-only problem.
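One way to collect that information is to save the attribute table before and after an incident and compare the raw values. The following is only a rough sketch in Python: it assumes the text comes from `smartctl -A /dev/sdX` (smartmontools) in its usual ten-column layout, the two sample tables are invented for illustration and are not taken from any drive in this report, and the function names are hypothetical.

```python
# Sketch: compare two snapshots of a S.M.A.R.T. attribute table.
# ASSUMPTION: the input is the text printed by `smartctl -A /dev/sdX`.
# The sample values below are made up for illustration only.

SAMPLE_BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""

SAMPLE_AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       7
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows with an integer raw value."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() test.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            counters[fields[1]] = int(fields[9])
    return counters


def changed_counters(before, after):
    """Return {attribute_name: increase} for counters whose raw value changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


if __name__ == "__main__":
    before = parse_smart_attributes(SAMPLE_BEFORE)
    after = parse_smart_attributes(SAMPLE_AFTER)
    for name, delta in changed_counters(before, after).items():
        print(f"{name} changed by {delta}")
```

In this made-up sample only the interface CRC counter moves while the reallocation counters stay at zero, which is the pattern I would expect from a cable or connector fault rather than a failing medium.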
I am not too familiar with the specifications, but developers might also
want to investigate the possibility of using the System Management bus
or Power Management bus to assist in characterizing these failures if
the platform collects any useful information. For those who solved the
problem by disabling NCQ: there was an NCQ drive blacklist for the Linux
kernel until (I believe) 2.6.24. This implies some incompatibility with
particular models.
"there are drives with firmware bugs that deliberately lie about when data has been physically written."
http://serverfault.com/questions/460864/safety-of-write-cache-on-sata-drives-with-barriers
_____
"One little-known feature of NCQ is that the host can specify whether it
wants to be notified of completion when the data hits the disk's
platters or when it hits the disk's buffer (on-board cache)." (Does the
kernel do this correctly?)
"NCQ can negatively interfere with the operating system's I/O scheduler,
actually decreasing performance; this has been observed in practice on
Linux with RAID-5. There is no mechanism in NCQ for the host to specify
any sort of deadlines for an I/O, like how many times a request can be
ignored in favor of others. In theory, a NCQ-ed request can be delayed
by the drive an arbitrary amount of time while it is serving other
(possibly new) requests under I/O pressure. Since the algorithms used
inside drive firmware for NCQ dispatch ordering are generally not
publicly known, this introduces another level of uncertainty for
hardware/firmware performance. Tests at Google around 2008 have shown
that NCQ can delay an I/O for up to 1-2 seconds."
http://en.wikipedia.org/wiki/Native_Command_Queuing
_____
Test if NCQ is enabled: dmesg | grep -i ncq
Write-protect & cache status: dmesg | grep sda
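As an alternative to searching dmesg, the queue depth the kernel is currently using can be read from sysfs. This is a minimal sketch, assuming a SATA disk handled by libata where a queue depth of 1 means NCQ is effectively not in use; the device name "sda" and the helper names are placeholders, not anything taken from the bug report.

```python
# Sketch: report whether NCQ appears to be in use for a SATA disk.
# ASSUMPTION: the disk is driven by libata, so a queue depth above 1
# in /sys/block/<dev>/device/queue_depth indicates NCQ is active.
from pathlib import Path


def ncq_queue_depth(device="sda", sysfs_root="/sys/block"):
    """Return the queue depth as an int, or None if it cannot be read."""
    path = Path(sysfs_root) / device / "device" / "queue_depth"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def describe_queue_depth(depth):
    """Turn a queue depth into a short human-readable status."""
    if depth is None:
        return "unknown"
    if depth <= 1:
        return "NCQ not in use"
    return f"NCQ in use (queue depth {depth})"


if __name__ == "__main__":
    print(describe_queue_depth(ncq_queue_depth("sda")))
```

Writing 1 to the same sysfs file as root is the usual way to switch NCQ off for one disk without rebooting, which may help those who want to test whether NCQ is involved.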
_____
Operational theory / Educational resources:
Modern disk write caches and how they get dealt with
http://utcc.utoronto.ca/~cks/space/blog/tech/ModernDiskWriteCaches
How to force a disk write cache flush operation on Linux
http://utcc.utoronto.ca/~cks/space/blog/linux/ForceDiskFlushes
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1063354
Title:
[Dell Studio XPS 1640] Sudden Read-Only Filesystems
Status in “linux” package in Ubuntu:
Incomplete
Bug description:
After upgrading to ubuntu 12.10, I experience sudden locks of my
filesystems (I have a root and a home partition with ext4), in which
the filesystems suddenly become mounted read-only. /var/log/syslog
shows the following entries:
Oct 7 20:00:42 StudioXPS signond[3510]: signondaemon.cpp 345 init Failed to SUID root. Secure storage will not be available.
Oct 7 20:02:12 StudioXPS kernel: [ 249.193555] ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
Oct 7 20:02:12 StudioXPS kernel: [ 249.193561] ata1.00: irq_stat 0x40000001
Oct 7 20:02:12 StudioXPS kernel: [ 249.193565] ata1.00: failed command: READ FPDMA QUEUED
Oct 7 20:02:12 StudioXPS kernel: [ 249.193572] ata1.00: cmd 60/20:00:90:6f:53/00:00:1a:00:00/40 tag 0 ncq 16384 in
Oct 7 20:02:12 StudioXPS kernel: [ 249.193572] res 41/40:20:98:6f:53/00:00:1a:00:00/40 Emask 0x409 (media error) <F>
Oct 7 20:02:12 StudioXPS kernel: [ 249.193575] ata1.00: status: { DRDY ERR }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193578] ata1.00: error: { UNC }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193581] ata1.00: failed command: WRITE FPDMA QUEUED
Oct 7 20:02:12 StudioXPS kernel: [ 249.193587] ata1.00: cmd 61/18:08:18:fb:0e/00:00:2b:00:00/40 tag 1 ncq 12288 out
Oct 7 20:02:12 StudioXPS kernel: [ 249.193587] res 41/40:08:98:6f:53/00:00:1a:00:00/40 Emask 0x9 (media error)
Oct 7 20:02:12 StudioXPS kernel: [ 249.193590] ata1.00: status: { DRDY ERR }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193593] ata1.00: error: { UNC }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193596] ata1.00: failed command: WRITE FPDMA QUEUED
Oct 7 20:02:12 StudioXPS kernel: [ 249.193602] ata1.00: cmd 61/d8:10:a0:bd:8b/00:00:0d:00:00/40 tag 2 ncq 110592 out
Oct 7 20:02:12 StudioXPS kernel: [ 249.193602] res 41/40:08:98:6f:53/00:00:1a:00:00/40 Emask 0x9 (media error)
Oct 7 20:02:12 StudioXPS kernel: [ 249.193605] ata1.00: status: { DRDY ERR }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193607] ata1.00: error: { UNC }
Oct 7 20:02:12 StudioXPS kernel: [ 249.196606] ata1.00: configured for UDMA/100
Oct 7 20:02:12 StudioXPS kernel: [ 249.196622] sd 0:0:0:0: >[sda] Unhandled sense code
Oct 7 20:02:12 StudioXPS kernel: [ 249.196624] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196626] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 7 20:02:12 StudioXPS kernel: [ 249.196628] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196629] Sense Key : Medium Error [current] [descriptor]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196633] Descriptor sense data with sense descriptors (in hex):
Oct 7 20:02:12 StudioXPS kernel: [ 249.196634] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196642] 1a 53 6f 98
Oct 7 20:02:12 StudioXPS kernel: [ 249.196645] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196648] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 7 20:02:12 StudioXPS kernel: [ 249.196650] sd 0:0:0:0: >[sda] CDB:
Oct 7 20:02:12 StudioXPS kernel: [ 249.196651] Read(10): 28 00 1a 53 6f 90 00 00 20 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196658] end_request: I/O error, dev sda, sector 441675672
Oct 7 20:02:12 StudioXPS kernel: [ 249.196674] sd 0:0:0:0: >[sda] Unhandled sense code
Oct 7 20:02:12 StudioXPS kernel: [ 249.196676] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196678] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 7 20:02:12 StudioXPS kernel: [ 249.196679] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196681] Sense Key : Medium Error [current] [descriptor]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196683] Descriptor sense data with sense descriptors (in hex):
Oct 7 20:02:12 StudioXPS kernel: [ 249.196684] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196692] 1a 53 6f 98
Oct 7 20:02:12 StudioXPS kernel: [ 249.196695] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196697] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 7 20:02:12 StudioXPS kernel: [ 249.196699] sd 0:0:0:0: >[sda] CDB:
Oct 7 20:02:12 StudioXPS kernel: [ 249.196700] Write(10): 2a 00 2b 0e fb 18 00 00 18 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196706] end_request: I/O error, dev sda, sector 722402072
Oct 7 20:02:12 StudioXPS kernel: [ 249.196710] Buffer I/O error on device sda6, logical block 82899555
Oct 7 20:02:12 StudioXPS kernel: [ 249.196718] Buffer I/O error on device sda6, logical block 82899556
Oct 7 20:02:12 StudioXPS kernel: [ 249.196722] Buffer I/O error on device sda6, logical block 82899557
Oct 7 20:02:12 StudioXPS kernel: [ 249.196725] EXT4-fs warning (device sda6): ext4_end_bio:250: I/O error writing to inode 20709582 (offset 0 size 12288 starting block 90300262)
Oct 7 20:02:12 StudioXPS kernel: [ 249.196726] JBD2: Detected IO errors while flushing file data on sda6-8
Oct 7 20:02:12 StudioXPS kernel: [ 249.196737] sd 0:0:0:0: >[sda] Unhandled sense code
Oct 7 20:02:12 StudioXPS kernel: [ 249.196739] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196740] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 7 20:02:12 StudioXPS kernel: [ 249.196742] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196743] Sense Key : Medium Error [current] [descriptor]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196745] Descriptor sense data with sense descriptors (in hex):
Oct 7 20:02:12 StudioXPS kernel: [ 249.196746] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196754] 1a 53 6f 98
Oct 7 20:02:12 StudioXPS kernel: [ 249.196758] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196759] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 7 20:02:12 StudioXPS kernel: [ 249.196761] sd 0:0:0:0: >[sda] CDB:
Oct 7 20:02:12 StudioXPS kernel: [ 249.196762] Write(10): 2a 00 0d 8b bd a0 00 00 d8 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196768] end_request: I/O error, dev sda, sector 227261856
Oct 7 20:02:12 StudioXPS kernel: [ 249.196781] ata1: EH complete
Oct 7 20:02:12 StudioXPS kernel: [ 249.196810] Aborting journal on device sda6-8.
Oct 7 20:02:12 StudioXPS kernel: [ 249.197216] EXT4-fs error (device sda6): ext4_journal_start_sb:370: Detected aborted journal
Oct 7 20:02:12 StudioXPS kernel: [ 249.197219] EXT4-fs (sda6): Remounting filesystem read-only
Oct 7 20:02:13 StudioXPS kernel: [ 250.934678] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:13 StudioXPS kernel: [ 250.934691] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000078])
Oct 7 20:02:13 StudioXPS kernel: [ 250.938886] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:13 StudioXPS kernel: [ 250.938896] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000050])
Oct 7 20:02:13 StudioXPS kernel: [ 250.939062] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:13 StudioXPS kernel: [ 250.939068] ecryptfs_writepage: Error encrypting page (upper index [0x0000000000000000])
Oct 7 20:02:21 StudioXPS kernel: [ 259.082126] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:21 StudioXPS kernel: [ 259.082138] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000005])
Oct 7 20:02:21 StudioXPS kernel: [ 259.082257] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:21 StudioXPS kernel: [ 259.082262] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000003])
Oct 7 20:02:21 StudioXPS kernel: [ 259.082376] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:21 StudioXPS kernel: [ 259.082381] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000000])
Oct 7 20:05:16 StudioXPS kernel: [ 433.841434] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:05:16 StudioXPS kernel: [ 433.841448] ecryptfs_write_end: Error encrypting page (upper index [0x00000000000000c9])
Oct 7 20:07:57 StudioXPS sudo: pam_ecryptfs: pam_sm_authenticate: /home/lars is already mounted
The hard drive is one month old and has no defects (AFAIK). The problem
arises anywhere between directly after boot and 3h into working. A
remount with mount -o remount,rw is not possible and aborts with an
error. Since I will most certainly lose data during work, this
renders my system unusable for the moment. The problem did not occur
when running 12.04.
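For anyone reading the log above, the CDB lines can be decoded by hand to confirm which sectors were involved. The sketch below, in Python, assumes the standard 10-byte READ(10)/WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group, 2-byte big-endian transfer length, control) and 512-byte sectors; the function name is made up for illustration. The CDB strings are copied from the syslog excerpt.

```python
# Sketch: decode the READ(10)/WRITE(10) CDBs printed in the syslog excerpt.
# ASSUMPTION: standard 10-byte CDB layout and 512-byte logical sectors.
import errno


def decode_rw10(cdb_hex):
    """Return (operation, starting LBA, block count) for a 10-byte CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    operation = "READ(10)" if cdb[0] == 0x28 else "WRITE(10)"
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return operation, lba, blocks


if __name__ == "__main__":
    for cdb in (
        "28 00 1a 53 6f 90 00 00 20 00",
        "2a 00 2b 0e fb 18 00 00 18 00",
        "2a 00 0d 8b bd a0 00 00 d8 00",
    ):
        operation, lba, blocks = decode_rw10(cdb)
        print(operation, "LBA", lba, "blocks", blocks, "bytes", blocks * 512)
    # The sense information bytes "1a 53 6f 98" give the failing LBA.
    print("failing LBA", int.from_bytes(bytes.fromhex("1a 53 6f 98"), "big"))
    # rc = [-30] in the ecryptfs lines is the negated errno for a
    # read-only file system on Linux.
    print(errno.errorcode.get(30))
```

The read decodes to 32 blocks starting at LBA 441675664, so the failing sector 441675672 reported by end_request lies 8 blocks into that request. The two writes start at 722402072 and 227261856, matching their end_request lines, yet all three sense reports carry the same information bytes 1a 53 6f 98.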
ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-17-generic 3.5.0-17.27
ProcVersionSignature: Ubuntu 3.5.0-17.27-generic 3.5.5
Uname: Linux 3.5.0-17-generic x86_64
ApportVersion: 2.6.1-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC1: lars 2341 F.... pulseaudio
/dev/snd/controlC0: lars 2341 F.... pulseaudio
Date: Sun Oct 7 20:00:11 2012
EcryptfsInUse: Yes
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Beta amd64 (20120926)
MachineType: Dell Inc. Studio XPS 1640
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.5.0-17-generic root=UUID=68856248-4726-45a0-84b2-670a468cce31 ro quiet splash
RelatedPackageVersions:
linux-restricted-modules-3.5.0-17-generic N/A
linux-backports-modules-3.5.0-17-generic N/A
linux-firmware 1.94
RfKill:
0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: yes
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/19/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A12
dmi.board.name: 0W497D
dmi.board.vendor: Dell Inc.
dmi.board.version: A12
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: A12
dmi.modalias: dmi:bvnDellInc.:bvrA12:bd11/19/2009:svnDellInc.:pnStudioXPS1640:pvrA123:rvnDellInc.:rn0W497D:rvrA12:cvnDellInc.:ct8:cvrA12:
dmi.product.name: Studio XPS 1640
dmi.product.version: A123
dmi.sys.vendor: Dell Inc.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1063354/+subscriptions