ecryptfs team mailing list archive

[Bug 317781] Re: Ext4 data loss

 

ted ts'o:

"You can opine all you want, but the problem is that POSIX does not
specify anything ..."

I'll opine that POSIX needs to be updated.

The create-new-file-write-rename design pattern is pervasive, and people
expect that after a crash either the new contents or the old contents of
the file will be found there; zero length is unacceptable.  This is the
behavior that we saw with ext2 where the
metadata and data writes could get re-ordered and result in zero-length
files.  With the 800 servers that I was maintaining then, it meant that
the perl scripts for our account management software would zero-length
out /etc/passwd, along with other corruption often enough that we were
rebuilding servers every week or two.  As the site and its roles and
responsibilities grew, that meant that with 30,000 Linux boxes, even with
1,000-day uptimes, there were 30 server crashes per day (even without
crappy graphics drivers, a Linux server busy doing Apache and a bunch of
mixed network/CPU/disk I/O seems to have about this average uptime; I'm
not unhappy with this, but with large numbers of servers, server
crashes catch up with you).  And while I've never seen this result in
data loss, it does result in churn in rebuilding and reimaging servers.
It could also cause issues where a server is placed back into rotation
looking like it is working (nothing so obvious as /etc/passwd
corrupted), but is still failing on something critical after a reboot.
You can jump through intellectual hoops about how servers shouldn't be
put back into rotation without validation, but even at the small site
that I'm at now with 2,000 servers and about 300 different kinds of
servers, we don't have good validation, don't have the resources to
build it, and rely on servers being able to be put back into rotation
after they reboot without worrying about subtle corruption issues.
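
For reference, a minimal sketch of the create-new-file-write-rename
pattern described above, written in Python rather than the Perl of the
original scripts.  The helper name replace_file and its durable flag are
invented here for illustration and come from no real tool.  With
durable=True the data is fsync()ed before the rename; with durable=False
the code relies on the filesystem ordering the data ahead of the rename,
which is exactly the behavior under debate in this thread.

```python
import os
import tempfile


def replace_file(path, data, durable=True):
    """Replace the contents of `path` with `data` (bytes) via a rename.

    Hypothetical helper: the name and the `durable` flag are assumptions
    made for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            if durable:
                # Push the new contents to disk before the rename, so a
                # crash cannot leave the new name pointing at no data.
                os.fsync(tmp.fileno())
        # Atomic on POSIX: readers see the old file or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Writing or renaming failed; do not leave the temp file behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A caller would use it as replace_file("/etc/passwd", new_contents).
Note that mkstemp() creates the file with mode 0600, so a real tool
would also have to restore the old file's mode and ownership before the
rename.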

There is now an expectation that filesystems have transactional
behavior.  Deal with it.  If it isn't explicitly part of POSIX then
POSIX needs to be updated in order to reflect the actual realities of
how people are using Unix-like systems these days -- POSIX was not
handed down from God to Linus on the Mount.  It can and should be
amended.  And this should not damage the performance benefits of doing
delayed writes.  Just because you have to be consistent doesn't mean
that you have to start doing fsync()s for me all the time.  If I don't
explicitly call fsync()/fdatasync() you can hold the writes in memory for
30 minutes and abusively punish me for not doing that explicitly myself.
But just delay *both* the data and metadata writes so that I either get
the full "transaction" or I don't.  And stop whining about how people
don't know how to use your precious filesystem.

-- 
Ext4 data loss
https://bugs.launchpad.net/bugs/317781
You received this bug notification because you are a member of eCryptfs,
which is subscribed to ecryptfs-utils in ubuntu.

Status in “ecryptfs-utils” source package in Ubuntu: Invalid
Status in “linux” source package in Ubuntu: Fix Released
Status in ecryptfs-utils in Ubuntu Jaunty: Invalid
Status in linux in Ubuntu Jaunty: Fix Released

Bug description:
I recently installed Kubuntu Jaunty on a new drive, using Ext4 for all my data.

The first time I had this problem was a few days ago, when after a power loss ktimetracker's config file was replaced by a 0-byte version. I have no idea if anything else was affected; I just noticed ktimetracker right away.

Today, I was experimenting with some BIOS settings that made the system crash right after loading the desktop. After a clean reboot pretty much any file written to by any application (during the previous boot) was 0 bytes.
For example Plasma and some of the KDE core config files were reset. Also some of my MySQL databases were killed...

My Ext4 partitions all use the default settings with no performance tweaks: barriers on, extents on, ordered data mode.

I used Ext3 for 2 years and I never had any problems after power losses or system crashes.

Jaunty has all the recent updates except for the kernel, which I don't upgrade because of bug #315006.

ProblemType: Bug
Architecture: amd64
DistroRelease: Ubuntu 9.04
NonfreeKernelModules: nvidia
Package: linux-image-2.6.28-4-generic 2.6.28-4.6
ProcCmdLine: root=UUID=81942248-db70-46ef-97df-836006aad399 ro rootfstype=ext4 vga=791 all_generic_ide elevator=anticipatory
ProcEnviron:
 LANGUAGE=
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.28-4.6-generic
SourcePackage: linux