ecryptfs team mailing list archive

[Bug 317781] Re: Ext4 data loss

 

@CowBoyTim

I agree with you.  I work with real-time industrial systems, where the
shop floor systems are considered unreliable.  We have all the same
issues as a regular desktop user, except our users have bigger hammers.
The attraction of ext3 was journalling with the ordered data mode.
If power was cut, it was possible to recover the file system to a recent
point in time, with only the most recent data lost.  This bug in ext4
results in zero-length files, and not only among the most recently
written files.

All fsync() does is bypass one layer of write-back caching.  This just
makes the window of data loss smaller, in the specific case of
infrequent fsync() calls.  By itself, fsync() does nothing to guarantee
data integrity.  I think this is why Bogdan was complaining about
defective MySQL databases.  Given the benchmarks, it is likely that the
file system zero-lengthed the entire database file.  Specifically,
fsync() guarantees the data is on the disk; it doesn't guarantee the
file system knows where the file is.  As such, one could call fsync()
and still not be able to get at the data after a reboot.
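
One way to read this point is that the directory entry has to reach the
disk as well as the file contents.  The Linux fsync(2) manual page says
as much: an explicit fsync() on the containing directory is needed for
that.  A minimal sketch, assuming a POSIX system and Python's os module;
the helper name write_durably and the file name example.conf are made up
for illustration:

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then flush the file and its directory entry.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush the file's data to the device
    finally:
        os.close(fd)

    # Flush the parent directory so the entry naming the file is on disk too.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.conf")
        write_durably(target, b"setting=1\n")
        with open(target, "rb") as f:
            print(f.read())
```

Even this only narrows the window; whether the drive's own write cache
honours the flush is outside the program's control.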

The arguments against telling every application developer to use fsync() are:
1. Under heavy file I/O, fsync() could potentially decrease your average I/O speed by defeating the write-back caching.  This could make the window of data loss larger, especially with a real-time system where the incoming data rate is fixed.
2. Repeated calls to fsync() would be very rough on laptop mode and on SSDs (Solid State Disks).
3. Repeated calls to fsync() will limit maximum file system performance for desktop applications.  Eventually, the file system developers will replace fsync() with an empty function, just like Apple did.
4. If everyone wants fsync(), why don't we just modify the close() function to call fsync()?
5. There is a strong correlation between user activity and system crashes.  Not using fsync() leads to much more understandable system behavior.
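
To make point 4 concrete, here is a rough sketch of what a close() that
always calls fsync() would look like from the application side.  It
assumes a POSIX system and Python's os module; close_synced is a
hypothetical name, not an existing API:

```python
import os
import tempfile


def close_synced(fd: int) -> None:
    """Hypothetical close() replacement: fsync first, then always close."""
    try:
        os.fsync(fd)  # every close now pays for a flush to the device
    finally:
        os.close(fd)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"saved by the user\n")
        close_synced(fd)
        with open(path, "rb") as f:
            print(f.read())
    finally:
        os.remove(path)
```

Every caller would pay the flush cost on every close, which is the
performance concern raised in points 1 to 3.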

Imagine a typical self-inflicted system crash.  This can be caused
either directly: "Press Save then turn off the Computer," or indirectly:
"edit video game config, hit play, and then watch the video driver
crash."

If the write-back cache is enabled, and fsync() is not used, the program
will write data to the cache, cause a bunch of disk reads, and then
during idle time, the data will be written to disk.  If the
user-generated activity results in disk reads, then the write-back cache
will "protect" the old version of the file.  The user will learn that
crashing the machine results in him losing his most recent changes.

On the other hand, if fsync() is used to disable the write-back cache,
then programmers will start calling fsync() and close() from background
threads.  This will result in a poor user experience, as the hard disk
will be thrashing during program startup (when all the disk reads are
happening), and anything could happen when the system crashes during the
fsync().
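
The background-thread pattern described above might look roughly like
this.  It is only a sketch, assuming a POSIX system and Python's os and
threading modules; sync_and_close_in_background is a hypothetical name:

```python
import os
import tempfile
import threading


def sync_and_close_in_background(fd: int) -> threading.Thread:
    """Hand fsync() and close() to a worker thread and return at once.

    Hypothetical helper.  Until the thread finishes, the data may still
    live only in the write-back cache.
    """
    def worker() -> None:
        try:
            os.fsync(fd)
        finally:
            os.close(fd)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"config written at startup\n")
        pending = sync_and_close_in_background(fd)
        # The program carries on (for example, with its startup reads)
        # while the flush competes for the disk.
        pending.join()
        with open(path, "rb") as f:
            print(f.read())
    finally:
        os.remove(path)
```

The caller gets control back immediately, but a crash before the worker
finishes leaves the file in whatever state the flush had reached.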

In the case that system crashes correlate to user activity, it is really
tempting, from a software point of view, to try to get the fsync() to
happen before the system crash occurs.  Unfortunately, in practice this
is really tough to do.  The journaled file system with an ordered data
mode is a really good compromise for many desktop and real-time type
applications.  Additionally, limited fsync() use preserves the
effectiveness of fsync() for applications that really need it, like
databases.

-- 
Ext4 data loss
https://bugs.launchpad.net/bugs/317781
You received this bug notification because you are a member of eCryptfs,
which is subscribed to ecryptfs-utils in ubuntu.

Status in “ecryptfs-utils” source package in Ubuntu: Invalid
Status in “linux” source package in Ubuntu: Fix Committed
Status in ecryptfs-utils in Ubuntu Jaunty: Invalid
Status in linux in Ubuntu Jaunty: Fix Committed

Bug description:
I recently installed Kubuntu Jaunty on a new drive, using Ext4 for all my data.

The first time I had this problem was a few days ago, when after a power loss ktimetracker's config file was replaced by a 0-byte version. No idea if anything else was affected; I just noticed ktimetracker right away.

Today, I was experimenting with some BIOS settings that made the system crash right after loading the desktop. After a clean reboot pretty much any file written to by any application (during the previous boot) was 0 bytes.
For example Plasma and some of the KDE core config files were reset. Also some of my MySQL databases were killed...

My EXT4 partitions all use the default settings with no performance tweaks: barriers on, extents on, ordered data mode.

I used Ext3 for 2 years and I never had any problems after power losses or system crashes.

Jaunty has all the recent updates except for the kernel, which I don't upgrade because of bug #315006.

ProblemType: Bug
Architecture: amd64
DistroRelease: Ubuntu 9.04
NonfreeKernelModules: nvidia
Package: linux-image-2.6.28-4-generic 2.6.28-4.6
ProcCmdLine: root=UUID=81942248-db70-46ef-97df-836006aad399 ro rootfstype=ext4 vga=791 all_generic_ide elevator=anticipatory
ProcEnviron:
 LANGUAGE=
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.28-4.6-generic
SourcePackage: linux