
ecryptfs team mailing list archive

[Bug 317781] Re: Ext4 data loss

 

@Brian,

We can't hold off one rename but not other file system activities.  What
you can do is simply not save files to disk in your editor, until you
are ready to save them all --- or, you can extend the commit time to
longer than 5 seconds; laptop mode extends the commit time to be 30
seconds, if I recall correctly.

In practice, note that ext3 generally ended up spinning up the disk
anyway when you saved out the file, given that (a) it would need to read
in the bitmap blocks to do the non-delayed allocation, and (b) it would
end up spinning up the disk 5-30 seconds later when the commit timer
went off.

The current set of ext4 patches queued for 2.6.29 does force the data
blocks out right away, as opposed to merely allocating them and
deferring the actual flush until the commit.  The reason for this was
simply lack of time on my part to create a patch that does things
right, which would be a much more complicated thing to do.  Quoting
from the patch:

+	/*
+	 * We do something simple for now.  The filemap_flush() will
+	 * also start triggering a write of the data blocks, which is
+	 * not strictly speaking necessary (and for users of
+	 * laptop_mode, not even desirable).  However, to do otherwise
+	 * would require replicating code paths in:
+	 * 
+	 * ext4_da_writepages() ->
+	 *    write_cache_pages() ---> (via passed in callback function)
+	 *        __mpage_da_writepage() -->
+	 *           mpage_add_bh_to_extent()
+	 *           mpage_da_map_blocks()
+	 *
+	 * The problem is that write_cache_pages(), located in
+	 * mm/page-writeback.c, marks pages clean in preparation for
+	 * doing I/O, which is not desirable if we're not planning on
+	 * doing I/O at all.
+	 *
+	 * We could call write_cache_pages(), and then redirty all of
+	 * the pages by calling redirty_page_for_writeback() but that
+	 * would be ugly in the extreme.  So instead we would need to
+	 * replicate parts of the code in the above functions,
+	 * simplifying them because we wouldn't actually intend to
+	 * write out the pages, but rather only collect contiguous
+	 * logical block extents, call the multi-block allocator, and
+	 * then update the buffer heads with the block allocations.
+	 * 
+	 * For now, though, we'll cheat by calling filemap_flush(),
+	 * which will map the blocks, and start the I/O, but not
+	 * actually wait for the I/O to complete.
+	 */

It's on my todo list to get this right, but given that I was getting
enough complaints from users about losing dot files, I figured that it
was better to get the patch in.
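
As a rough userspace analogue of the "start the I/O, but don't wait for
it" behavior described in the patch comment, Linux offers
sync_file_range() with SYNC_FILE_RANGE_WRITE.  The sketch below is an
illustration only, not what the ext4 patch does internally; the helper
name start_writeback() is made up for this example, and unlike fsync()
this call gives no durability guarantee and does not flush metadata.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to begin writing out all dirty
 * pages of fd, without waiting for the I/O to complete.
 * Returns 0 on success, -1 on error with errno set.
 */
int start_writeback(int fd)
{
    /* offset 0 with nbytes 0 means "from the start through end of file" */
    return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
}
```

A caller would invoke start_writeback(fd) after its last write() and
carry on; anything that actually needs the data on stable storage still
has to call fsync() or fdatasync().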

And again, let me stress that the window was never more than 30-60
seconds, and people who were paranoid could always manually use the
sync command.  The fact that so many people are complaining is what
makes me deeply suspicious that there may be some faulty applications
out there which are constantly rewriting existing files regularly
enough that people are seeing this --- either that, or the crappy
proprietary drivers are much more crash-prone than I thought, and people
are used to Linux machines crashing all the time --- both of which are
very bad, and very unfortunate.  Hopefully neither is true, but in that
case, the chances of a file getting replaced by a zero-length file are
very small indeed.  (And again, I will note that XFS has been doing this
all along, and other newer file systems will also be doing delayed
allocation, and will be subject to the same pitfalls.   Maybe they will
also encode the same hacks to work around broken expectations, and
people with crappy proprietary binary drivers.   But folks really
shouldn't be counting on this....)
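
For applications that replace an existing file by writing a new copy and
renaming it into place, one way to avoid depending on file system
workarounds is to fsync() the new file before the rename().  What
follows is a minimal sketch under stated assumptions: the helper names
replace_file() and write_all() and the ".tmp" suffix are hypothetical,
and the code is not taken from any of the applications discussed in this
bug.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace `path` with `len` bytes from `buf`.
 * The new contents go to "<path>.tmp" (an assumed naming scheme), are
 * flushed with fsync(), and only then renamed over the old file.
 * Returns 0 on success, -1 on error with errno set.
 */
int replace_file(const char *path, const char *buf, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Force the data blocks to disk before the new name becomes visible. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        unlink(tmp);
        errno = saved;
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        int saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }
    return 0;
}
```

With this ordering a crash should leave either the old contents or the
complete new contents under `path`, rather than a zero-length file.  The
sketch does not fsync() the containing directory, so the rename itself
may still be lost in a crash; it only ensures that a rename which
survives points at fully written data.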

-- 
Ext4 data loss
https://bugs.launchpad.net/bugs/317781

Status in “ecryptfs-utils” source package in Ubuntu: Invalid
Status in “linux” source package in Ubuntu: Confirmed
Status in ecryptfs-utils in Ubuntu Jaunty: Invalid
Status in linux in Ubuntu Jaunty: Confirmed

Bug description:
I recently installed Kubuntu Jaunty on a new drive, using Ext4 for all my data.

The first time I had this problem was a few days ago, when after a power loss ktimetracker's config file was replaced by a 0-byte version. No idea if anything else was affected; I just noticed ktimetracker right away.

Today, I was experimenting with some BIOS settings that made the system crash right after loading the desktop. After a clean reboot pretty much any file written to by any application (during the previous boot) was 0 bytes.
For example Plasma and some of the KDE core config files were reset. Also some of my MySQL databases were killed...

My ext4 partitions all use the default settings with no performance tweaks: barriers on, extents on, ordered data mode.

I used Ext3 for 2 years and I never had any problems after power losses or system crashes.

Jaunty has all the recent updates except for the kernel, which I don't upgrade because of bug #315006.

ProblemType: Bug
Architecture: amd64
DistroRelease: Ubuntu 9.04
NonfreeKernelModules: nvidia
Package: linux-image-2.6.28-4-generic 2.6.28-4.6
ProcCmdLine: root=UUID=81942248-db70-46ef-97df-836006aad399 ro rootfstype=ext4 vga=791 all_generic_ide elevator=anticipatory
ProcEnviron:
 LANGUAGE=
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.28-4.6-generic
SourcePackage: linux


