kernel-packages team mailing list archive
Message #00693
Re: [Bug 1202994] Re: EXT4 filesystem corruption with uninit_bg and error=continue
On 19/07/13 18:34, Joseph Salisbury wrote:
> The commit (b0dd6b7) you mention in the upstream bug report is in the 3.2 stable tree as commit 76f4fa4:
> * 76f4fa4 - ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg (1 year, 1 month ago) <Theodore Ts'o>
>
> It was available as of 3.2.20 as you say:
> git describe --contains 76f4fa4
> v3.2.20~1
>
> This means that patch is in the 3.2.0-49 Ubuntu kernel, since it
> contains all the upstream 3.2.46 updates.
>
> The patch from Darrick J Wong that you mention is still being discussed on the linux-ext4 mailing list and is not yet available in the mainline kernel tree:
> ext4: Prevent massive fs corruption if verifying the block bitmap fails
>
> Do you have a way to easily reproduce this bug? If so, I can build a
> test kernel with Darrick's patch for you to test.
'Fraid not -- it's a one-off event (I hope!).
The filesystem in question (/export/share, mostly used for backups of
other machines and ISO boot images) had originally been created on a
logical volume of ~640Gb in a volume group of just under 1Tb on a single
PV composed of a RAID10 array of two 1Tb partitions, one on each of two
2Tb SATA disks. *At some later time* this LV was expanded to use the
rest of the free space in that volume group, making it 800Gb, and *the
filesystem was resized to match*. This may have been a contributing
factor.
This week, because the FS was getting quite full (about 97%, or ~30Gb
left, i.e. within the last ~40Gb reserved for root; *could this be
part of the trigger?*), I decided to install two spare disks so that I
could migrate this VG onto them. This involved a power cycle, a reboot,
and lots of playing around with mdadm, but I don't think any of this
was significant.
After reboot, I had all 4 disks accessible, with no errors. One of the
new disks was virgin, and I had created a new RAID10 mirror using it:
# mdadm --create /dev/md/scratch --bitmap=internal --level=10 \
    --parity=f2 --raid-devices=2 --name=new missing /dev/sdd1
The other was recycled from another machine, and already had MD/LVM
volumes on it, which were correctly recognised as "foreign"
arrays/volumes. I mounted the one that still contained the system image
from the other machine and copied it into a subdirectory of
/export/share (specifically, Backups/Galaxy/suse-11.4/ -- see below)
using rsync: *about 15Gb of data, using up about half the remaining
(reserved) space. This was the last write operation on the FS.* (I ran
rsync again immediately afterwards to verify that all files had been
transferred with no errors, and all seemed OK. Nonetheless, *I think
this is where the corruption occurred*.)
Then I dismantled the foreign LV/MD stack, wiped that disk, and made it
part of the new RAID10 array, triggering a resync. Then I added the new
array to the existing VG and migrated the LVs in it to the new array
using pvmove.
The pvmove completed without errors, so I then removed the original
array from the VG. (The RAID remirroring completed without errors too,
but I'm not sure when; probably later.) Now that the VG was on a bigger
disk, I decided to expand each of the LVs on it. Then, when I tried to
resize /export/share to use the expanded space, I was told I should run
e2fsck first, which reported many errors, starting with:
e2fsck 1.42 (29-Nov-2011)
e2fsck: Group descriptors look bad... trying backup blocks...
One or more block group descriptor checksums are invalid. Fix<y>? yes
Group descriptor 0 checksum is invalid. FIXED.
Group descriptor 1 checksum is invalid. FIXED.
Group descriptor 2 checksum is invalid. FIXED.
Group descriptor 3 checksum is invalid. FIXED.
... etc etc ...
Group descriptor 6397 checksum is invalid. FIXED.
Group descriptor 6398 checksum is invalid. FIXED.
Group descriptor 6399 checksum is invalid. FIXED.
Pass 1: Checking inodes, blocks, and sizes
Group 2968's block bitmap at 97248129 conflicts with some other fs block.
Relocate<y>? yes
Relocating group 2968's block bitmap from 97248129 to 96998147...
Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode
97255619 97255620 97255621 97255622 97255623 97255624 97255625 97255626 97255627 97255628 97255629 97255630 97255631 97255632 97255633 97255634 97255635 97255636 97255637 97255638 97255639 97255640 97255641 97255642 97255643 97255644 97255645 97255646
... etc etc ...
Multiply-claimed block(s) in inode 24270904: 97263482 97263483
Multiply-claimed block(s) in inode 24270909: 97263574 97263575
Multiply-claimed block(s) in inode 24270931: 97263606 97263607
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1334 inodes containing multiply-claimed blocks.)
File /Backups/Tesseract/DrivingLicenceReverse_300dpi.bmp (inode #24248332, mod time Thu Mar 25 01:34:37 2010)
has 136 multiply-claimed block(s), shared with 7 file(s):
/Backups/Galaxy/suse-11.4/bin/bash (inode #24269252, mod time Thu Jul 12 20:04:07 2012)
/Backups/Galaxy/suse-11.4/bin/basename (inode #24269251, mod time Wed Sep 21 16:30:45 2011)
/Backups/Galaxy/suse-11.4/bin/arch (inode #24269250, mod time Wed Sep 21 16:30:45 2011)
/Backups/Galaxy/suse-11.4/.local/share/applications/defaults.list (inode #24269249, mod time Mon Sep 12 19:44:00 2011)
/Backups/Galaxy/suse-11.4/.config/Trolltech.conf (inode #24269248, mod time Wed Oct 26 13:59:14 2011)
/Backups/Galaxy/suse-11.4/profilerc (inode #24269247, mod time Mon Sep 12 19:44:00 2011)
/Backups/Galaxy/suse-11.4/C:\nppdf32Log\debuglog.txt (inode #24269246, mod time Sun Sep 9 14:37:47 2012)
Clone multiply-claimed blocks<y>? yes
File /Backups/Tesseract/wla_user_guide.pdf (inode #24248352, mod time Thu Nov 13 12:18:26 2003)
has 1310 multiply-claimed block(s), shared with 107 file(s):
/Backups/Galaxy/suse-11.4/bin/tcsh (inode #24269354, mod time Sat Feb 19 02:49:24 2011)
/Backups/Galaxy/suse-11.4/bin/tar (inode #24269353, mod time Tue Jan 3 00:33:47 2012)
/Backups/Galaxy/suse-11.4/bin/sync (inode #24269352, mod time Wed Sep 21 16:30:49 2011)
/Backups/Galaxy/suse-11.4/bin/su (inode #24269351, mod time Wed Sep 21 16:30:49 2011)
/Backups/Galaxy/suse-11.4/bin/stty (inode #24269350, mod time Wed Sep 21 16:30:48 2011)
/Backups/Galaxy/suse-11.4/bin/stat (inode #24269349, mod time Wed Sep 21 16:30:48 2011)
/Backups/Galaxy/suse-11.4/bin/spawn_login (inode #24269348, mod time Sat Feb 19 02:46:10 2011)
/Backups/Galaxy/suse-11.4/bin/spawn_console (inode #24269347, mod time Sat Feb 19 02:46:10 2011)
... etc etc ...
On examining the contents of these files, it became evident that in each
case the newly copied files in Backups/Galaxy/suse-11.4/ were correct,
while the named files in Backups/Tesseract/... were corrupted. Hence my
conclusion: some of the blocks already allocated to the latter were
erroneously taken to be free and reused for the new files copied in by rsync.
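The situation described above, with blocks still owned by old files being handed out again to new ones, is what e2fsck's passes 1B to 1D detect. A toy model of the detection step is an inverted index from block number to owning inodes. It assumes each inode's physical block list is already known (e2fsck really walks extent trees on disk), and the inode and block numbers in the usage lines are made up.

```python
from collections import defaultdict


def find_multiply_claimed(inode_blocks):
    """Map each block claimed by more than one inode to its sorted owners.

    inode_blocks: dict of inode number -> iterable of physical block numbers.
    """
    owners = defaultdict(set)
    for inode, blocks in inode_blocks.items():
        for block in blocks:
            owners[block].add(inode)
    return {blk: sorted(inos) for blk, inos in owners.items() if len(inos) > 1}


def sharing_summary(inode_blocks):
    """Per inode: (number of shared blocks, set of inodes it shares them with)."""
    summary = {}
    for inos in find_multiply_claimed(inode_blocks).values():
        for ino in inos:
            count, partners = summary.get(ino, (0, set()))
            summary[ino] = (count + 1, partners | (set(inos) - {ino}))
    return summary


# Hypothetical example: an old file (inode 100) whose blocks 12-13 were
# wrongly treated as free and handed to a newly copied file (inode 200).
layout = {100: [10, 11, 12, 13], 200: [12, 13], 300: [20, 21]}
print(find_multiply_claimed(layout))  # {12: [100, 200], 13: [100, 200]}
print(sharing_summary(layout))        # {100: (2, {200}), 200: (2, {100})}
```

This only finds the overlap; deciding which owner holds the correct data (here, the newly copied files) still needs a look at the file contents, as described above.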
...
File /Backups/Galaxy/suse-11.4/etc/gconf/gconf.xml.schemas/%gconf-tree-oc.xml (inode #24270909, mod time Sun Aug 14 21:50:15 2011)
has 2 multiply-claimed block(s), shared with 2 file(s):
<filesystem metadata>
/Backups/Tesseract/Audio/Jack Ruston & Mark Edwards/The Man in the Picture, by Susan Hill (CD 1 of 3)/06__Chapter 5.ogg (inode #24248358, mod time Fri Feb 4 22:53:03 2011)
Multiply-claimed blocks already reassigned or cloned.
File /Backups/Galaxy/suse-11.4/etc/gconf/gconf.xml.schemas/%gconf-tree-wa.xml (inode #24270931, mod time Sun Aug 14 21:50:20 2011)
has 2 multiply-claimed block(s), shared with 2 file(s):
<filesystem metadata>
/Backups/Tesseract/Audio/Jack Ruston & Mark Edwards/The Man in the Picture, by Susan Hill (CD 1 of 3)/06__Chapter 5.ogg (inode #24248358, mod time Fri Feb 4 22:53:03 2011)
Multiply-claimed blocks already reassigned or cloned.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +96998147
Fix<y>? yes
Free blocks count wrong for group #1133 (0, counted=156).
Fix<y>? yes
Free blocks count wrong for group #1134 (0, counted=943).
Fix<y>? yes
... etc etc ...
Free blocks count wrong for group #6019 (32768, counted=0).
Fix<y>? yes
Free blocks count wrong for group #6020 (32768, counted=0).
Fix<y>? yes
...
Directories count wrong for group #4465 (0, counted=29).
Fix<y>? yes
Free inodes count wrong (52421173, counted=51433277).
Fix<y>? yes
share: ***** FILE SYSTEM WAS MODIFIED *****
995523 inodes used (1.90%)
1231 non-contiguous files (0.1%)
980 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 955338/210/3
195882827 blocks used (93.40%)
0 bad blocks
38 large files
859488 regular files
90714 directories
94 character device files
64 block device files
16 fifos
79548 links
44961 symbolic links (39613 fast symbolic links)
177 sockets
--------
1075062 files
Because I suspected the FS might have been corrupted by pvmove shuffling
its data between volumes (or even by the md remirroring process going on
underneath that!), I put the old PV that I had recently removed from the
VG into a new VG of its own, and used lvcreate/lvextend to resurrect the
original copy of the FS:
# lvcreate --verbose --name replay --extents 171751 --zero n test_vg /dev/md126:65536-
# lvextend --verbose --extents 204800 /dev/test_vg/replay /dev/md126:30720-63768
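The extent counts in these two commands can be cross-checked with a little arithmetic. The sketch below assumes a 4MiB physical extent size (the LVM2 default; the actual PE size of this VG isn't shown in this message). Under that assumption the length of the second PV range equals the number of extents lvextend added, and the final 204800 extents come to exactly 800GiB, i.e. the 209715200 4KiB blocks that dumpe2fs reports.

```python
# Cross-check of the lvcreate/lvextend extent counts.
# ASSUMPTION: 4 MiB physical extents (LVM2's default); the VG's real PE
# size is not shown in the message.
PE_SIZE_BYTES = 4 * 1024 * 1024
FS_BLOCK_SIZE = 4096  # "Block size" from dumpe2fs


def pv_range_length(first_pe, last_pe):
    """Extents in an inclusive PV range such as /dev/md126:30720-63768."""
    return last_pe - first_pe + 1


initial_extents = 171751  # lvcreate --extents
final_extents = 204800    # lvextend --extents (new total size)

added_extents = final_extents - initial_extents
second_segment = pv_range_length(30720, 63768)

lv_bytes = final_extents * PE_SIZE_BYTES
lv_gib = lv_bytes / 2**30
fs_blocks = lv_bytes // FS_BLOCK_SIZE

print(added_extents, second_segment)  # 33049 33049
print(lv_gib, fs_blocks)              # 800.0 209715200
```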
Running
# e2fsck -f -n /dev/test_vg/replay
showed exactly the same corruption. Thus it seems that the FS was
already damaged before it was mirrored onto the new volume, which is why
I suspect the problem lies in EXT4 rather than LVM or md.
Here's the output of dumpe2fs -h as it was after the corruption but
before letting e2fsck fix it:
Filesystem volume name: share
Last mounted on: /export/share
Filesystem UUID: 80477518-0fea-447a-bece-f77fe26193bb
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 52428800
Block count: 209715200
Reserved block count: 10484660
Free blocks: 13897914
Free inodes: 51433277
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 974
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 128
RAID stripe width: 256
Flex block group size: 16
Filesystem created: Wed Feb 6 15:50:31 2013
Last mount time: Mon Jul 15 17:51:37 2013
Last write time: Mon Jul 15 18:01:03 2013
Mount count: 24
Maximum mount count: -1
Last checked: Thu Feb 7 18:33:49 2013
Check interval: 0 (<none>)
Lifetime writes: 480 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 5ff8295f-3988-40e0-b195-998d6e67aa31
Journal backup: inode blocks
FS Error count: 1
First error time: Mon Jul 15 18:01:03 2013
First error function: ext4_mb_generate_buddy
First error line #: 739
First error inode #: 0
First error block #: 0
Last error time: Mon Jul 15 18:01:03 2013
Last error function: ext4_mb_generate_buddy
Last error line #: 739
Last error inode #: 0
Last error block #: 0
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x0000645d
Journal start: 0
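A couple of things can be read off these numbers with simple arithmetic, sketched below using the geometry from the dumpe2fs output (Block size 4096, Blocks per group 32768, First block 0). The reserved blocks come to about 40GiB (5%) of the 800GiB filesystem, and the sample multiply-claimed blocks in the e2fsck excerpt (e.g. 97255619 and 97263607) both lie in block group 2968, the same group named in the kernel's ext4_mb_generate_buddy error quoted in the bug description. The helper names are my own.

```python
# Geometry taken from the dumpe2fs -h output above.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0      # "First block: 0"
BLOCK_COUNT = 209715200
RESERVED_BLOCKS = 10484660


def blocks_to_gib(blocks):
    return blocks * BLOCK_SIZE / 2**30


def group_of(block):
    """Index of the block group containing a physical block."""
    return (block - FIRST_DATA_BLOCK) // BLOCKS_PER_GROUP


def group_span(group):
    """First and last physical block of a block group (inclusive)."""
    start = FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP
    return start, start + BLOCKS_PER_GROUP - 1


print(blocks_to_gib(BLOCK_COUNT))                      # 800.0
print(round(blocks_to_gib(RESERVED_BLOCKS), 1))        # 40.0
print(round(100 * RESERVED_BLOCKS / BLOCK_COUNT, 2))   # 5.0
# Sample multiply-claimed blocks from the e2fsck excerpt:
print(group_of(97255619), group_of(97263607))          # 2968 2968
print(group_span(2968))                                # (97255424, 97288191)
```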
As it happens, only 13 existing files (containing a total of 65Mb of data between them) were damaged,
and they were mostly large but ancient and not very important content backed up from other machines.
So I've had something of a lucky escape; and I've subsequently changed all live volumes to use
errors=remount-ro rather than errors=continue, which I had never realised was the default!
I can provide any information you'd like about the corrupted FS, as I've preserved it in that state since
(modulo anything that might have been changed by mounting it read-only). But I don't have any way of finding
out what the internal state was when it was last mounted or immediately before the corruption occurred.
Hope this helps -- and let me know if there's anything you'd like me to
extract from the corrupted FS.
Ciao,
Dave
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1202994
Title:
EXT4 filesystem corruption with uninit_bg and error=continue
Status in “linux” package in Ubuntu:
Confirmed
Bug description:
There was a long and complicated sequence of activities involving
mdadm, lvm, and specifically pvmove leading up to the point where the
corruption was discovered, but I suspect most were irrelevant. AFAICT,
the bug was triggered by the following simple operations:
* the FS was unmounted & remounted -- thus, the journal was fresh and hadn't wrapped (which other reports appear to indicate would have prevented the bug showing up)
* the FS features include uninit_bg, AND its error behaviour is errors=continue
* a bunch of files were then copied onto the FS -- this was the last write operation on the FS.
Later, e2fsck indicated a bunch of problems, including corrupted group
descriptors. Specifically, it found that many blocks were now claimed
by two files; in each case, one was an old file and one was one of
those newly copied, and the contents matched the expected data for the
latter.
So I think this starts with an instance of the miscalculation of
checksums in uninit_bg blocks (fixed by Ted Ts'o last June), followed
by the (invalid or uninitialised) bitmap being used anyway (because
error=continue) and the blocks it appeared to show as free then being
allocated to new files.
Jul 15 18:01:03 redshift kernel: [ 9332.021245] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 2968, 8105 clusters in bitmap, 0 in gd
...
Jul 16 18:05:14 redshift kernel: [95982.560034] EXT4-fs (dm-1): error count: 1
Jul 16 18:05:14 redshift kernel: [95982.560044] EXT4-fs (dm-1): initial error at 1373907663: ext4_mb_generate_buddy:739
Jul 16 18:05:14 redshift kernel: [95982.560053] EXT4-fs (dm-1): last error at 1373907663: ext4_mb_generate_buddy:739
...
Jul 16 20:53:19 redshift kernel: [106068.077526] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 0 failed (47831!=4825)
Jul 16 20:53:19 redshift kernel: [106068.077540] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 1 failed (14670!=8882)
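Those last two lines are group descriptor checksum failures. For reference, here is a rough sketch of how the uninit_bg (gdt_csum) descriptor checksum is put together, based on my reading of ext4_group_desc_csum() in the kernel source rather than on any documented API: a CRC-16 over the filesystem UUID, the little-endian group number, and the descriptor bytes up to the bg_checksum field. It assumes 32-byte descriptors (the dumpe2fs feature list above has no 64bit), and make_desc() with its field values is purely hypothetical. What it illustrates is that any change to a covered field, such as the free block count, changes the checksum.

```python
import struct
import uuid


def crc16(crc, data):
    """Bitwise CRC-16, polynomial 0x8005 processed LSB-first (0xA001),
    equivalent to the table-driven crc16() in Linux lib/crc16.c."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc & 0xFFFF


BG_CHECKSUM_OFFSET = 0x1E  # bg_checksum field in a 32-byte group descriptor


def group_desc_csum(fs_uuid, group, desc):
    """Sketch of the gdt_csum checksum for a 32-byte descriptor: crc16 over
    the fs UUID, the little-endian group number, and the descriptor bytes
    that precede bg_checksum (the checksum field itself is skipped)."""
    crc = crc16(0xFFFF, fs_uuid)
    crc = crc16(crc, struct.pack("<I", group))
    return crc16(crc, desc[:BG_CHECKSUM_OFFSET])


def make_desc(free_blocks, checksum=0):
    """Hypothetical 32-byte descriptor with invented field values."""
    return struct.pack(
        "<IIIHHHHIHHHH",
        1000, 1016, 1032,         # block bitmap, inode bitmap, inode table
        free_blocks, 8192, 0, 0,  # free blocks, free inodes, used dirs, flags
        0, 0, 0, 8192,            # exclude bitmap, 2 bitmap csums, itable_unused
        checksum,                 # bg_checksum at offset 0x1E
    )


fs_uuid = uuid.UUID("80477518-0fea-447a-bece-f77fe26193bb").bytes
good = group_desc_csum(fs_uuid, 2968, make_desc(0))
stale = group_desc_csum(fs_uuid, 2968, make_desc(8105))
print(good != stale)  # True: the free count is covered by the checksum
```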
I see that in an astonishing display of synchronicity, Darrick J Wong
filed a patch at 17 Jul 2013 04:02 -- the very next day, or maybe
even the same day, depending on timezone -- to prevent the knock-on
effects (see "[PATCH] ext4: Prevent massive fs corruption if verifying
the block bitmap fails" at
http://permalink.gmane.org/gmane.comp.file-systems.ext4/39535 ).
But what puzzles me is that the initial triggering bug is still in
this kernel (vmlinuz-3.2.0-49-generic), when according to this
conversation https://bugzilla.kernel.org/show_bug.cgi?id=42723#c8 the
fix was backported to 3.2.20? Is it possible that there is another way
of getting the "ext4_mb_generate_buddy:739" error?
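On that last question: as far as I can tell from the 3.2-era source, the ext4_mb_generate_buddy:739 message fires whenever the number of free clusters counted in a group's on-disk block bitmap disagrees with the free count recorded in its group descriptor, whatever caused the mismatch; the code then carries on using the bitmap's figure. (Without bigalloc, clusters are simply blocks.) A toy model of that comparison follows; the function names and the bitmap contents are hypothetical, and this is not kernel code.

```python
def free_clusters_in_bitmap(bitmap, nbits):
    """Count clear (free) bits; ext4 bitmaps use little-endian bit order,
    so bit i lives in byte i // 8 at position i % 8."""
    used = 0
    for i in range(nbits):
        used += (bitmap[i >> 3] >> (i & 7)) & 1
    return nbits - used


def generate_buddy_check(group, bitmap, gd_free, nbits=32768):
    """Toy version of the consistency check: compare the bitmap's free count
    with the descriptor's. On a mismatch, report it and carry on with the
    bitmap's figure (what errors=continue lets happen)."""
    free = free_clusters_in_bitmap(bitmap, nbits)
    if free == gd_free:
        return gd_free, None
    return free, f"group {group}, {free} clusters in bitmap, {gd_free} in gd"


# Hypothetical bitmap: first 8105 clusters clear, the rest in use.
bitmap = bytes(1013) + bytes([0xFE]) + b"\xff" * (4096 - 1014)
print(generate_buddy_check(2968, bitmap, gd_free=0))
# (8105, 'group 2968, 8105 clusters in bitmap, 0 in gd')
```

If the bitmap itself is the stale or uninitialised side, trusting it this way hands out blocks that are actually in use, which matches the multiply-claimed blocks e2fsck found.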
I have kept an e2image dump of the corrupted FS in case it's of any
use to EXT4 developers, but it's not attached, as even in QCOW2 format
it's ~1Gb.
ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-49-generic 3.2.0-49.75
ProcVersionSignature: Ubuntu 3.2.0-49.75-generic 3.2.46
Uname: Linux 3.2.0-49-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu17.3
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC1: dsg 7005 F.... pulseaudio
/dev/snd/controlC0: dsg 7005 F.... pulseaudio
CRDA:
country AW:
(2402 - 2482 @ 40), (N/A, 20)
(5170 - 5250 @ 40), (N/A, 20)
(5250 - 5330 @ 40), (N/A, 20), DFS
(5490 - 5710 @ 40), (N/A, 27), DFS
Card0.Amixer.info:
Card hw:0 'SB'/'HDA ATI SB at 0xfe024000 irq 16'
Mixer name : 'Realtek ALC892'
Components : 'HDA:10ec0892,1458a102,00100302'
Controls : 46
Simple ctrls : 21
Card1.Amixer.info:
Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfdefc000 irq 19'
Mixer name : 'ATI RS690/780 HDMI'
Components : 'HDA:1002791a,00791a00,00100000'
Controls : 4
Simple ctrls : 1
Card1.Amixer.values:
Simple mixer control 'IEC958',0
Capabilities: pswitch pswitch-joined penum
Playback channels: Mono
Mono: Playback [on]
Date: Thu Jul 18 19:04:57 2013
HibernationDevice: RESUME=UUID=2ab26064-3b90-475d-b3c2-51a70c2d990a
InstallationMedia: Kubuntu 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120822.2)
MachineType: Gigabyte Technology Co., Ltd. GA-890GPA-UD3H
MarkForUpload: True
ProcEnviron:
LANGUAGE=en_GB
TERM=xterm
PATH=(custom, no user)
LANG=en_GB.UTF-8
SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-49-generic root=/dev/mapper/system-kubuntu ro quiet splash vt.handoff=7
RelatedPackageVersions:
linux-restricted-modules-3.2.0-49-generic N/A
linux-backports-modules-3.2.0-49-generic N/A
linux-firmware 1.79.4
RfKill:
0: phy0: Wireless LAN
Soft blocked: yes
Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/23/2010
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: FD
dmi.board.name: GA-890GPA-UD3H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrFD:bd07/23/2010:svnGigabyteTechnologyCo.,Ltd.:pnGA-890GPA-UD3H:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-890GPA-UD3H:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-890GPA-UD3H
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1202994/+subscriptions