kernel-packages team mailing list archive

[Bug 1202994] [NEW] EXT4 filesystem corruption with uninit_bg and error=continue

 

Public bug reported:

There was a long and complicated sequence of activities involving mdadm,
lvm, and specifically pvmove leading up to the point where the
corruption was discovered, but I suspect most were irrelevant. AFAICT,
the bug was triggered by the following simple operations:

* the FS was unmounted & remounted -- thus, the journal was fresh and hadn't wrapped (other reports appear to indicate that a wrapped journal would have prevented the bug from showing up)
* the FS options include uninit_bg AND errors=continue
* a bunch of files were then copied onto the FS -- this was the last write operation on the FS.

Later, e2fsck reported a bunch of problems, including corrupted group
descriptors. Specifically, it found that many blocks were now claimed by
two files; in each case, one was an old file and one was one of those
newly copied, and the contents matched the expected data for the latter.

So I think this starts with an instance of the miscalculation of
checksums in uninit_bg blocks (fixed by Ted Ts'o last June), followed by
the (invalid or uninitialised) bitmap being used anyway (because of
errors=continue) and the blocks it appeared to show as free then being
allocated to new files.
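
The suspected chain of events can be sketched as a toy model. This is not kernel code and does not reproduce how ext4 actually works; the `Group` class, `allocate` and `multiply_claimed` are hypothetical names invented for illustration, and the block numbers are made up. It only shows why trusting a stale bitmap after a failed check would produce blocks claimed by both an old and a new file.

```python
# Toy model only: NOT ext4 code. All names and numbers are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Group:
    bitmap_free: set                # blocks the (possibly stale) bitmap shows as free
    checksum_ok: bool               # did the group descriptor checksum verify?
    owners: dict = field(default_factory=dict)  # block -> list of files claiming it


def allocate(group, filename, count, errors="continue"):
    """Hand out `count` blocks that the bitmap claims are free."""
    if not group.checksum_ok and errors != "continue":
        # Any stricter policy stops here instead of trusting the bitmap.
        raise RuntimeError("descriptor check failed; refusing to allocate")
    blocks = sorted(group.bitmap_free)[:count]
    for block in blocks:
        group.bitmap_free.discard(block)
        group.owners.setdefault(block, []).append(filename)
    return blocks


def multiply_claimed(group):
    """Return the blocks that more than one file claims."""
    return {b: files for b, files in group.owners.items() if len(files) > 1}


# Blocks 10 and 11 belong to an old file, but the stale bitmap shows them free.
group = Group(bitmap_free={10, 11, 12}, checksum_ok=False,
              owners={10: ["old.dat"], 11: ["old.dat"]})
allocate(group, "new.dat", 3)
print(multiply_claimed(group))
```

In this model the new file's data lands in the shared blocks, which would match the observation that the contents were those of the newly copied files.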

Jul 15 18:01:03 redshift kernel: [ 9332.021245] EXT4-fs error (device dm-1): ext4_mb_generate_buddy:739: group 2968, 8105 clusters in bitmap, 0 in gd
...
Jul 16 18:05:14 redshift kernel: [95982.560034] EXT4-fs (dm-1): error count: 1
Jul 16 18:05:14 redshift kernel: [95982.560044] EXT4-fs (dm-1): initial error at 1373907663: ext4_mb_generate_buddy:739
Jul 16 18:05:14 redshift kernel: [95982.560053] EXT4-fs (dm-1): last error at 1373907663: ext4_mb_generate_buddy:739
...
Jul 16 20:53:19 redshift kernel: [106068.077526] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 0 failed (47831!=4825)
Jul 16 20:53:19 redshift kernel: [106068.077540] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 1 failed (14670!=8882)
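
For anyone reading these log lines, a small sketch of how they can be decoded. The regular expression is written against the exact message format quoted above and is an assumption about nothing beyond that; `parse_buddy_error` and `epoch_to_utc` are hypothetical helper names. It pulls out the group number and the two free-cluster counts, and converts the "initial error at" value from seconds since the epoch into a UTC time.

```python
import re
from datetime import datetime, timezone

# The first log line quoted in this report.
LINE = ("Jul 15 18:01:03 redshift kernel: [ 9332.021245] EXT4-fs error "
        "(device dm-1): ext4_mb_generate_buddy:739: group 2968, "
        "8105 clusters in bitmap, 0 in gd")

# Matches only the message format shown above.
PATTERN = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): ext4_mb_generate_buddy:\d+: "
    r"group (?P<group>\d+), (?P<bitmap>\d+) clusters in bitmap, "
    r"(?P<gd>\d+) in gd"
)


def parse_buddy_error(line):
    """Return device, group and free-cluster counts, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        "group": int(m.group("group")),
        "bitmap": int(m.group("bitmap")),   # free clusters per the bitmap
        "gd": int(m.group("gd")),           # free clusters per the group descriptor
    }


def epoch_to_utc(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(parse_buddy_error(LINE))
print(epoch_to_utc(1373907663).isoformat())  # 2013-07-15T17:01:03+00:00
```

The converted time, 17:01:03 UTC on 15 Jul 2013, lines up with the 18:01:03 syslog timestamp of the first error if the machine's local time was UTC+1 (an assumption; the en_GB locale suggests British Summer Time). So the "initial error" and "last error" records both point at that single event.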

I see that in an astonishing display of synchronicity, Darrick J Wong
posted a patch on 17 Jul 2013 at 04:02 -- the very next day, or maybe even
the same day, depending on timezone -- to prevent the knock-on effects
(see "[PATCH] ext4: Prevent massive fs corruption if verifying the block
bitmap fails" at
http://permalink.gmane.org/gmane.comp.file-systems.ext4/39535 ).

But what puzzles me is that the initial triggering bug is still in this
kernel (vmlinuz-3.2.0-49-generic), even though according to this conversation
https://bugzilla.kernel.org/show_bug.cgi?id=42723#c8 the fix was
backported to 3.2.20. Is it possible that there is another way of
getting the "ext4_mb_generate_buddy:739" error?
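
The version comparison behind that question can be made explicit. The sketch below assumes that the last field of the ProcVersionSignature line in the attached apport data is the upstream stable version the Ubuntu kernel is based on; `upstream_version` is a hypothetical helper name.

```python
def upstream_version(proc_version_signature):
    """Return the last field of the signature as a tuple of integers.

    Assumption: that field is the upstream stable base version.
    """
    last_field = proc_version_signature.split()[-1]
    return tuple(int(part) for part in last_field.split("."))


# Value taken from the apport data in this report.
running = upstream_version("Ubuntu 3.2.0-49.75-generic 3.2.46")

# Version named in the bugzilla comment as carrying the backported fix.
fix_backported_in = (3, 2, 20)

print(running)                        # (3, 2, 46)
print(running >= fix_backported_in)   # True
```

A base of 3.2.46 is later than 3.2.20, so the checksum fix would be expected to be present. That is only an expectation from version numbers, not a check of the source, which is why a second route to the same error seems worth asking about.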

I have kept an e2image dump of the corrupted FS in case it's of any use
to EXT4 developers, but it's not attached, as even in QCOW2 format it's
~1 GB.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-49-generic 3.2.0-49.75
ProcVersionSignature: Ubuntu 3.2.0-49.75-generic 3.2.46
Uname: Linux 3.2.0-49-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu17.3
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC1:  dsg        7005 F.... pulseaudio
 /dev/snd/controlC0:  dsg        7005 F.... pulseaudio
CRDA:
 country AW:
 	(2402 - 2482 @ 40), (N/A, 20)
 	(5170 - 5250 @ 40), (N/A, 20)
 	(5250 - 5330 @ 40), (N/A, 20), DFS
 	(5490 - 5710 @ 40), (N/A, 27), DFS
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfe024000 irq 16'
   Mixer name	: 'Realtek ALC892'
   Components	: 'HDA:10ec0892,1458a102,00100302'
   Controls      : 46
   Simple ctrls  : 21
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfdefc000 irq 19'
   Mixer name	: 'ATI RS690/780 HDMI'
   Components	: 'HDA:1002791a,00791a00,00100000'
   Controls      : 4
   Simple ctrls  : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Thu Jul 18 19:04:57 2013
HibernationDevice: RESUME=UUID=2ab26064-3b90-475d-b3c2-51a70c2d990a
InstallationMedia: Kubuntu 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120822.2)
MachineType: Gigabyte Technology Co., Ltd. GA-890GPA-UD3H
MarkForUpload: True
ProcEnviron:
 LANGUAGE=en_GB
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-49-generic root=/dev/mapper/system-kubuntu ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-49-generic N/A
 linux-backports-modules-3.2.0-49-generic  N/A
 linux-firmware                            1.79.4
RfKill:
 0: phy0: Wireless LAN
 	Soft blocked: yes
 	Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/23/2010
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: FD
dmi.board.name: GA-890GPA-UD3H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrFD:bd07/23/2010:svnGigabyteTechnologyCo.,Ltd.:pnGA-890GPA-UD3H:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-890GPA-UD3H:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-890GPA-UD3H
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug precise

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1202994

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1202994/+subscriptions

