kernel-packages team mailing list archive

[Bug 1349711] Re: Machine lockup in btrfs-transaction

 

So the way I read the thread, the basic problem (apparently better known
to developers than it is currently documented) is that btrfs can run out
of space rather unexpectedly. I was a bit surprised as well to read that
500MB, which looks like a whole lot of space when coming from other
filesystems, can go away quickly because allocation is done in rather
big chunks. And the relevant information seems to be spread across the
output of various commands.
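
To make the chunk argument a bit more concrete, here is a small
back-of-the-envelope sketch. The chunk sizes and the number of copies
are my assumptions for illustration only (they are not taken from the
affected machine), and the model treats chunk sizes as fixed, which the
real allocator need not do:

```python
# Deliberately simplified model of chunk-based allocation.
# All sizes below are assumptions for illustration, not measured values.
MIB = 1024 * 1024


def can_allocate_chunk(unallocated_bytes, chunk_bytes, copies=1):
    """Return True if a new chunk with `copies` replicas still fits."""
    return unallocated_bytes >= chunk_bytes * copies


unallocated = 500 * MIB      # the "500MB" mentioned in the thread
data_chunk = 1024 * MIB      # assumed data chunk size
metadata_chunk = 256 * MIB   # assumed metadata chunk size

# A full-size data chunk does not fit into the remaining space.
print(can_allocate_chunk(unallocated, data_chunk))
# A metadata chunk stored twice (assumed duplication) does not fit either.
print(can_allocate_chunk(unallocated, metadata_chunk, copies=2))
# Only a single-copy metadata chunk would still fit in this model.
print(can_allocate_chunk(unallocated, metadata_chunk, copies=1))
```

Under those assumptions, space that looks plentiful is already too small
for the next allocation step.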

What I take from that for this bug report is that, because it is getting
close to the no-space situation, the fs is in a state where it needs to
do a lot of bookkeeping. The problem with soft lockup warnings is that
they can also be reported for something that is not a complete lockup.
Basically, doing fs operations in the foreground causes data to be
written out in the background. If the organizational layout of the fs
becomes complicated, it takes more processing time to find out where to
actually put the data (and the metadata has to be updated in parallel).
At some point (if the fs gets exercised more) the backlog grows and most
of the free memory is used up by buffers for the fs. That would explain
the very laggy responsiveness of the whole system. And of course at some
point there will be a sync point of the fs, and that might be waiting on
something that does not complete for quite a long time. So maybe with a
lot of patience the system (after interrupting the updates) would
recover and become responsive again.

Bottom line would be that this is not a lockup but fallout from
over-fullness combined with bad handling of the io (or maybe bad just
because things get complicated and so more difficult). So there is not
much that can be done at that point; it has to be prevented in advance.
Unfortunately, that has to be done manually.
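
As a rough idea of what "preventing in advance" could look like, below
is a minimal sketch of a fullness check that could be run periodically.
The mount point and the 80% threshold are hypothetical, and the numbers
statvfs() reports are only an estimate on btrfs (they do not show how
much space is already tied up in allocated chunks), so this is an early
warning at best and not a substitute for the btrfs tools:

```python
import os


def usage_percent(path):
    """Percentage of filesystem blocks in use, as seen through statvfs()."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        # Pseudo filesystems may report zero blocks; treat them as empty.
        return 0.0
    used_blocks = st.f_blocks - st.f_bfree
    return 100.0 * used_blocks / st.f_blocks


def needs_attention(path, threshold=80.0):
    """Return True once usage reaches the (arbitrary) threshold."""
    return usage_percent(path) >= threshold


mount_point = "/"  # hypothetical; use the btrfs mount point instead
print("%s: %.1f%% used, needs attention: %s" % (
    mount_point, usage_percent(mount_point), needs_attention(mount_point)))
```

With the volume from the bug description at 87% full, a check like this
with an 80% threshold would already have flagged it.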

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1349711

Title:
  Machine lockup in btrfs-transaction

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  This has happened twice now.

  I'm on an AWS EC2 m3.large instance with the official Ubuntu AMI
  ami-776d9700.

  # cat /proc/version_signature
  Ubuntu 3.13.0-32.57-generic 3.13.11.4

  After running for many days, the machine locked up with the below
  messages appearing on the console. The machine would respond to ping
  but not SSH or HTTP requests. The machine has one BTRFS volume which
  is 87% full and lives on a Logical Volume Manager (LVM) block device
  on top of one Amazon Elastic Block Store (EBS) device.

  Error messages after first reboot:

  [   77.609490] BTRFS error (device dm-0): block group 10766778368 has wrong amount of free space
  [   77.613678] BTRFS error (device dm-0): failed to load free space cache for block group 10766778368
  [   77.643801] BTRFS error (device dm-0): block group 19356712960 has wrong amount of free space
  [   77.648952] BTRFS error (device dm-0): failed to load free space cache for block group 19356712960
  [   77.926325] BTRFS error (device dm-0): block group 20430454784 has wrong amount of free space
  [   77.931078] BTRFS error (device dm-0): failed to load free space cache for block group 20430454784
  [   78.111437] BTRFS error (device dm-0): block group 21504196608 has wrong amount of free space
  [   78.116165] BTRFS error (device dm-0): failed to load free space cache for block group 21504196608

  Error messages after second reboot:

  [   45.390221] BTRFS error (device dm-0): free space inode generation (0) did not match free space cache generation (70012)
  [   45.413472] BTRFS error (device dm-0): free space inode generation (0) did not match free space cache generation (70012)
  [  467.423961] BTRFS error (device dm-0): block group 518646661120 has wrong amount of free space
  [  467.429251] BTRFS error (device dm-0): failed to load free space cache for block group 518646661120

  Error messages on the console after second lock-up follow:

  [246736.752053] INFO: rcu_sched self-detected stall on CPU { 0}  (t=2220246 jiffies g=35399662 c=35399661 q=0)
  [246736.756059] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=2220247 jiffies, g=35399662, c=35399661, q=0)
  [246764.192014] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
  [246764.212058] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]
  [246792.192022] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
  [246792.212057] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]
  [246820.192052] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
  [246820.212018] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]
  [246848.192052] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
  [246848.212058] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]
  [246876.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
  [246876.212057] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
  [246904.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
  [246904.212058] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
  [246916.772052] INFO: rcu_sched self-detected stall on CPU
  [246916.776058] INFO: rcu_sched detected stalls on CPUs/tasks:
  [246944.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
  [246944.212058] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
  [246972.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
  [246972.212018] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
  [247000.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
  [247000.212058] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
  [247028.192054] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
  [247028.212058] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
  [247056.192053] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
  [247056.212061] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349711/+subscriptions

