kernel-packages team mailing list archive
Message #74086
[Bug 1349711] Re: Machine lockup in btrfs-transaction
The production machine hasn't had a lockup since moving to
3.15.7-031507-generic (it's been up for 4 days), even though we could
reproduce the lockup with that kernel on a new machine using a snapshot
of the old volume.
Another twist is that on the production machine I'm now reliably seeing
"No space left on device", even though in principle there appears to be
63GB remaining:
$ btrfs fi df /path/to/volume
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=0.00
$ sudo btrfs fi show /path/to/volume
Label: none uuid: 3ffd71ab-6c3d-4486-a6b0-5c1eeb9be6b3
Total devices 1 FS bytes used 432.25GiB
devid 1 size 500.00GiB used 500.00GiB path /dev/dm-0
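A little arithmetic on the output above shows where the space went. The sketch below (Python 3, with the figures hand-copied from the output; the `Pool` class and variable names are illustrative and not part of any btrfs tool) computes the free space inside the already-allocated chunks and the space left unallocated on the device. It gives about 62.22 GiB free inside data chunks, only 0.50 GiB inside metadata chunks, and 0.00 GiB unallocated, since `btrfs fi show` reports all 500.00GiB of the device as allocated to chunks. Free space inside data chunks cannot be used for metadata, so once the device is fully allocated, operations that need new metadata can fail with ENOSPC even though data space remains. (The 512MiB `unknown` line is most likely the global block reserve, which later btrfs-progs versions label `GlobalReserve` and which is also taken from metadata space.)

```python
from dataclasses import dataclass

# Figures hand-copied (in GiB) from the `btrfs fi df` / `btrfs fi show`
# output above; the names here are illustrative only.


@dataclass
class Pool:
    name: str
    total_gib: float  # space allocated to chunks of this type
    used_gib: float   # space actually used inside those chunks

    @property
    def free_gib(self) -> float:
        # Slack inside allocated chunks; only usable by this chunk type.
        return self.total_gib - self.used_gib


pools = [
    Pool("Data, single", 489.97, 427.75),
    Pool("Metadata, DUP", 5.00, 4.50),
]

device_size_gib = 500.00
# In `btrfs fi show`, the devid "used" figure is space allocated to chunks.
device_allocated_gib = 500.00
unallocated_gib = device_size_gib - device_allocated_gib

if __name__ == "__main__":
    for p in pools:
        print(f"{p.name:15} free inside chunks: {p.free_gib:6.2f} GiB")
    # With nothing unallocated, btrfs cannot create a new metadata chunk.
    print(f"unallocated on device: {unallocated_gib:.2f} GiB")
```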
The ENOSPC errors occur in particular for mkdir and rename syscalls.
I've posted a mail to the BTRFS list about this:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/37415
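ENOSPC is errno 28, and inside the kernel it is returned as a negated value, as in the `block rsv returned -28` line of the stack trace further down. Assuming Python 3 is available, the standard `errno` and `os` modules give a quick way to decode such return codes (the variable names are illustrative):

```python
import errno
import os

# Kernel functions report failures as negative errno values,
# e.g. "BTRFS: block rsv returned -28".
ret = -28

name = errno.errorcode[-ret]  # symbolic name for the positive errno value
message = os.strerror(-ret)   # the C library's description of that error

if __name__ == "__main__":
    print(f"{ret} -> {name}: {message}")
```

On a typical Linux system this prints `-28 -> ENOSPC: No space left on device`, the same text that user space sees for the failing mkdir and rename calls.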
I did a rebalance with `btrfs balance start -dusage=10` (progressively
increasing the 10) to try to gain more space for metadata, but this
didn't fix the situation. I did, however, get the stack trace below in dmesg.
In the end, I had to enlarge the volume before it became usable again.
[375794.106653] ------------[ cut here ]------------
[375794.106676] WARNING: CPU: 1 PID: 24706 at /home/apw/COD/linux/fs/btrfs/extent-tree.c:6946 use_block_rsv+0xfd/0x1a0 [btrfs]()
[375794.106678] BTRFS: block rsv returned -28
[375794.106679] Modules linked in: softdog tcp_diag inet_diag dm_crypt ppdev xen_fbfront fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_piix4 serio_raw parport_pc parport mac_hid isofs xt_tcpudp iptable_filter xt_owner ip_tables x_tables btrfs xor raid6_pq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd floppy psmouse
[375794.106702] CPU: 1 PID: 24706 Comm: twsearch.py Not tainted 3.15.7-031507-generic #201407281235
[375794.106703] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014
[375794.106705] 0000000000001b22 ffff88016db437c8 ffffffff8176f115 0000000000000007
[375794.106707] ffff88016db43818 ffff88016db43808 ffffffff8106ceac ffff8801e4890000
[375794.106709] ffff8800a71ab9c0 ffff8801aedcd800 0000000000001000 ffff88001c987000
[375794.106711] Call Trace:
[375794.106718] [<ffffffff8176f115>] dump_stack+0x46/0x58
[375794.106721] [<ffffffff8106ceac>] warn_slowpath_common+0x8c/0xc0
[375794.106723] [<ffffffff8106cf96>] warn_slowpath_fmt+0x46/0x50
[375794.106731] [<ffffffffa00d9d1d>] use_block_rsv+0xfd/0x1a0 [btrfs]
[375794.106739] [<ffffffffa00de687>] btrfs_alloc_free_block+0x57/0x220 [btrfs]
[375794.106746] [<ffffffffa00c8a3c>] btrfs_copy_root+0xfc/0x2b0 [btrfs]
[375794.106757] [<ffffffffa013a583>] ? create_reloc_root+0x33/0x2c0 [btrfs]
[375794.106767] [<ffffffffa013a743>] create_reloc_root+0x1f3/0x2c0 [btrfs]
[375794.106776] [<ffffffffa0140eb8>] btrfs_init_reloc_root+0xb8/0xd0 [btrfs]
[375794.106784] [<ffffffffa00ee967>] record_root_in_trans.part.30+0x97/0x100 [btrfs]
[375794.106792] [<ffffffffa00ee9f4>] record_root_in_trans+0x24/0x30 [btrfs]
[375794.106800] [<ffffffffa00efeb1>] btrfs_record_root_in_trans+0x51/0x80 [btrfs]
[375794.106808] [<ffffffffa00f13d6>] start_transaction.part.35+0x86/0x560 [btrfs]
[375794.106815] [<ffffffffa00d1ee0>] ? btrfs_reduce_alloc_profile.isra.48+0x80/0x160 [btrfs]
[375794.106818] [<ffffffff8109be78>] ? finish_task_switch+0x128/0x180
[375794.106826] [<ffffffffa00f18d9>] start_transaction+0x29/0x30 [btrfs]
[375794.106834] [<ffffffffa00f19a7>] btrfs_join_transaction+0x17/0x20 [btrfs]
[375794.106841] [<ffffffffa00d9764>] flush_space+0xf4/0x160 [btrfs]
[375794.106848] [<ffffffffa00d998a>] reserve_metadata_bytes+0x1ba/0x450 [btrfs]
[375794.106851] [<ffffffff811dd073>] ? generic_permission+0xf3/0x120
[375794.106854] [<ffffffff812f010c>] ? security_inode_permission+0x1c/0x30
[375794.106857] [<ffffffff810b5450>] ? __wake_up_sync+0x20/0x20
[375794.106864] [<ffffffffa00daf3a>] btrfs_delalloc_reserve_metadata+0x16a/0x4a0 [btrfs]
[375794.106873] [<ffffffffa0102b3d>] __btrfs_buffered_write+0x15d/0x5c0 [btrfs]
[375794.106877] [<ffffffff8118bd9c>] ? handle_pte_fault+0x18c/0x1b0
[375794.106886] [<ffffffffa010319f>] btrfs_file_aio_write+0x1ff/0x3b0 [btrfs]
[375794.106889] [<ffffffff811d268a>] do_sync_write+0x5a/0x90
[375794.106892] [<ffffffff811d32db>] vfs_write+0xcb/0x1f0
[375794.106894] [<ffffffff811d37df>] SyS_write+0x4f/0xb0
[375794.106897] [<ffffffff817858bf>] tracesys+0xe1/0xe6
[375794.106898] ---[ end trace 1853311c87a5cd93 ]---
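The incremental rebalance described earlier (re-running `btrfs balance start` with a rising `-dusage` threshold) can be scripted. The following is a minimal Python 3 sketch; the helper names, the 10-to-50 range and the step of 10 are assumptions for illustration, not the values actually used on this machine. `-dusage=N` restricts the balance to data block groups that are less than N% used, so the early passes are cheap. The sketch defaults to a dry run that only prints the commands; a real run needs root. As reported above, balancing did not free enough space in this case, and the volume had to be enlarged.

```python
import subprocess
from typing import List


def balance_commands(mountpoint: str, start: int = 10, stop: int = 50,
                     step: int = 10) -> List[List[str]]:
    """Build one `btrfs balance start -dusage=N` command per threshold N.

    Hypothetical helper: the start/stop/step defaults are illustrative.
    Each pass only relocates data block groups that are less than N% used.
    """
    return [
        ["btrfs", "balance", "start", f"-dusage={n}", mountpoint]
        for n in range(start, stop + 1, step)
    ]


def run_incremental_balance(mountpoint: str, dry_run: bool = True) -> None:
    """Hypothetical driver: print each command and optionally run it."""
    for cmd in balance_commands(mountpoint):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing
            # pass (for example, a balance that itself hits ENOSPC).
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands, nothing is executed.
    run_incremental_balance("/path/to/volume")
```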
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1349711
Title:
Machine lockup in btrfs-transaction
Status in “linux” package in Ubuntu:
Confirmed
Bug description:
This has happened twice now.
I'm on an AWS EC2 m3.large instance with the official Ubuntu AMI
ami-776d9700.
# cat /proc/version_signature
Ubuntu 3.13.0-32.57-generic 3.13.11.4
After running for many days, the machine locked up with the messages
below appearing on the console. The machine responded to ping but not
to SSH or HTTP requests. The machine has one BTRFS volume, which is 87%
full and lives on a Logical Volume Manager (LVM) block device on top of
one Amazon Elastic Block Store (EBS) device.
Error messages after the first reboot:
[ 77.609490] BTRFS error (device dm-0): block group 10766778368 has wrong amount of free space
[ 77.613678] BTRFS error (device dm-0): failed to load free space cache for block group 10766778368
[ 77.643801] BTRFS error (device dm-0): block group 19356712960 has wrong amount of free space
[ 77.648952] BTRFS error (device dm-0): failed to load free space cache for block group 19356712960
[ 77.926325] BTRFS error (device dm-0): block group 20430454784 has wrong amount of free space
[ 77.931078] BTRFS error (device dm-0): failed to load free space cache for block group 20430454784
[ 78.111437] BTRFS error (device dm-0): block group 21504196608 has wrong amount of free space
[ 78.116165] BTRFS error (device dm-0): failed to load free space cache for block group 21504196608
Error messages after the second reboot:
[ 45.390221] BTRFS error (device dm-0): free space inode generation (0) did not match free space cache generation (70012)
[ 45.413472] BTRFS error (device dm-0): free space inode generation (0) did not match free space cache generation (70012)
[ 467.423961] BTRFS error (device dm-0): block group 518646661120 has wrong amount of free space
[ 467.429251] BTRFS error (device dm-0): failed to load free space cache for block group 518646661120
The following error messages appeared on the console after the second lock-up:
[246736.752053] INFO: rcu_sched self-detected stall on CPU { 0} (t=2220246 jiffies g=35399662 c=35399661 q=0)
[246736.756059] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=2220247 jiffies, g=35399662, c=35399661, q=0)
[246764.192014] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
[246764.212058] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]
[246792.192022] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
[246792.212057] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]
[246820.192052] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
[246820.212018] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]
[246848.192052] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
[246848.212058] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]
[246876.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
[246876.212057] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
[246904.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
[246904.212058] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
[246916.772052] INFO: rcu_sched self-detected stall on CPU
[246916.776058] INFO: rcu_sched detected stalls on CPUs/tasks:
[246944.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
[246944.212058] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
[246972.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
[246972.212018] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
[247000.192053] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
[247000.212058] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
[247028.192054] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u30:2:1828]
[247028.212058] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:492]
[247056.192053] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
[247056.212061] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]