kernel-packages team mailing list archive
Message #101753
[Bug 1415510] Re: Frequent kernel panics when doing heavy I/O in LXC containers on Btrfs
Update: with the mainline kernel (3.19.0-rc6), I observe a slightly
different pattern. When running multiple heavy I/O operations in
parallel (e.g. rsyncing a large ISO image to a container, performing an
HTTP upload into another one and running "yum update" on all
containers), the large uploads slow to a crawl and eventually stall.
"dmesg" shows several Btrfs-related hung task warnings:
[ 6838.005920] INFO: task kworker/u16:0:5815 blocked for more than 120 seconds.
[ 6838.005924] Not tainted 3.19.0-031900rc6-generic #201501261152
[ 6838.005925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6838.005926] kworker/u16:0 D ffff88024422bb18 0 5815 2 0x00000000
[ 6838.005953] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[ 6838.005954] ffff88024422bb18 ffff88024422bad8 ffff88024422bfd8 00000000000141c0
[ 6838.005956] ffff88030c1b0700 ffff88021a1e13a0 ffff8802c78a75c0 ffff88024422bb08
[ 6838.005958] ffff88024422bc88 7fffffffffffffff 7fffffffffffffff ffff8802c78a75c0
[ 6838.005959] Call Trace:
[ 6838.005965] [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 6838.005968] [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 6838.005972] [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 6838.005974] [<ffffffff8108e5db>] ? try_to_grab_pending+0x4b/0x80
[ 6838.005976] [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 6838.005979] [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 6838.005983] [<ffffffff8121d6c6>] writeback_inodes_sb_nr+0x86/0xb0
[ 6838.005997] [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 6838.006011] [<ffffffffc0630e68>] flush_space+0xd8/0x150 [btrfs]
[ 6838.006022] [<ffffffffc063175b>] btrfs_async_reclaim_metadata_space+0x14b/0x1d0 [btrfs]
[ 6838.006024] [<ffffffff8108f6dd>] process_one_work+0x14d/0x460
[ 6838.006026] [<ffffffff810900bb>] worker_thread+0x11b/0x3f0
[ 6838.006029] [<ffffffff8108ffa0>] ? create_worker+0x1e0/0x1e0
[ 6838.006031] [<ffffffff81095cc9>] kthread+0xc9/0xe0
[ 6838.006032] [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 6838.006035] [<ffffffff817d17fc>] ret_from_fork+0x7c/0xb0
[ 6838.006037] [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 6957.962660] INFO: task kworker/u16:0:5815 blocked for more than 120 seconds.
[ 6957.962667] Not tainted 3.19.0-031900rc6-generic #201501261152
[ 6957.962668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6957.962671] kworker/u16:0 D ffff88024422bb18 0 5815 2 0x00000000
[ 6957.962706] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[ 6957.962709] ffff88024422bb18 ffff88024422bad8 ffff88024422bfd8 00000000000141c0
[ 6957.962713] ffff88030c1b0700 ffff88021a1e13a0 ffff8802c78a75c0 ffff88024422bb08
[ 6957.962716] ffff88024422bc88 7fffffffffffffff 7fffffffffffffff ffff8802c78a75c0
[ 6957.962720] Call Trace:
[ 6957.962741] [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 6957.962746] [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 6957.962752] [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 6957.962756] [<ffffffff8108e5db>] ? try_to_grab_pending+0x4b/0x80
[ 6957.962760] [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 6957.962765] [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 6957.962771] [<ffffffff8121d6c6>] writeback_inodes_sb_nr+0x86/0xb0
[ 6957.962787] [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 6957.962803] [<ffffffffc0630e68>] flush_space+0xd8/0x150 [btrfs]
[ 6957.962817] [<ffffffffc063175b>] btrfs_async_reclaim_metadata_space+0x14b/0x1d0 [btrfs]
[ 6957.962822] [<ffffffff8108f6dd>] process_one_work+0x14d/0x460
[ 6957.962826] [<ffffffff810900bb>] worker_thread+0x11b/0x3f0
[ 6957.962830] [<ffffffff8108ffa0>] ? create_worker+0x1e0/0x1e0
[ 6957.962834] [<ffffffff81095cc9>] kthread+0xc9/0xe0
[ 6957.962838] [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 6957.962842] [<ffffffff817d17fc>] ret_from_fork+0x7c/0xb0
[ 6957.962846] [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 6962.761961] systemd-hostnamed[15586]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
[ 7437.789596] INFO: task yum:14547 blocked for more than 120 seconds.
[ 7437.789600] Not tainted 3.19.0-031900rc6-generic #201501261152
[ 7437.789601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7437.789602] yum D ffff880286777868 0 14547 14546 0x00000000
[ 7437.789605] ffff880286777868 0000000200000001 ffff880286777fd8 00000000000141c0
[ 7437.789607] ffff88002e07db00 ffffffff81c1c500 ffff8801f8892740 ffff880286777858
[ 7437.789608] ffff8802867779d8 7fffffffffffffff 7fffffffffffffff ffff8801f8892740
[ 7437.789610] Call Trace:
[ 7437.789616] [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 7437.789619] [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 7437.789623] [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 7437.789625] [<ffffffff8108e5db>] ? try_to_grab_pending+0x4b/0x80
[ 7437.789628] [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 7437.789634] [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 7437.789638] [<ffffffff8121d6c6>] writeback_inodes_sb_nr+0x86/0xb0
[ 7437.789674] [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 7437.789692] [<ffffffffc0628cbd>] ? get_alloc_profile+0x5d/0x90 [btrfs]
[ 7437.789707] [<ffffffffc06304c0>] ? btrfs_get_alloc_profile+0x30/0x40 [btrfs]
[ 7437.789719] [<ffffffffc0630e68>] flush_space+0xd8/0x150 [btrfs]
[ 7437.789731] [<ffffffffc06310b9>] reserve_metadata_bytes+0x1d9/0x590 [btrfs]
[ 7437.789743] [<ffffffffc0624659>] ? btrfs_search_slot+0x3a9/0x870 [btrfs]
[ 7437.789760] [<ffffffffc0664d50>] ? set_state_bits+0x40/0x80 [btrfs]
[ 7437.789773] [<ffffffffc06320f5>] btrfs_block_rsv_add+0x35/0x60 [btrfs]
[ 7437.789788] [<ffffffffc065fff2>] ? try_merge_map+0x32/0x150 [btrfs]
[ 7437.789801] [<ffffffffc0649e15>] start_transaction.part.35+0x185/0x540 [btrfs]
[ 7437.789813] [<ffffffffc064a1f9>] start_transaction+0x29/0x30 [btrfs]
[ 7437.789824] [<ffffffffc064a53b>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[ 7437.789837] [<ffffffffc065344a>] maybe_insert_hole+0x8a/0x1b0 [btrfs]
[ 7437.789847] [<ffffffffc0655452>] btrfs_cont_expand+0x1c2/0x340 [btrfs]
[ 7437.789857] [<ffffffffc065f310>] btrfs_file_write_iter+0x2e0/0x360 [btrfs]
[ 7437.789859] [<ffffffff811f408b>] new_sync_write+0x7b/0xb0
[ 7437.789861] [<ffffffff811f4f07>] vfs_write+0xc7/0x1f0
[ 7437.789862] [<ffffffff811f52af>] SyS_write+0x4f/0xb0
[ 7437.789865] [<ffffffff817cd6b9>] ? schedule+0x29/0x70
[ 7437.789867] [<ffffffff817d18ad>] system_call_fastpath+0x16/0x1b
[ 7677.703046] INFO: task kworker/u16:10:16126 blocked for more than 120 seconds.
[ 7677.703051] Not tainted 3.19.0-031900rc6-generic #201501261152
[ 7677.703053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7677.703055] kworker/u16:10 D ffff88010fdabb18 0 16126 2 0x00000000
[ 7677.703086] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[ 7677.703088] ffff88010fdabb18 01000000000000a1 ffff88010fdabfd8 00000000000141c0
[ 7677.703091] ffff88002e07f700 ffff8802443075c0 ffff880209aff5c0 ffff88010fdabb08
[ 7677.703094] ffff88010fdabc88 7fffffffffffffff 7fffffffffffffff ffff880209aff5c0
[ 7677.703097] Call Trace:
[ 7677.703103] [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 7677.703106] [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 7677.703111] [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 7677.703114] [<ffffffff8108e5f5>] ? try_to_grab_pending+0x65/0x80
[ 7677.703117] [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 7677.703121] [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 7677.703126] [<ffffffff8121d6c6>] writeback_inodes_sb_nr+0x86/0xb0
[ 7677.703143] [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 7677.703159] [<ffffffffc0630e68>] flush_space+0xd8/0x150 [btrfs]
[ 7677.703173] [<ffffffffc0631763>] ? btrfs_async_reclaim_metadata_space+0x153/0x1d0 [btrfs]
[ 7677.703186] [<ffffffffc063175b>] btrfs_async_reclaim_metadata_space+0x14b/0x1d0 [btrfs]
[ 7677.703189] [<ffffffff8108f6dd>] process_one_work+0x14d/0x460
[ 7677.703192] [<ffffffff810900bb>] worker_thread+0x11b/0x3f0
[ 7677.703196] [<ffffffff8108ffa0>] ? create_worker+0x1e0/0x1e0
[ 7677.703199] [<ffffffff81095cc9>] kthread+0xc9/0xe0
[ 7677.703201] [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 7677.703205] [<ffffffff817d17fc>] ret_from_fork+0x7c/0xb0
[ 7677.703208] [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
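
To make hung task reports like the ones above easier to compare, here is a minimal Python sketch that groups the frames of each report under its "INFO: task ... blocked" header. It assumes the dmesg output has been saved to text first; the function name summarize_hung_tasks and the regular expressions are my own illustration and not part of any kernel tooling. Frames marked with "?" are skipped because the kernel flags them as unreliable.

```python
import re
from collections import Counter

# Header printed by the hung task detector, e.g.
# "[ 6838.005920] INFO: task kworker/u16:0:5815 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# One stack frame, e.g. "[ 6838.005965] [<ffffffff817cd6b9>] schedule+0x29/0x70".
# A "? " before the symbol marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(?P<unsure>\? )?(?P<sym>[\w.]+)\+0x"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung task report with its reliable frames."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "time": float(header["ts"]),
                "task": header["comm"],
                "pid": int(header["pid"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame["unsure"]:
            current["frames"].append(frame["sym"])
    return reports


def count_by_task(reports):
    """Count how often each task name was reported as blocked."""
    return Counter(report["task"] for report in reports)


# Short excerpt taken from the log above, used only as demo input.
SAMPLE = """\
[ 6838.005920] INFO: task kworker/u16:0:5815 blocked for more than 120 seconds.
[ 6838.005965] [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 6838.005972] [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 6838.005997] [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 7437.789596] INFO: task yum:14547 blocked for more than 120 seconds.
[ 7437.789801] [<ffffffffc0649e15>] start_transaction.part.35+0x185/0x540 [btrfs]
"""

if __name__ == "__main__":
    for report in summarize_hung_tasks(SAMPLE):
        print(report["time"], report["task"], report["pid"], " <- ".join(report["frames"]))
```

Feeding the complete log to summarize_hung_tasks() and passing the result to count_by_task() gives a quick overview of which tasks were reported and how often, without having to read every trace.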
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1415510
Title:
Frequent kernel panics when doing heavy I/O in LXC containers on Btrfs
Status in linux package in Ubuntu:
Confirmed
Bug description:
I initially reported this as a bug in LXC
(https://github.com/lxc/lxc/issues/424), but I was (rightfully)
advised to report this as a kernel issue instead:
I'm running Ubuntu 14.04.1 LTS (x86_64) on my laptop. The current kernel
version is "3.13.0-44-generic". The LXC version is "1.0.7-0ubuntu0.1",
installed from the "ubuntu-lxc" PPA on Launchpad.
I have a dedicated Btrfs file system mounted on /container/, which I
use for storing all LXC containers.
The file system is created on top of a logical volume:
lenz@lenz-ThinkPad-T440 ~ % mount | grep container
/dev/mapper/ubuntu--vg-container on /container type btrfs (rw)
lenz@lenz-ThinkPad-T440 ~ % sudo lvdisplay /dev/mapper/ubuntu--vg-container
  --- Logical volume ---
  LV Path                /dev/ubuntu-vg/container
  LV Name                container
  VG Name                ubuntu-vg
  LV UUID                JUq21P-SSoS-UeU5-rdDS-k6V4-d30e-gJM1FA
  LV Write Access        read/write
  LV Creation host, time lenz-ThinkPad-T440, 2014-09-15 13:42:27 +0200
  LV Status              available
  # open                 1
  LV Size                65,00 GiB
  Current LE             16640
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:5
The drive is a Samsung SSD ("Samsung SSD 840 EVO 500GB,
EXT0BB0Q, max UDMA/133", according to dmesg).
I have a number of containers based on CentOS 6, which were created by
cloning a base image using lxc-clone -s.
Quite frequently, when I generate heavy disk I/O in one or several of
these containers (e.g. by running yum update concurrently, or by
transferring large files via an HTTP upload to one of the
container instances), my host system freezes. This only happens when
container activity is involved; the system runs stably otherwise. Most
of the time the X desktop freezes, and sometimes a kernel panic can be
observed on the console. Unfortunately I'm unable to capture it other
than by taking a picture. The only solution is to perform a cold
reboot using the power button.
This has happened to me before. I then re-created the /container/ file
system from scratch and started over. But now it's happening again,
so I would like to report it for investigation.
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-generic 3.13.0.44.51
ProcVersionSignature: Ubuntu 3.13.0-44.73-generic 3.13.11-ckt12
Uname: Linux 3.13.0-44-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.6
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC1: lenz 2782 F.... pulseaudio
/dev/snd/controlC0: lenz 2782 F.... pulseaudio
CurrentDesktop: Unity
Date: Wed Jan 28 16:16:20 2015
HibernationDevice: RESUME=UUID=a60307c3-e53f-473e-ba9e-90cbfe484bb8
InstallationDate: Installed on 2014-09-15 (135 days ago)
InstallationMedia: Ubuntu 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.2)
MachineType: LENOVO 20B6005YGE
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-44-generic root=/dev/mapper/ubuntu--vg-root ro softlockup_panic=1 elevator=noop quiet splash nomdmonddf nomdmonisw vt.handoff=7
RelatedPackageVersions:
linux-restricted-modules-3.13.0-44-generic N/A
linux-backports-modules-3.13.0-44-generic N/A
linux-firmware 1.127.11
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/03/2014
dmi.bios.vendor: LENOVO
dmi.bios.version: GJET79WW (2.29 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20B6005YGE
dmi.board.vendor: LENOVO
dmi.board.version: 0B98401 PRO
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvrGJET79WW(2.29):bd09/03/2014:svnLENOVO:pn20B6005YGE:pvrThinkPadT440:rvnLENOVO:rn20B6005YGE:rvr0B98401PRO:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 20B6005YGE
dmi.product.version: ThinkPad T440
dmi.sys.vendor: LENOVO
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1415510/+subscriptions