kernel-packages team mailing list archive

[Bug 1415510] Re: Frequent kernel panics when doing heavy I/O in LXC containers on Btrfs

 

Update: with the mainline kernel (3.19.0-031900rc6-generic), I observe a
slightly different pattern. When running multiple heavy I/O operations in
parallel (e.g. rsyncing a large ISO image to one container, performing an
HTTP upload into another and running "yum update" on all containers), the
large transfers start to stall and eventually grind to a halt.

"dmesg" shows several btrfs-related hung-task reports:

[ 6838.005920] INFO: task kworker/u16:0:5815 blocked for more than 120 seconds.
[ 6838.005924]       Not tainted 3.19.0-031900rc6-generic #201501261152
[ 6838.005925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6838.005926] kworker/u16:0   D ffff88024422bb18     0  5815      2 0x00000000
[ 6838.005953] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[ 6838.005954]  ffff88024422bb18 ffff88024422bad8 ffff88024422bfd8 00000000000141c0
[ 6838.005956]  ffff88030c1b0700 ffff88021a1e13a0 ffff8802c78a75c0 ffff88024422bb08
[ 6838.005958]  ffff88024422bc88 7fffffffffffffff 7fffffffffffffff ffff8802c78a75c0
[ 6838.005959] Call Trace:
[ 6838.005965]  [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 6838.005968]  [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 6838.005972]  [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 6838.005974]  [<ffffffff8108e5db>] ? try_to_grab_pending+0x4b/0x80
[ 6838.005976]  [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 6838.005979]  [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 6838.005983]  [<ffffffff8121d6c6>] writeback_inodes_sb_nr+0x86/0xb0
[ 6838.005997]  [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 6838.006011]  [<ffffffffc0630e68>] flush_space+0xd8/0x150 [btrfs]
[ 6838.006022]  [<ffffffffc063175b>] btrfs_async_reclaim_metadata_space+0x14b/0x1d0 [btrfs]
[ 6838.006024]  [<ffffffff8108f6dd>] process_one_work+0x14d/0x460
[ 6838.006026]  [<ffffffff810900bb>] worker_thread+0x11b/0x3f0
[ 6838.006029]  [<ffffffff8108ffa0>] ? create_worker+0x1e0/0x1e0
[ 6838.006031]  [<ffffffff81095cc9>] kthread+0xc9/0xe0
[ 6838.006032]  [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 6838.006035]  [<ffffffff817d17fc>] ret_from_fork+0x7c/0xb0
[ 6838.006037]  [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 6957.962660] INFO: task kworker/u16:0:5815 blocked for more than 120 seconds.
[ 6957.962667]       Not tainted 3.19.0-031900rc6-generic #201501261152
[ 6957.962668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6957.962671] kworker/u16:0   D ffff88024422bb18     0  5815      2 0x00000000
[ 6957.962706] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[ 6957.962709]  ffff88024422bb18 ffff88024422bad8 ffff88024422bfd8 00000000000141c0
[ 6957.962713]  ffff88030c1b0700 ffff88021a1e13a0 ffff8802c78a75c0 ffff88024422bb08
[ 6957.962716]  ffff88024422bc88 7fffffffffffffff 7fffffffffffffff ffff8802c78a75c0
[ 6957.962720] Call Trace:
[ 6957.962741]  [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 6957.962746]  [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 6957.962752]  [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 6957.962756]  [<ffffffff8108e5db>] ? try_to_grab_pending+0x4b/0x80
[ 6957.962760]  [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 6957.962765]  [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 6957.962771]  [<ffffffff8121d6c6>] writeback_inodes_sb_nr+0x86/0xb0
[ 6957.962787]  [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 6957.962803]  [<ffffffffc0630e68>] flush_space+0xd8/0x150 [btrfs]
[ 6957.962817]  [<ffffffffc063175b>] btrfs_async_reclaim_metadata_space+0x14b/0x1d0 [btrfs]
[ 6957.962822]  [<ffffffff8108f6dd>] process_one_work+0x14d/0x460
[ 6957.962826]  [<ffffffff810900bb>] worker_thread+0x11b/0x3f0
[ 6957.962830]  [<ffffffff8108ffa0>] ? create_worker+0x1e0/0x1e0
[ 6957.962834]  [<ffffffff81095cc9>] kthread+0xc9/0xe0
[ 6957.962838]  [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 6957.962842]  [<ffffffff817d17fc>] ret_from_fork+0x7c/0xb0
[ 6957.962846]  [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 6962.761961] systemd-hostnamed[15586]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
[ 7437.789596] INFO: task yum:14547 blocked for more than 120 seconds.
[ 7437.789600]       Not tainted 3.19.0-031900rc6-generic #201501261152
[ 7437.789601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7437.789602] yum             D ffff880286777868     0 14547  14546 0x00000000
[ 7437.789605]  ffff880286777868 0000000200000001 ffff880286777fd8 00000000000141c0
[ 7437.789607]  ffff88002e07db00 ffffffff81c1c500 ffff8801f8892740 ffff880286777858
[ 7437.789608]  ffff8802867779d8 7fffffffffffffff 7fffffffffffffff ffff8801f8892740
[ 7437.789610] Call Trace:
[ 7437.789616]  [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 7437.789619]  [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 7437.789623]  [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 7437.789625]  [<ffffffff8108e5db>] ? try_to_grab_pending+0x4b/0x80
[ 7437.789628]  [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 7437.789634]  [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 7437.789638]  [<ffffffff8121d6c6>] writeback_inodes_sb_nr+0x86/0xb0
[ 7437.789674]  [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 7437.789692]  [<ffffffffc0628cbd>] ? get_alloc_profile+0x5d/0x90 [btrfs]
[ 7437.789707]  [<ffffffffc06304c0>] ? btrfs_get_alloc_profile+0x30/0x40 [btrfs]
[ 7437.789719]  [<ffffffffc0630e68>] flush_space+0xd8/0x150 [btrfs]
[ 7437.789731]  [<ffffffffc06310b9>] reserve_metadata_bytes+0x1d9/0x590 [btrfs]
[ 7437.789743]  [<ffffffffc0624659>] ? btrfs_search_slot+0x3a9/0x870 [btrfs]
[ 7437.789760]  [<ffffffffc0664d50>] ? set_state_bits+0x40/0x80 [btrfs]
[ 7437.789773]  [<ffffffffc06320f5>] btrfs_block_rsv_add+0x35/0x60 [btrfs]
[ 7437.789788]  [<ffffffffc065fff2>] ? try_merge_map+0x32/0x150 [btrfs]
[ 7437.789801]  [<ffffffffc0649e15>] start_transaction.part.35+0x185/0x540 [btrfs]
[ 7437.789813]  [<ffffffffc064a1f9>] start_transaction+0x29/0x30 [btrfs]
[ 7437.789824]  [<ffffffffc064a53b>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[ 7437.789837]  [<ffffffffc065344a>] maybe_insert_hole+0x8a/0x1b0 [btrfs]
[ 7437.789847]  [<ffffffffc0655452>] btrfs_cont_expand+0x1c2/0x340 [btrfs]
[ 7437.789857]  [<ffffffffc065f310>] btrfs_file_write_iter+0x2e0/0x360 [btrfs]
[ 7437.789859]  [<ffffffff811f408b>] new_sync_write+0x7b/0xb0
[ 7437.789861]  [<ffffffff811f4f07>] vfs_write+0xc7/0x1f0
[ 7437.789862]  [<ffffffff811f52af>] SyS_write+0x4f/0xb0
[ 7437.789865]  [<ffffffff817cd6b9>] ? schedule+0x29/0x70
[ 7437.789867]  [<ffffffff817d18ad>] system_call_fastpath+0x16/0x1b
[ 7677.703046] INFO: task kworker/u16:10:16126 blocked for more than 120 seconds.
[ 7677.703051]       Not tainted 3.19.0-031900rc6-generic #201501261152
[ 7677.703053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7677.703055] kworker/u16:10  D ffff88010fdabb18     0 16126      2 0x00000000
[ 7677.703086] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[ 7677.703088]  ffff88010fdabb18 01000000000000a1 ffff88010fdabfd8 00000000000141c0
[ 7677.703091]  ffff88002e07f700 ffff8802443075c0 ffff880209aff5c0 ffff88010fdabb08
[ 7677.703094]  ffff88010fdabc88 7fffffffffffffff 7fffffffffffffff ffff880209aff5c0
[ 7677.703097] Call Trace:
[ 7677.703103]  [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 7677.703106]  [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 7677.703111]  [<ffffffff8108e01a>] ? __queue_delayed_work+0xaa/0x1a0
[ 7677.703114]  [<ffffffff8108e5f5>] ? try_to_grab_pending+0x65/0x80
[ 7677.703117]  [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 7677.703121]  [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 7677.703126]  [<ffffffff8121d6c6>] writeback_inodes_sb_nr+0x86/0xb0
[ 7677.703143]  [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]
[ 7677.703159]  [<ffffffffc0630e68>] flush_space+0xd8/0x150 [btrfs]
[ 7677.703173]  [<ffffffffc0631763>] ? btrfs_async_reclaim_metadata_space+0x153/0x1d0 [btrfs]
[ 7677.703186]  [<ffffffffc063175b>] btrfs_async_reclaim_metadata_space+0x14b/0x1d0 [btrfs]
[ 7677.703189]  [<ffffffff8108f6dd>] process_one_work+0x14d/0x460
[ 7677.703192]  [<ffffffff810900bb>] worker_thread+0x11b/0x3f0
[ 7677.703196]  [<ffffffff8108ffa0>] ? create_worker+0x1e0/0x1e0
[ 7677.703199]  [<ffffffff81095cc9>] kthread+0xc9/0xe0
[ 7677.703201]  [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 7677.703205]  [<ffffffff817d17fc>] ret_from_fork+0x7c/0xb0
[ 7677.703208]  [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
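
To make longer logs of this kind easier to compare, here is a minimal sketch of a parser for the hung-task reports above. It is only an illustration: the function names (parse_hung_tasks, common_frames) are made up for this example, and the regular expressions assume the exact dmesg layout shown above (timestamp in brackets, "INFO: task <name>:<pid> blocked for more than N seconds.", frames as "[<addr>] symbol+0x..."). Frames marked with "?" are skipped, since the kernel flags those as unreliable.

```python
import re

# Header of a hung-task report, e.g.
# "[ 6838.005920] INFO: task kworker/u16:0:5815 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)

# One stack frame, e.g.
# "[ 6838.005997]  [<ffffffffc0630b9d>] shrink_delalloc+0x10d/0x300 [btrfs]"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?(?P<sym>[\w.]+)\+0x"
)


def parse_hung_tasks(text):
    """Return one dict per hung-task report found in the dmesg text."""
    reports = []
    current = None
    for line in text.splitlines():
        header = HUNG_RE.match(line)
        if header:
            current = {
                "ts": float(header.group("ts")),
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.match(line)
        # Keep only reliable frames (no leading "? ") that follow a header.
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append(frame.group("sym"))
    return reports


def common_frames(reports):
    """Symbols present in every report, in the order of the first report."""
    if not reports:
        return []
    shared = set(reports[0]["frames"])
    for report in reports[1:]:
        shared &= set(report["frames"])
    return [sym for sym in reports[0]["frames"] if sym in shared]
```

Feeding it the output of "dmesg" and printing common_frames(parse_hung_tasks(text)) gives a quick view of where the blocked tasks overlap. In the traces above, both the kworker threads and the yum process are waiting in writeback_inodes_sb_nr, called from shrink_delalloc and flush_space.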

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1415510

Title:
  Frequent kernel panics when doing heavy I/O in LXC containers on Btrfs

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I initially reported this as a bug in LXC
  (https://github.com/lxc/lxc/issues/424), but I was (rightfully)
  advised to report this as a kernel issue instead:

  I'm running Ubuntu 14.04.1 LTS (x86_64) on my laptop. The current
  kernel version is "3.13.0-44-generic". The LXC version is
  "1.0.7-0ubuntu0.1", installed from the "ubuntu-lxc" PPA on Launchpad.

  I have a dedicated Btrfs file system mounted on /container/, which I
  use for storing all LXC containers.

  The file system is created on top of a logical volume:

  lenz@lenz-ThinkPad-T440 ~ % mount | grep container
  /dev/mapper/ubuntu--vg-container on /container type btrfs (rw)
  lenz@lenz-ThinkPad-T440 ~ % sudo lvdisplay /dev/mapper/ubuntu--vg-container
    --- Logical volume ---
    LV Path                /dev/ubuntu-vg/container
    LV Name                container
    VG Name                ubuntu-vg
    LV UUID                JUq21P-SSoS-UeU5-rdDS-k6V4-d30e-gJM1FA
    LV Write Access        read/write
    LV Creation host, time lenz-ThinkPad-T440, 2014-09-15 13:42:27 +0200
    LV Status              available
    # open                 1
    LV Size                65,00 GiB
    Current LE             16640
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto
    - currently set to     256
    Block device           252:5

  The drive is a Samsung SSD ("Samsung SSD 840 EVO 500GB, EXT0BB0Q, max
  UDMA/133", according to dmesg).

  I have a number of containers based on CentOS 6; these were created
  by cloning a base image using lxc-clone -s.

  Quite frequently, when I generate heavy disk I/O in one or several of
  these containers (e.g. by running yum update concurrently, or by
  transferring large files via an HTTP upload to one of the container
  instances), my host system freezes. This only happens when container
  activity is involved; the system runs stably otherwise. Most of the
  time the X desktop freezes; sometimes a kernel panic can be observed
  on the console. Unfortunately, I'm unable to capture it other than by
  taking a picture. The only way to recover is a cold reboot using the
  power button.

  This has happened to me before. At the time, I re-created the
  /container/ file system from scratch and started over. Now it is
  happening again, so I would like to report it for investigation.

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-image-generic 3.13.0.44.51
  ProcVersionSignature: Ubuntu 3.13.0-44.73-generic 3.13.11-ckt12
  Uname: Linux 3.13.0-44-generic x86_64
  ApportVersion: 2.14.1-0ubuntu3.6
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC1:  lenz       2782 F.... pulseaudio
   /dev/snd/controlC0:  lenz       2782 F.... pulseaudio
  CurrentDesktop: Unity
  Date: Wed Jan 28 16:16:20 2015
  HibernationDevice: RESUME=UUID=a60307c3-e53f-473e-ba9e-90cbfe484bb8
  InstallationDate: Installed on 2014-09-15 (135 days ago)
  InstallationMedia: Ubuntu 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.2)
  MachineType: LENOVO 20B6005YGE
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-44-generic root=/dev/mapper/ubuntu--vg-root ro softlockup_panic=1 elevator=noop quiet splash nomdmonddf nomdmonisw vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-44-generic N/A
   linux-backports-modules-3.13.0-44-generic  N/A
   linux-firmware                             1.127.11
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 09/03/2014
  dmi.bios.vendor: LENOVO
  dmi.bios.version: GJET79WW (2.29 )
  dmi.board.asset.tag: Not Available
  dmi.board.name: 20B6005YGE
  dmi.board.vendor: LENOVO
  dmi.board.version: 0B98401 PRO
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: Not Available
  dmi.modalias: dmi:bvnLENOVO:bvrGJET79WW(2.29):bd09/03/2014:svnLENOVO:pn20B6005YGE:pvrThinkPadT440:rvnLENOVO:rn20B6005YGE:rvr0B98401PRO:cvnLENOVO:ct10:cvrNotAvailable:
  dmi.product.name: 20B6005YGE
  dmi.product.version: ThinkPad T440
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1415510/+subscriptions

