kernel-packages team mailing list archive

[Bug 1598285] [NEW] possible deadlock while using the cgroup freezer on a container with NFS-based workload

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Tycho Andersen <tycho.andersen@xxxxxxxxxxxxx>
Date: Fri, 01 Jul 2016 19:49:14 -0000
Reply-to: Bug 1598285 <1598285@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Public bug reported:

Hi guys,

For background: I'm running a container with an NFS filesystem
bind-mounted into it. The workload I'm running is iozone, a filesystem
benchmarking tool. While running this workload, I attempt to freeze the
container, which gets stuck in the FREEZING state. After a while, I get:

Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.104156] INFO: task iozone:20035 blocked for more than 120 seconds.
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.111056]       Tainted: P           O    4.4.0-24-generic #43-Ubuntu
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.118053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126110] iozone          D ffff880015673e18     0 20035  20005 0x00000104
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126116]  ffff880015673e18 ffff880000000010 ffff880045a21b80 ffff880037776e00
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126118]  ffff880015674000 ffff8800179d6e54 ffff880037776e00 00000000ffffffff
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126120]  ffff8800179d6e58 ffff880015673e30 ffffffff81821b15 ffff8800179d6e50
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126121] Call Trace:
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126129]  [<ffffffff81821b15>] schedule+0x35/0x80
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126131]  [<ffffffff81821dbe>] schedule_preempt_disabled+0xe/0x10
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126134]  [<ffffffff818239f9>] __mutex_lock_slowpath+0xb9/0x130
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126136]  [<ffffffff81823a8f>] mutex_lock+0x1f/0x30
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126139]  [<ffffffff8121d00b>] do_unlinkat+0x12b/0x2d0
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126142]  [<ffffffff8121dc16>] SyS_unlink+0x16/0x20
Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.126146]  [<ffffffff81825bf2>] entry_SYSCALL_64_fastpath+0x16/0x71
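
For anyone scanning logs for the same symptom, here is a minimal sketch (not part of the original report) that pulls the task name, PID, and timeout out of a hung-task line like the one above. The regular expression and the name parse_hung_task are illustrative assumptions based only on the message format quoted here; other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the single hung-task message quoted in this report.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def parse_hung_task(line):
    """Return (comm, pid, seconds) for a hung-task log line, or None."""
    m = HUNG_TASK_RE.search(line)
    if m is None:
        return None
    return m.group("comm"), int(m.group("pid")), int(m.group("secs"))


sample = (
    "Jul  1 01:45:14 juju-19f8e3-15 kernel: [206520.104156] "
    "INFO: task iozone:20035 blocked for more than 120 seconds."
)
print(parse_hung_task(sample))
```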

It looks like the task is actually stuck in generic fs code, not
anything NFS-specific, though the NFS mount may still be a relevant
detail. Anyway:

ubuntu@juju-19f8e3-15:~$ sudo cat /proc/20035/stack
[<ffffffff8121d00b>] do_unlinkat+0x12b/0x2d0
[<ffffffff8121dc16>] SyS_unlink+0x16/0x20
[<ffffffff81825bf2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
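
As a rough aid for comparing stacks across hung tasks, the frames above can be split into symbol and offset. This is only a sketch that assumes the `[<address>] symbol+0xoffset/0xsize` layout seen in this output; parse_frames is a hypothetical helper, not an existing tool.

```python
import re

# Matches "[<addr>] symbol+0xoff/0xsize"; the "+0xoff/0xsize" part is optional
# so that the trailing "[<ffffffffffffffff>] 0xffffffffffffffff" line still parses.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[^\s+]+)"
    r"(?:\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+))?"
)


def parse_frames(text):
    """Return a list of (symbol, offset or None), innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # skip lines that are not stack frames
        off = int(m.group("off"), 16) if m.group("off") else None
        frames.append((m.group("sym"), off))
    return frames


stack = """\
[<ffffffff8121d00b>] do_unlinkat+0x12b/0x2d0
[<ffffffff8121dc16>] SyS_unlink+0x16/0x20
[<ffffffff81825bf2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
"""
for symbol, offset in parse_frames(stack):
    print(symbol, offset)
```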

The container and host are both xenial:

ubuntu@juju-19f8e3-15:~$ uname -a
Linux juju-19f8e3-15 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Finally, I don't have a good reproducer for this. It's pretty rare: I'm
running this benchmark in a loop, and over thousands of runs I've seen
this exactly once.
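
Since the failure is rare, a loop that freezes the container would need some way to notice a freeze that never completes. The sketch below polls a cgroup v1 freezer.state file, which reads THAWED, FREEZING, or FROZEN, and gives up after a timeout. The function name wait_for_frozen, the default timeout, and the polling interval are assumptions for illustration; the report does not say how the freeze was driven or monitored.

```python
import time


def wait_for_frozen(state_path, timeout=30.0, interval=0.5):
    """Poll a cgroup v1 freezer.state file until it reads FROZEN.

    Returns True if FROZEN was seen before the timeout, and False otherwise
    (for example, if the group stays in FREEZING as described in this report).
    """
    deadline = time.monotonic() + timeout
    while True:
        with open(state_path) as f:
            state = f.read().strip()
        if state == "FROZEN":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The path to pass in depends on how the container manager lays out its cgroups, so it is left to the caller. A False return would be the cue to capture /proc/PID/stack for the tasks in the group before thawing.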

I'll leave these hosts up for a bit in case there are any other
interesting bits of info to collect.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1598285

Follow ups

[Bug 1598285] Re: possible deadlock while using the cgroup freezer on a container with NFS-based workload
From: Seth Forshee, 2016-07-01
[Bug 1598285] Missing required logs.
From: Brad Figg, 2016-07-01