kernel-packages team mailing list archive

[Bug 1527062] [NEW] XFS Deadlock on 4.2+

 

Public bug reported:

[Impact]

 * An XFS Deadlock situation is possible on 4.2..4.4rc1^ and newer
kernels.

 * Hung tasks have stack traces similar to the following:
[ 4559.110607] INFO: task kworker/1:0:17 blocked for more than 120 seconds.
[ 4559.143010]       Not tainted 4.2.0-18-generic #22~14.04.1-Ubuntu
[ 4559.171972] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4559.209753] kworker/1:0     D 0000000000000000     0    17      2 0x00000000
[ 4559.209791] Workqueue: xfs-cil/sdac1 xlog_cil_push_work [xfs]
[ 4559.209794]  ffff88085be9fbb8 0000000000000046 ffff88085b746040 ffff88085be8a940
[ 4559.209795]  0000000000000000 ffff88085bea0000 ffff880107fddcc0 ffff88085be8a940
[ 4559.209797]  ffff880859119c00 ffff880859119d00 ffff88085be9fbd8 ffffffff817b6a77
[ 4559.209798] Call Trace:
[ 4559.209806]  [<ffffffff817b6a77>] schedule+0x37/0x80
[ 4559.209817]  [<ffffffffc03c105b>] xlog_state_get_iclog_space+0xdb/0x2d0 [xfs]
[ 4559.209822]  [<ffffffff810a06c0>] ? wake_up_q+0x80/0x80
[ 4559.209832]  [<ffffffffc03c1501>] xlog_write+0x191/0x6a0 [xfs]
[ 4559.209835]  [<ffffffff813b4478>] ? prandom_u32+0x18/0x20
[ 4559.209845]  [<ffffffffc03c2e49>] xlog_cil_push+0x1f9/0x3b0 [xfs]
[ 4559.209854]  [<ffffffffc03c3015>] xlog_cil_push_work+0x15/0x20 [xfs]
[ 4559.209857]  [<ffffffff8108f4ce>] process_one_work+0x14e/0x3d0
[ 4559.209858]  [<ffffffff8108fb7a>] worker_thread+0x11a/0x470
[ 4559.209860]  [<ffffffff8108fa60>] ? rescuer_thread+0x310/0x310
[ 4559.209862]  [<ffffffff81095112>] kthread+0xd2/0xf0
[ 4559.209863]  [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
[ 4559.209865]  [<ffffffff817ba71f>] ret_from_fork+0x3f/0x70
[ 4559.209866]  [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0

or

[305651.804853] INFO: task kswapd0:194 blocked for more than 120 seconds.
[305651.836092]       Not tainted 4.2.0-18-generic #22~14.04.1-Ubuntu
[305651.865655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[305651.903596] kswapd0         D ffff88085fa96640     0   194      2 0x00000000
[305651.903614]  ffff8810591ab858 0000000000000046 ffff88085c2c2940 ffff88105b19a940
[305651.903616]  ffff880066c64548 ffff8810591ac000 ffff8808599cae18 0000000000000000
[305651.903618]  ffff88105b19a940 ffff88085a2cb000 ffff8810591ab878 ffffffff817b6a77
[305651.903620] Call Trace:
[305651.903629]  [<ffffffff817b6a77>] schedule+0x37/0x80
[305651.903655]  [<ffffffffc0402f6c>] _xfs_log_force_lsn+0x15c/0x2d0 [xfs]
[305651.903662]  [<ffffffff810a06c0>] ? wake_up_q+0x80/0x80
[305651.903675]  [<ffffffffc040310e>] xfs_log_force_lsn+0x2e/0x80 [xfs]
[305651.903687]  [<ffffffffc03f5ff9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
[305651.903698]  [<ffffffffc03f2a3d>] __xfs_iunpin_wait+0x8d/0x120 [xfs]
[305651.903701]  [<ffffffff810b7380>] ? autoremove_wake_function+0x40/0x40
[305651.903711]  [<ffffffffc03f5ff9>] xfs_iunpin_wait+0x19/0x20 [xfs]
[305651.903721]  [<ffffffffc03eb665>] xfs_reclaim_inode+0x125/0x330 [xfs]
[305651.903732]  [<ffffffffc03ebab8>] xfs_reclaim_inodes_ag+0x248/0x360 [xfs]
[305651.903735]  [<ffffffff8120511c>] ? destroy_inode+0x3c/0x60
[305651.903744]  [<ffffffffc03ec573>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[305651.903755]  [<ffffffffc03fa5e9>] xfs_fs_free_cached_objects+0x19/0x20 [xfs]
[305651.903758]  [<ffffffff811ee2a1>] super_cache_scan+0x181/0x190
[305651.903761]  [<ffffffff811870e6>] shrink_slab+0x206/0x380
[305651.903763]  [<ffffffff8118b7a1>] shrink_zone+0x291/0x2b0
[305651.903764]  [<ffffffff8118c710>] kswapd+0x500/0x9b0
[305651.903766]  [<ffffffff8118c210>] ? mem_cgroup_shrink_node_zone+0x130/0x130
[305651.903768]  [<ffffffff81095112>] kthread+0xd2/0xf0
[305651.903770]  [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
[305651.903772]  [<ffffffff817ba71f>] ret_from_fork+0x3f/0x70
[305651.903774]  [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
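To check whether a machine is showing the same symptom, the kernel log can be scanned for hung-task reports whose call trace passes through the xfs module. The following is a minimal sketch, not part of the original report: the function name find_xfs_hung_tasks is hypothetical, and the regular expressions assume the log layout shown in the two traces above (other kernel versions may print frames differently).

```python
import re

# Header printed by the hung-task detector, e.g.
# "INFO: task kworker/1:0:17 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than \d+ seconds"
)

# Call-trace frame, e.g. "[<ffffffffc03c1501>] xlog_write+0x191/0x6a0 [xfs]".
# A leading "? " marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unsure>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def find_xfs_hung_tasks(log_text):
    """Return hung-task reports that contain at least one reliable xfs frame."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            # Start collecting frames for a new hung-task report.
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "xfs_frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if current is None or frame is None:
            continue
        # Keep only frames in the xfs module that are not marked with "?".
        if frame.group("mod") == "xfs" and not frame.group("unsure"):
            current["xfs_frames"].append(frame.group("sym"))
    return [r for r in reports if r["xfs_frames"]]
```

Passing the text of the kernel log (for example, the saved output of dmesg) to find_xfs_hung_tasks returns one entry per blocked task together with the xfs functions it was waiting in. A match only indicates a similar symptom; it does not by itself confirm this bug.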


[Test Case]

 * Run large numbers of I/O tasks against large numbers of XFS filesystems
while under memory pressure.  The test case is not guaranteed to reproduce
the deadlock.
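The I/O half of such a workload could be sketched as below. This is an illustration only and not the reporter's reproducer: the function names, file names, and default sizes are hypothetical, each entry in directories is assumed to sit on a separate mounted XFS filesystem, and the script does not create memory pressure itself (that would have to come from another process).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_files(directory, writer_id, n_files, size):
    """Write n_files files of `size` bytes into directory, fsyncing each one."""
    payload = os.urandom(size)
    written = 0
    for i in range(n_files):
        # The writer id keeps concurrent writers from clobbering each other.
        path = os.path.join(directory, f"stress-{writer_id}-{i}.dat")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data and metadata out to disk
        written += size
    return written


def run_workload(directories, writers_per_dir=4, n_files=16, size=64 * 1024):
    """Run several concurrent writers per directory; return total bytes written."""
    jobs = [(d, w) for d in directories for w in range(writers_per_dir)]
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = [pool.submit(write_files, d, w, n_files, size) for d, w in jobs]
        return sum(f.result() for f in futures)
```

A call such as run_workload(["/mnt/xfs0", "/mnt/xfs1"]) (paths are placeholders) would start eight writers across the two mounts. As the report notes, even a much heavier version of this load may not trigger the deadlock.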

[Regression Potential]

 * Upstream commit 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7a29ac474a47eb8cf212b45917683ae89d6fa13b
 - This commit allocates rescuer threads for each of the XFS work queues.

 * Possible additional memory usage from rescuer threads.


** Affects: linux (Ubuntu)
     Importance: High
     Assignee: Dave Chiluk (chiluk)
         Status: In Progress

** Affects: linux-lts-wily (Ubuntu)
     Importance: High
     Assignee: Dave Chiluk (chiluk)
         Status: In Progress


** Tags: sts

** Also affects: linux-lts-wily (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: linux-lts-wily (Ubuntu)
     Assignee: (unassigned) => Dave Chiluk (chiluk)

** Changed in: linux-lts-wily (Ubuntu)
   Importance: Undecided => High

** Changed in: linux-lts-wily (Ubuntu)
       Status: New => Confirmed

** Changed in: linux (Ubuntu)
       Status: Triaged => Won't Fix

** Changed in: linux (Ubuntu)
       Status: Won't Fix => In Progress

** Changed in: linux-lts-wily (Ubuntu)
       Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1527062

Title:
  XFS Deadlock on 4.2+

Status in linux package in Ubuntu:
  In Progress
Status in linux-lts-wily package in Ubuntu:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1527062/+subscriptions

