
kernel-packages team mailing list archive

[Bug 1466135] Re: nf_conntrack releases a conntrack with non-zero refcnt

 

** Description changed:

  [Impact]
  Occasionally, starting new containers or creating new network namespaces may trigger a soft lockup because of improper refcounting of conntrack entries.
  
- Softlockup backtrace:
- [<ffffffff81723b69>] schedule_preempt_disabled+0x29/0x70
- [<ffffffff817259d5>] __mutex_lock_slowpath+0x135/0x1b0
- [<ffffffff811a2679>] ? __kmalloc+0x1e9/0x230
- [<ffffffff81725a6f>] mutex_lock+0x1f/0x2f
- [<ffffffff8161c2c1>] copy_net_ns+0x71/0x130
- [<ffffffff8108f889>] create_new_namespaces+0xf9/0x180
- [<ffffffff8108f983>] copy_namespaces+0x73/0xa0
- [<ffffffff81065b16>] copy_process.part.26+0x9a6/0x16b0
- [<ffffffff810669f5>] do_fork+0xd5/0x340
- [<ffffffff810c8e8d>] ? call_rcu_sched+0x1d/0x20
- [<ffffffff81066ce6>] SyS_clone+0x16/0x20
- [<ffffffff81730089>] stub_clone+0x69/0x90
- [<ffffffff8172fd2d>] ? system_call_fastpath+0x1a/0x1f
+ In the case I am hitting, a kworker thread uses up an entire CPU core,
+ and running cat /proc/$pid/stack on it shows:
+ 
+ [<ffffffffbe01e9b6>] ___preempt_schedule+0x56/0xb0
+ [<ffffffffc02223e4>] nf_ct_iterate_cleanup+0x134/0x160 [nf_conntrack]
+ [<ffffffffc0223dae>] nf_conntrack_cleanup_net_list+0x4e/0x170 [nf_conntrack]
+ [<ffffffffc022436d>] nf_conntrack_pernet_exit+0x4d/0x60 [nf_conntrack]
+ [<ffffffffbe6040d3>] ops_exit_list.isra.1+0x53/0x60
+ [<ffffffffbe6048d0>] cleanup_net+0x100/0x1d0
+ [<ffffffffbe084991>] process_one_work+0x171/0x470
+ [<ffffffffbe08563b>] worker_thread+0x11b/0x3a0
+ [<ffffffffbe08bb82>] kthread+0xd2/0xf0
+ [<ffffffffbe71757c>] ret_from_fork+0x7c/0xb0
+ [<ffffffffffffffff>] 0xffffffffffffffff
+ 
+ The kworker loops forever without managing to clean up the conntrack
+ state, and all the while it holds the global netns lock. I bisected the
+ problem to commit e53376bef2cd97d3e3f61fdc677fb8da7d03d0da, which deals with refcounting, so I suspect that broken refcounting on conntrack entries makes them impossible to free or destroy properly. That prevents this worker from cleaning up the namespace, which in turn prevents anything else from interacting with namespaces (add, delete, etc.).
  
  [Test Case]
  Bug 1403152 has a test case that can occasionally trigger this issue.
  
  [Fix]
  $ git describe --contains e53376bef2cd97d3e3f61fdc677fb8da7d03d0da
  v3.14-rc3~36^2~28^2~12

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1466135

Title:
  nf_conntrack releases a conntrack with non-zero refcnt

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  In Progress

Bug description:
  [Impact]
  Occasionally, starting new containers or creating new network namespaces may trigger a soft lockup because of improper refcounting of conntrack entries.

  In the case I am hitting, a kworker thread uses up an entire CPU core,
  and running cat /proc/$pid/stack on it shows:

  [<ffffffffbe01e9b6>] ___preempt_schedule+0x56/0xb0
  [<ffffffffc02223e4>] nf_ct_iterate_cleanup+0x134/0x160 [nf_conntrack]
  [<ffffffffc0223dae>] nf_conntrack_cleanup_net_list+0x4e/0x170 [nf_conntrack]
  [<ffffffffc022436d>] nf_conntrack_pernet_exit+0x4d/0x60 [nf_conntrack]
  [<ffffffffbe6040d3>] ops_exit_list.isra.1+0x53/0x60
  [<ffffffffbe6048d0>] cleanup_net+0x100/0x1d0
  [<ffffffffbe084991>] process_one_work+0x171/0x470
  [<ffffffffbe08563b>] worker_thread+0x11b/0x3a0
  [<ffffffffbe08bb82>] kthread+0xd2/0xf0
  [<ffffffffbe71757c>] ret_from_fork+0x7c/0xb0
  [<ffffffffffffffff>] 0xffffffffffffffff

  The kworker loops forever without managing to clean up the conntrack
  state, and all the while it holds the global netns lock. I bisected the
  problem to commit e53376bef2cd97d3e3f61fdc677fb8da7d03d0da, which deals with refcounting, so I suspect that broken refcounting on conntrack entries makes them impossible to free or destroy properly. That prevents this worker from cleaning up the namespace, which in turn prevents anything else from interacting with namespaces (add, delete, etc.).

  [Test Case]
  Bug 1403152 has a test case that can occasionally trigger this issue.

  [Fix]
  $ git describe --contains e53376bef2cd97d3e3f61fdc677fb8da7d03d0da
  v3.14-rc3~36^2~28^2~12


