
kernel-packages team mailing list archive

[Bug 1458045] Re: KVM and CFS bandwidth control causes kernel crashes (oops)

 

Adding this in case it is useful (I'm still troubleshooting):

crash> dis -rl ffffffff810af2f8
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4745
0xffffffff810af280 <pick_next_task_fair>:       nopl   0x0(%rax,%rax,1)
0xffffffff810af285 <pick_next_task_fair+5>:     push   %rbp
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4746
0xffffffff810af286 <pick_next_task_fair+6>:     lea    0x80(%rdi),%rax
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4745
0xffffffff810af28d <pick_next_task_fair+13>:    mov    %rsp,%rbp
0xffffffff810af290 <pick_next_task_fair+16>:    push   %r15
0xffffffff810af292 <pick_next_task_fair+18>:    mov    %rdi,%r15
0xffffffff810af295 <pick_next_task_fair+21>:    push   %r14
0xffffffff810af297 <pick_next_task_fair+23>:    push   %r13
0xffffffff810af299 <pick_next_task_fair+25>:    push   %r12
0xffffffff810af29b <pick_next_task_fair+27>:    push   %rbx
0xffffffff810af29c <pick_next_task_fair+28>:    sub    $0x30,%rsp
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4746
0xffffffff810af2a0 <pick_next_task_fair+32>:    mov    %rax,-0x58(%rbp)
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4745
0xffffffff810af2a4 <pick_next_task_fair+36>:    mov    %rsi,-0x48(%rbp)
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 6787
0xffffffff810af2a8 <pick_next_task_fair+40>:    mov    $0x130c0,%rax
0xffffffff810af2af <pick_next_task_fair+47>:    mov    %rax,-0x50(%rbp)
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4753
0xffffffff810af2b3 <pick_next_task_fair+51>:    mov    0x90(%r15),%edi
0xffffffff810af2ba <pick_next_task_fair+58>:    test   %edi,%edi
0xffffffff810af2bc <pick_next_task_fair+60>:    je     0xffffffff810af318 <pick_next_task_fair+152>
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4756
0xffffffff810af2be <pick_next_task_fair+62>:    mov    -0x48(%rbp),%rax
0xffffffff810af2c2 <pick_next_task_fair+66>:    mov    0x60(%rax),%rax
0xffffffff810af2c6 <pick_next_task_fair+70>:    cmp    $0xffffffff818147e0,%rax
0xffffffff810af2cc <pick_next_task_fair+76>:    je     0xffffffff810af790 <pick_next_task_fair+1296>
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/sched.h: 1160
0xffffffff810af2d2 <pick_next_task_fair+82>:    mov    -0x48(%rbp),%rsi
0xffffffff810af2d6 <pick_next_task_fair+86>:    mov    %r15,%rdi
0xffffffff810af2d9 <pick_next_task_fair+89>:    callq  *0x38(%rax)
0xffffffff810af2dc <pick_next_task_fair+92>:    mov    -0x58(%rbp),%r12
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4835
0xffffffff810af2e0 <pick_next_task_fair+96>:    xor    %esi,%esi
0xffffffff810af2e2 <pick_next_task_fair+98>:    mov    %r12,%rdi
0xffffffff810af2e5 <pick_next_task_fair+101>:   callq  0xffffffff810a66e0 <pick_next_entity>
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4836
0xffffffff810af2ea <pick_next_task_fair+106>:   mov    %r12,%rdi
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4835
0xffffffff810af2ed <pick_next_task_fair+109>:   mov    %rax,%rbx
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4836
0xffffffff810af2f0 <pick_next_task_fair+112>:   mov    %rax,%rsi
0xffffffff810af2f3 <pick_next_task_fair+115>:   callq  0xffffffff810a7d20 <set_next_entity>
/build/buildd/linux-lts-utopic-3.16.0/kernel/sched/fair.c: 4747
0xffffffff810af2f8 <pick_next_task_fair+120>:   mov    0x158(%rbx),%r12


crash> cfs_rq ffff883ffedf3140
struct cfs_rq {
  load = {
    weight = 0, 
    inv_weight = 0
  }, 
  nr_running = 0, 
  h_nr_running = 0, 
  exec_clock = 50727270151190, 
  min_vruntime = 1126033353437518, 
  tasks_timeline = {
    rb_node = 0x0
  }, 
  rb_leftmost = 0x0, 
  curr = 0x0, 
  next = 0x0, 
  last = 0x0, 
  skip = 0x0, 
  nr_spread_over = 960, 
  runnable_load_avg = 0, 
  blocked_load_avg = 108, 
  decay_counter = {
    counter = 139289244
  }, 
  last_decay = 139289243, 
  removed_load = {
    counter = 0
  }, 
  tg_runnable_contrib = 329, 
  tg_load_contrib = 108, 
  h_load = 75, 
  last_h_load_update = 4331424052, 
  h_load_next = 0xffff883fcab5d600, 
  rq = 0xffff883ffedf30c0, 
  on_list = 1, 
  leaf_cfs_rq_list = {
    next = 0xffff883eec13eec0, 
    prev = 0xffff883fcab5fec0
  }, 
  tg = 0xffffffff81ebdd80 <root_task_group>, 
  runtime_enabled = 0, 
  runtime_expires = 0, 
  runtime_remaining = 0, 
  throttled_clock = 0, 
  throttled_clock_task = 0, 
  throttled_clock_task_time = 0, 
  throttled = 0, 
  throttle_count = 0, 
  throttled_list = {
    next = 0xffff883ffedf3250, 
    prev = 0xffff883ffedf3250
  }
}
crash>
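
One possible reading of the above, not confirmed by anyone on this bug:
the byte marked <8b> in the "Code:" line of the oops quoted below starts
the sequence 8b 46 38, which should decode to mov 0x38(%rsi),%eax. With
RSI = 0 that load touches address 0x38, which matches CR2. In the
disassembly above, %rsi is loaded from the return value of
pick_next_entity just before the call to set_next_entity, and the cfs_rq
dump shows nr_running = 0, rb_leftmost = 0x0 and curr = 0x0, so a NULL
entity being handed to set_next_entity would be consistent with the
crash. The small Python sketch below only re-checks that arithmetic; the
helper name decode_mov_load is made up for illustration and it handles
just this one addressing form.

```python
# Values copied from the oops in the bug description.
FAULT_BYTES = bytes([0x8B, 0x46, 0x38])  # bytes starting at the <8b> marker
RSI = 0x0                                # RSI at the time of the fault
CR2 = 0x38                               # faulting address reported by the CPU

# Return address from the call trace and the function start from `dis -rl`.
RET_ADDR = 0xFFFFFFFF810AF2F8
PICK_NEXT_TASK_FAIR = 0xFFFFFFFF810AF280

# Register names indexed by the 3-bit ModRM fields (no REX prefix involved).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code):
    """Decode `8b /r` (MOV r32, r/m32) with an 8-bit displacement.

    Hypothetical helper: it only handles mod == 01 without a SIB byte,
    which is all this particular instruction needs.
    """
    opcode, modrm, disp8 = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not a MOV r32, r/m32 opcode")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("addressing form not handled by this sketch")
    # The 8-bit displacement is sign-extended.
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8
    return REGS[reg], REGS[rm], disp


dest, base, disp = decode_mov_load(FAULT_BYTES)
fault_addr = RSI + disp
ret_offset = RET_ADDR - PICK_NEXT_TASK_FAIR

print(f"mov {disp:#x}(%{base}),%e{dest[1:]} reads address {fault_addr:#x}")
print(f"return address is pick_next_task_fair+{ret_offset:#x}")
```

If the decoded address equals CR2 and the return offset equals the +0x78
shown in the call trace, the register dump and the disassembly agree
with each other. It does not say why the runqueue was empty at that
point.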

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1458045

Title:
  KVM and CFS bandwidth control causes kernel crashes (oops)

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  We've seen this crash at least 3 times when we start setting CPU
  limits using `cgroups`.  It makes using CPU limits impossible, causing
  instabilities in the operating system.  Finally, after installing
  linux-crashdump, we got a full copy of the crash message.

  ========================================================
  [146055.357476] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
  [146055.359620] IP: [<ffffffff810a7d31>] set_next_entity+0x11/0xb0
  [146055.361890] PGD 0 
  [146055.364131] Oops: 0000 [#1] SMP 
  [146055.366475] Modules linked in: vhost_net vhost macvtap macvlan act_police cls_u32 sch_ingress ipmi_si xt_multiport nf_conntrack_ipv6 nf_defrag_ipv6 xt_mac xt_physdev xt_set iptable_raw ip_set_hash_ip ip_set nfnetlink mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp dell_rbu bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd openvswitch gre vxlan libcrc32c ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel dcdbas kvm crct10dif_pclmul crc32_pclmul
  [146055.388889]  ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul dm_multipath glue_helper ablk_helper scsi_dh cryptd mei_me mei lpc_ich ipmi_msghandler shpchp wmi acpi_power_meter mac_hid lp parport nls_iso8859_1 igb ixgbe i2c_algo_bit dca ptp ahci pps_core megaraid_sas libahci mdio [last unloaded: ipmi_si]
  [146055.404208] CPU: 31 PID: 67922 Comm: qemu-system-x86 Not tainted 3.16.0-37-generic #51~14.04.1-Ubuntu
  [146055.409906] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
  [146055.415754] task: ffff883fcab69e90 ti: ffff883a1c168000 task.ti: ffff883a1c168000
  [146055.421817] RIP: 0010:[<ffffffff810a7d31>]  [<ffffffff810a7d31>] set_next_entity+0x11/0xb0
  [146055.428079] RSP: 0018:ffff883a1c16bce8  EFLAGS: 00010092
  [146055.434377] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000044aa200
  [146055.440913] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff883ffedf3140
  [146055.447474] RBP: ffff883a1c16bd00 R08: 0000000000000000 R09: 0000000000000001
  [146055.454181] R10: 0000000000000004 R11: 0000000000000206 R12: ffff883ffedf3140
  [146055.460968] R13: 000000000000001f R14: 0000000000000001 R15: ffff883ffedf30c0
  [146055.467722] FS:  00007f404919d700(0000) GS:ffff883ffede0000(0000) knlGS:ffff880002380000
  [146055.474756] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [146055.481830] CR2: 0000000000000038 CR3: 0000003a1c45b000 CR4: 00000000001427e0
  [146055.489134] Stack:
  [146055.496412]  0000000000000000 ffff883ffedf3140 000000000000001f ffff883a1c16bd68
  [146055.504053]  ffffffff810af2f8 ffff883ffedf3140 00000000000130c0 ffff883fcab69e90
  [146055.511786]  ffffffff8101c3b9 ffff883a1c16bd50 ffffffff810a4895 ffff883fcab6a3c8
  [146055.519551] Call Trace:
  [146055.527330]  [<ffffffff810af2f8>] pick_next_task_fair+0x78/0x880
  [146055.535292]  [<ffffffff8101c3b9>] ? sched_clock+0x9/0x10
  [146055.543379]  [<ffffffff810a4895>] ? sched_clock_cpu+0x85/0xc0
  [146055.551519]  [<ffffffff81768afb>] __schedule+0x11b/0x7a0
  [146055.559722]  [<ffffffff81769579>] _cond_resched+0x29/0x40
  [146055.568020]  [<ffffffffc0338289>] kvm_arch_vcpu_ioctl_run+0x3e9/0x460 [kvm]
  [146055.576509]  [<ffffffffc0321ce2>] kvm_vcpu_ioctl+0x2a2/0x5e0 [kvm]
  [146055.585045]  [<ffffffff81156952>] ? perf_event_context_sched_in+0xa2/0xc0
  [146055.593771]  [<ffffffff811e7250>] do_vfs_ioctl+0x2e0/0x4c0
  [146055.602531]  [<ffffffff8109dec8>] ? finish_task_switch+0x108/0x180
  [146055.611413]  [<ffffffffc032bcd4>] ? kvm_on_user_return+0x74/0x80 [kvm]
  [146055.620339]  [<ffffffff811e74b1>] SyS_ioctl+0x81/0xa0
  [146055.629396]  [<ffffffff8176d20d>] system_call_fastpath+0x1a/0x1f
  [146055.638500] Code: 83 c4 10 4c 89 f2 4c 89 ee ff d0 49 8b 04 24 48 85 c0 75 e6 eb 99 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 <8b> 46 38 48 89 f3 85 c0 75 5d 49 8b 84 24 b0 00 00 00 48 8b 80 
  [146055.657833] RIP  [<ffffffff810a7d31>] set_next_entity+0x11/0xb0
  [146055.667524]  RSP <ffff883a1c16bce8>
  [146055.677082] CR2: 0000000000000038
  ========================================================

  I've found the following "potential" fix that doesn't seem to have
  ever made it through: https://lkml.org/lkml/2015/4/7/611

  In addition, I have a 12GB dump file generated by linux-crashdump.
  Please let me know if there's anything I can do with it to help
  troubleshoot this issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1458045/+subscriptions

