kernel-packages team mailing list archive

[Bug 1568729] Missing required logs.

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Brad Figg <brad.figg@xxxxxxxxxxxxx>
Date: Mon, 11 Apr 2016 08:00:08 -0000
Reply-to: Bug 1568729 <1568729@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
This bug is missing log files that will aid in diagnosing the problem.
From a terminal window please run:

apport-collect 1568729

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1568729

Title:
  divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  While running qemu 2.5 on a trusty host running 4.4.0-15.31~14.04.1
  the host system has crashed (load > 200) 3 times in the last 3 days.

  Always with this stack trace:

  Apr  9 19:01:09 cnode9.0 kernel: [197071.195577] divide error: 0000 [#1] SMP 
  Apr  9 19:01:09 cnode9.0 kernel: [197071.195633] Modules linked in: vhost_net vhost macvtap macvlan arc4 md4 nls_utf8 cifs nfnetlink_queue nfnetlink xt_CHECKSUM xt_nat iptable_nat nf_nat_ipv4 xt_NFQUEUE xt_CLASSIFY ip6table_mangle sch_sfq sch_htb veth dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag ebtable_filter ebtables nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_mangle xt_CT iptable_raw xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables x_tables dummy bridge stp llc ipmi_ssif ipmi_devintf intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dcdbas irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev input_leds nf_nat_ftp sb_edac nf_conntrack_ftp edac_core cdc_ether nf_nat_pptp usbnet nf_conntrack_pptp mii nf_nat_proto_gre lpc_ich nf_nat_sip ioatdma nf_nat nf_conntrack_sip nfsd ipmi_si 8250_fintek nf_conntrack_proto_gre ipmi_msghandler acpi_pad wmi shpchp nf_conntrack acpi_power_meter mac_hid auth_rpcgss nfs_acl bonding nfs lp lockd parport grace sunrpc fscache tcp_htcp xfs btrfs hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ixgbe raid6_pq libcrc32c igb vxlan raid1 i2c_algo_bit ip6_udp_tunnel dca udp_tunnel ahci raid0 ptp libahci megaraid_sas multipath pps_core mdio linear fjes
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197014] CPU: 13 PID: 3147726 Comm: ceph-osd Not tainted 4.4.0-15-generic #31~14.04.1-Ubuntu
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197085] Hardware name: Dell Inc. PowerEdge R720/0XH7F2, BIOS 2.5.2 01/28/2015
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197154] task: ffff88252be1ee00 ti: ffff8824fc0d4000 task.ti: ffff8824fc0d4000
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197221] RIP: 0010:[<ffffffff810afec8>]  [<ffffffff810afec8>] task_numa_find_cpu+0x238/0x700
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197300] RSP: 0000:ffff8824fc0d7ba8  EFLAGS: 00010257
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197340] RAX: 0000000000000000 RBX: ffff8824fc0d7c48 RCX: 0000000000000000
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197406] RDX: 0000000000000000 RSI: ffff88479f180000 RDI: ffff884782a47600
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197473] RBP: ffff8824fc0d7c10 R08: 0000000102eea157 R09: 00000000000001a8
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197540] R10: 000000000002404b R11: 000000000000023f R12: ffff882380930000
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197606] R13: 0000000000000008 R14: 000000000000008c R15: 0000000000000124
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197673] FS:  00007f19aab5b700(0000) GS:ffff88479f180000(0000) knlGS:0000000000000000
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197782] CR2: 0000000025469600 CR3: 00000023846bc000 CR4: 00000000000426e0
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197848] Stack:
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197880]  ffffffff817425fb ffff8829af3e9e00 00000000000000f6 ffff88252be1ee00
  Apr  9 19:01:09 cnode9.0 kernel: [197071.197965]  000000000000008d 0000000000000225 0000000000016d40 000000000000008d
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198047]  ffff88252be1ee00 00000000000001ad ffff8824fc0d7c48 00000000000000e1
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198132] Call Trace:
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198172]  [<ffffffff817425fb>] ? tcp_schedule_loss_probe+0x12b/0x1b0
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198219]  [<ffffffff810b0830>] task_numa_migrate+0x4a0/0x930
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198264]  [<ffffffff816d2957>] ? release_sock+0x117/0x160
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198306]  [<ffffffff810b0d39>] numa_migrate_preferred+0x79/0x80
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198350]  [<ffffffff810b557d>] task_numa_fault+0x91d/0xcc0
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198395]  [<ffffffff811d35ae>] ? mpol_misplaced+0x14e/0x190
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198439]  [<ffffffff811b06b8>] handle_pte_fault+0x5a8/0x14c0
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198485]  [<ffffffff810f8531>] ? futex_wake+0x81/0x150
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198526]  [<ffffffff810b0de4>] ? set_next_entity+0xa4/0x700
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198569]  [<ffffffff810fab44>] ? do_futex+0xf4/0x4d0
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198610]  [<ffffffff811b2440>] handle_mm_fault+0x250/0x540
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198654]  [<ffffffff81067d19>] __do_page_fault+0x199/0x430
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198696]  [<ffffffff81067fd2>] do_page_fault+0x22/0x30
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198740]  [<ffffffff817ef878>] page_fault+0x28/0x30
  Apr  9 19:01:09 cnode9.0 kernel: [197071.198775] Code: 4d b0 4c 89 f7 e8 29 d5 ff ff 48 8b 4d b0 49 8b 86 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4e 78 4c 8b 73 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c1 48 29 c1 4c 03 4b 48 4c 39 7d d0 
  Apr  9 19:01:09 cnode9.0 kernel: [197071.199217] RIP  [<ffffffff810afec8>] task_numa_find_cpu+0x238/0x700
  Apr  9 19:01:09 cnode9.0 kernel: [197071.199264]  RSP <ffff8824fc0d7ba8>
  Apr  9 19:01:09 cnode9.0 kernel: [197071.199900] ---[ end trace e938a840610a79f7 ]---

  This appears to be the same bug as reported upstream in
  http://lkml.iu.edu/hypermail/linux/kernel/1603.2/01659.html

  According to that thread, the trapping instruction is:

  27:   48 83 c1 01   add    $0x1,%rcx
  2b:*  48 f7 f1      div    %rcx      <-- trapping instruction
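
The offset 0x2b can be cross-checked against the "Code:" line of the oops, where the byte wrapped in angle brackets is the one at the faulting RIP. A minimal sketch of that lookup follows; the helper name trapping_byte_index is hypothetical and not part of any kernel tool, and the input is assumed to be the space-separated hex bytes that follow "Code:".

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the hex bytes of an oops "Code:" line
 * (space separated, one byte wrapped in <..>), return the zero-based
 * index of the marked byte, or -1 if no byte is marked.
 */
static int trapping_byte_index(const char *code_bytes)
{
    const char *p = code_bytes;
    int index = 0;

    while (*p != '\0') {
        while (*p == ' ')               /* skip separators */
            p++;
        if (*p == '\0')
            break;
        if (*p == '<')                  /* marked byte found */
            return index;
        while (*p != '\0' && *p != ' ') /* skip one unmarked byte */
            p++;
        index++;
    }
    return -1;
}
```

Applied to the bytes in the trace above, the marked byte `<48>` is the first byte of `48 f7 f1` (div %rcx), directly after `48 83 c1 01` (add $0x1,%rcx).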

  This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:

  update_cfs_rq_h_load(cfs_rq);
  return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
                  cfs_rq_load_avg(cfs_rq) + 1);

  So the load avg is -1 (all bits set in an unsigned value); adding 1
  wraps the divisor to 0, which raises the divide error.
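
The wraparound can be reproduced in userspace. The sketch below is only a model: h_load_divisor and task_h_load_model are illustrative names, not kernel functions, and the load average is assumed to be an unsigned 64-bit value. The register dump above is consistent with this reading, since RCX (the divisor) is 0 at the trapping instruction.

```c
#include <stdint.h>

/*
 * Illustrative stand-in for the divisor used by task_h_load():
 * cfs_rq_load_avg(cfs_rq) + 1, assuming a 64-bit unsigned load average.
 */
static uint64_t h_load_divisor(uint64_t cfs_rq_load_avg)
{
    return cfs_rq_load_avg + 1;   /* unsigned overflow wraps modulo 2^64 */
}

/*
 * Hypothetical guarded division, for illustration only. This is not the
 * upstream fix; the report states that no single patch was identified.
 */
static uint64_t task_h_load_model(uint64_t task_load_avg, uint64_t h_load,
                                  uint64_t cfs_rq_load_avg)
{
    uint64_t divisor = h_load_divisor(cfs_rq_load_avg);

    if (divisor == 0)             /* the case that traps in the kernel */
        return 0;
    return (task_load_avg * h_load) / divisor;
}
```

Calling h_load_divisor(UINT64_MAX) yields 0, which is the divisor the unguarded kernel code would hand to the div instruction.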

  The LKML reporter fixed the problem by applying the patches to
  kernel/sched/fair.c up to 4.5. A specific patch was not identified.

  Please backport these patches to the Xenial kernel and to the
  lts-xenial kernel in trusty.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1568729/+subscriptions
References

[Bug 1568729] [NEW] divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault
From: Markus Schade, 2016-04-11