kernel-packages team mailing list archive
Message #171339
[Bug 1568729] Missing required logs.
This bug is missing log files that will aid in diagnosing the problem.
From a terminal window please run:
apport-collect 1568729
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.
** Changed in: linux (Ubuntu)
Status: New => Incomplete
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1568729
Title:
divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault
Status in linux package in Ubuntu:
Confirmed
Bug description:
While running qemu 2.5 on a trusty host running 4.4.0-15.31~14.04.1,
the host system has crashed (load > 200) 3 times in the last 3 days,
always with this stack trace:
Apr 9 19:01:09 cnode9.0 kernel: [197071.195577] divide error: 0000 [#1] SMP
Apr 9 19:01:09 cnode9.0 kernel: [197071.195633] Modules linked in: vhost_net vhost macvtap macvlan arc4 md4 nls_utf8 cifs nfnetlink_queue nfnetlink xt_CHECKSUM xt_nat iptable_nat nf_nat_ipv4 xt_NFQUEUE xt_CLASSIFY ip6table_mangle sch_sfq sch_htb veth dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag ebtable_filter ebtables nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_mangle xt_CT iptable_raw xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables x_tables dummy bridge stp llc ipmi_ssif ipmi_devintf intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dcdbas irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev input_leds nf_nat_ftp sb_edac nf_conntrack_ftp edac_core cdc_ether nf_nat_pptp usbnet nf_conntrack_pptp mii nf_nat_proto_gre lpc_ich nf_nat_sip ioatdma nf_nat nf_conntrack_sip nfsd ipmi_si 8250_fintek nf_conntrack_proto_gre ipmi_msghandler acpi_pad wmi shpchp nf_conntrack acpi_power_meter mac_hid auth_rpcgss nfs_acl bonding nfs lp lockd parport grace sunrpc fscache tcp_htcp xfs btrfs hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ixgbe raid6_pq libcrc32c igb vxlan raid1 i2c_algo_bit ip6_udp_tunnel dca udp_tunnel ahci raid0 ptp libahci megaraid_sas multipath pps_core mdio linear fjes
Apr 9 19:01:09 cnode9.0 kernel: [197071.197014] CPU: 13 PID: 3147726 Comm: ceph-osd Not tainted 4.4.0-15-generic #31~14.04.1-Ubuntu
Apr 9 19:01:09 cnode9.0 kernel: [197071.197085] Hardware name: Dell Inc. PowerEdge R720/0XH7F2, BIOS 2.5.2 01/28/2015
Apr 9 19:01:09 cnode9.0 kernel: [197071.197154] task: ffff88252be1ee00 ti: ffff8824fc0d4000 task.ti: ffff8824fc0d4000
Apr 9 19:01:09 cnode9.0 kernel: [197071.197221] RIP: 0010:[<ffffffff810afec8>] [<ffffffff810afec8>] task_numa_find_cpu+0x238/0x700
Apr 9 19:01:09 cnode9.0 kernel: [197071.197300] RSP: 0000:ffff8824fc0d7ba8 EFLAGS: 00010257
Apr 9 19:01:09 cnode9.0 kernel: [197071.197340] RAX: 0000000000000000 RBX: ffff8824fc0d7c48 RCX: 0000000000000000
Apr 9 19:01:09 cnode9.0 kernel: [197071.197406] RDX: 0000000000000000 RSI: ffff88479f180000 RDI: ffff884782a47600
Apr 9 19:01:09 cnode9.0 kernel: [197071.197473] RBP: ffff8824fc0d7c10 R08: 0000000102eea157 R09: 00000000000001a8
Apr 9 19:01:09 cnode9.0 kernel: [197071.197540] R10: 000000000002404b R11: 000000000000023f R12: ffff882380930000
Apr 9 19:01:09 cnode9.0 kernel: [197071.197606] R13: 0000000000000008 R14: 000000000000008c R15: 0000000000000124
Apr 9 19:01:09 cnode9.0 kernel: [197071.197673] FS: 00007f19aab5b700(0000) GS:ffff88479f180000(0000) knlGS:0000000000000000
Apr 9 19:01:09 cnode9.0 kernel: [197071.197741] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 9 19:01:09 cnode9.0 kernel: [197071.197782] CR2: 0000000025469600 CR3: 00000023846bc000 CR4: 00000000000426e0
Apr 9 19:01:09 cnode9.0 kernel: [197071.197848] Stack:
Apr 9 19:01:09 cnode9.0 kernel: [197071.197880] ffffffff817425fb ffff8829af3e9e00 00000000000000f6 ffff88252be1ee00
Apr 9 19:01:09 cnode9.0 kernel: [197071.197965] 000000000000008d 0000000000000225 0000000000016d40 000000000000008d
Apr 9 19:01:09 cnode9.0 kernel: [197071.198047] ffff88252be1ee00 00000000000001ad ffff8824fc0d7c48 00000000000000e1
Apr 9 19:01:09 cnode9.0 kernel: [197071.198132] Call Trace:
Apr 9 19:01:09 cnode9.0 kernel: [197071.198172] [<ffffffff817425fb>] ? tcp_schedule_loss_probe+0x12b/0x1b0
Apr 9 19:01:09 cnode9.0 kernel: [197071.198219] [<ffffffff810b0830>] task_numa_migrate+0x4a0/0x930
Apr 9 19:01:09 cnode9.0 kernel: [197071.198264] [<ffffffff816d2957>] ? release_sock+0x117/0x160
Apr 9 19:01:09 cnode9.0 kernel: [197071.198306] [<ffffffff810b0d39>] numa_migrate_preferred+0x79/0x80
Apr 9 19:01:09 cnode9.0 kernel: [197071.198350] [<ffffffff810b557d>] task_numa_fault+0x91d/0xcc0
Apr 9 19:01:09 cnode9.0 kernel: [197071.198395] [<ffffffff811d35ae>] ? mpol_misplaced+0x14e/0x190
Apr 9 19:01:09 cnode9.0 kernel: [197071.198439] [<ffffffff811b06b8>] handle_pte_fault+0x5a8/0x14c0
Apr 9 19:01:09 cnode9.0 kernel: [197071.198485] [<ffffffff810f8531>] ? futex_wake+0x81/0x150
Apr 9 19:01:09 cnode9.0 kernel: [197071.198526] [<ffffffff810b0de4>] ? set_next_entity+0xa4/0x700
Apr 9 19:01:09 cnode9.0 kernel: [197071.198569] [<ffffffff810fab44>] ? do_futex+0xf4/0x4d0
Apr 9 19:01:09 cnode9.0 kernel: [197071.198610] [<ffffffff811b2440>] handle_mm_fault+0x250/0x540
Apr 9 19:01:09 cnode9.0 kernel: [197071.198654] [<ffffffff81067d19>] __do_page_fault+0x199/0x430
Apr 9 19:01:09 cnode9.0 kernel: [197071.198696] [<ffffffff81067fd2>] do_page_fault+0x22/0x30
Apr 9 19:01:09 cnode9.0 kernel: [197071.198740] [<ffffffff817ef878>] page_fault+0x28/0x30
Apr 9 19:01:09 cnode9.0 kernel: [197071.198775] Code: 4d b0 4c 89 f7 e8 29 d5 ff ff 48 8b 4d b0 49 8b 86 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4e 78 4c 8b 73 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c1 48 29 c1 4c 03 4b 48 4c 39 7d d0
Apr 9 19:01:09 cnode9.0 kernel: [197071.199217] RIP [<ffffffff810afec8>] task_numa_find_cpu+0x238/0x700
Apr 9 19:01:09 cnode9.0 kernel: [197071.199264] RSP <ffff8824fc0d7ba8>
Apr 9 19:01:09 cnode9.0 kernel: [197071.199900] ---[ end trace e938a840610a79f7 ]---
This appears to be the same bug as reported upstream in
http://lkml.iu.edu/hypermail/linux/kernel/1603.2/01659.html
According to this thread the issue is:
27: 48 83 c1 01 add $0x1,%rcx
2b:* 48 f7 f1 div %rcx <-- trapping instruction
This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
update_cfs_rq_h_load(cfs_rq);
return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
cfs_rq_load_avg(cfs_rq) + 1);
So the load avg is -1; after adding 1 the divisor is 0, and the division traps.
The LKML reporter's fix was to include the patches to kernel/sched/fair.c up to 4.5.
A specific patch was not identified.
Please backport these patches to the Xenial kernel and to the lts-xenial
kernel in trusty.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1568729/+subscriptions