kernel-packages team mailing list archive
Message #171790
[Bug 1568729] Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault
It happened again, this time with the upstream kernel (linux-image-4.5.1-040501-generic):
[Fri Apr 15 13:26:56 2016] divide error: 0000 [#1] SMP
[Fri Apr 15 13:26:56 2016] Modules linked in: vhost_net vhost macvtap macvlan ip6table_mangle nfnetlink_queue nfnetlink xt_CLASSIFY xt_CHECKSUM xt_nat iptable_nat nf_nat_ipv4 xt_NFQUEUE sch_sfq sch_htb veth dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag ebtable_filter ebtables nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_CT iptable_raw xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter iptable_mangle ip_tables x_tables dummy bridge stp llc ipmi_ssif ipmi_devintf x86_pkg_temp_thermal intel_powerclamp coretemp dcdbas kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac input_leds joydev edac_core nf_nat_ftp cdc_ether usbnet nf_conntrack_ftp mii nf_nat_pptp lpc_ich nf_conntrack_pptp nf_nat_proto_gre ioatdma nf_nat_sip nf_nat nfsd ipmi_si nf_conntrack_sip ipmi_msghandler 8250_fintek nf_conntrack_proto_gre acpi_pad nf_conntrack wmi acpi_power_meter shpchp mac_hid auth_rpcgss nfs_acl bonding nfs lp lockd parport grace sunrpc fscache tcp_htcp xfs btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid igb raid6_pq ixgbe libcrc32c raid1 i2c_algo_bit vxlan ahci ip6_udp_tunnel dca raid0 libahci udp_tunnel ptp megaraid_sas pps_core multipath mdio fjes linear
[Fri Apr 15 13:26:56 2016] CPU: 10 PID: 9261 Comm: ceph-osd Not tainted 4.5.1-040501-generic #201604121331
[Fri Apr 15 13:26:56 2016] Hardware name: Dell Inc. PowerEdge R720/08RW36, BIOS 2.5.2 01/28/2015
[Fri Apr 15 13:26:56 2016] task: ffff8846b4611c80 ti: ffff8846b4704000 task.ti: ffff8846b4704000
[Fri Apr 15 13:26:56 2016] RIP: 0010:[<ffffffff810b5d3c>] [<ffffffff810b5d3c>] task_numa_find_cpu+0x23c/0x710
[Fri Apr 15 13:26:56 2016] RSP: 0000:ffff8846b4707bd8 EFLAGS: 00010206
[Fri Apr 15 13:26:56 2016] RAX: 0000000000000000 RBX: ffff8846b4707c78 RCX: 0000000000000000
[Fri Apr 15 13:26:56 2016] RDX: 0000000000000000 RSI: ffff88239f940000 RDI: ffff88237a510200
[Fri Apr 15 13:26:56 2016] RBP: ffff8846b4707c40 R08: 0000000101378ff0 R09: 0000000000000012
[Fri Apr 15 13:26:56 2016] R10: 00000000000000ee R11: 0000000000000003 R12: ffff8846b46b0e40
[Fri Apr 15 13:26:56 2016] R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000000e0
[Fri Apr 15 13:26:56 2016] FS: 00007f95c52c8700(0000) GS:ffff88239f940000(0000) knlGS:0000000000000000
[Fri Apr 15 13:26:56 2016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Apr 15 13:26:56 2016] CR2: 000000002b75cc00 CR3: 00000047850e4000 CR4: 00000000001426e0
[Fri Apr 15 13:26:56 2016] Stack:
[Fri Apr 15 13:26:56 2016] ffff8846b4707c38 ffffffff8101667e 0000000100016b00 ffff8846b4611c80
[Fri Apr 15 13:26:56 2016] 000000000000014d fffffffffffffe53 0000000000016b00 000000000000014d
[Fri Apr 15 13:26:56 2016] ffff8846b4611c80 ffff8846b4707c78 00000000000002ca 000000000000014d
[Fri Apr 15 13:26:56 2016] Call Trace:
[Fri Apr 15 13:26:56 2016] [<ffffffff8101667e>] ? __switch_to+0x1de/0x5d0
[Fri Apr 15 13:26:56 2016] [<ffffffff810b664e>] task_numa_migrate+0x43e/0x9b0
[Fri Apr 15 13:26:56 2016] [<ffffffff810b6c39>] numa_migrate_preferred+0x79/0x80
[Fri Apr 15 13:26:56 2016] [<ffffffff810bb2d7>] task_numa_fault+0x7f7/0xd40
[Fri Apr 15 13:26:56 2016] [<ffffffff810ba945>] ? should_numa_migrate_memory+0x55/0x130
[Fri Apr 15 13:26:56 2016] [<ffffffff811c2570>] handle_mm_fault+0x1160/0x1ad0
[Fri Apr 15 13:26:56 2016] [<ffffffff816fd8a4>] ? SYSC_recvfrom+0x144/0x160
[Fri Apr 15 13:26:56 2016] [<ffffffff8106aa67>] __do_page_fault+0x197/0x400
[Fri Apr 15 13:26:56 2016] [<ffffffff8106acf2>] do_page_fault+0x22/0x30
[Fri Apr 15 13:26:56 2016] [<ffffffff818270b8>] page_fault+0x28/0x30
[Fri Apr 15 13:26:56 2016] Code: 55 b0 4c 89 f7 e8 55 c9 ff ff 48 8b 55 b0 49 8b 4e 78 48 8b 82 18 02 00 00 48 83 c1 01 31 d2 49 0f af 86 b0 00 00 00 4c 8b 73 78 <48> f7 f1 48 8b 4b 20 49 89 c0 48 29 c1 48 8b 45 d0 4c 03 43 48
[Fri Apr 15 13:26:56 2016] RIP [<ffffffff810b5d3c>] task_numa_find_cpu+0x23c/0x710
[Fri Apr 15 13:26:56 2016] RSP <ffff8846b4707bd8>
[Fri Apr 15 13:26:56 2016] ---[ end trace ce23f377286f87a4 ]---
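You can read the trapping instruction directly from the `Code:` line of the oops above. The byte in angle brackets is the one RIP points at. The sketch below is minimal and is not a general x86 disassembler. The helper name `decode_div_at_rip` is hypothetical. It recognizes only one form, `REX.W F7 /6` with a register operand (a 64-bit unsigned `div`), and reports which register holds the divisor.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 64-bit register names indexed by the ModRM r/m field (no REX.B). */
static const char *const reg64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

/*
 * Hypothetical helper: find the byte marked <..> in an oops "Code:" line
 * (the byte at RIP) and check whether the instruction there is
 * "div r64", encoded as REX.W (0x48) + 0xF7 + ModRM with reg field 6
 * and mod 3 (register operand). Returns the divisor register name,
 * or NULL if the bytes are anything else.
 */
static const char *decode_div_at_rip(const char *code_line)
{
    assert(code_line != NULL);
    const char *p = strchr(code_line, '<');
    if (p == NULL)
        return NULL;
    p++; /* skip '<' */

    unsigned int rex, opcode, modrm;
    /* Marked byte "<48>", followed by the next two bytes. */
    if (sscanf(p, "%2x> %2x %2x", &rex, &opcode, &modrm) != 3)
        return NULL;

    unsigned int mod = modrm >> 6;
    unsigned int reg = (modrm >> 3) & 7;
    unsigned int rm = modrm & 7;

    /* 0x48 = REX.W only: 64-bit operand, no register-extension bits. */
    if (rex != 0x48 || opcode != 0xf7 || reg != 6 || mod != 3)
        return NULL;
    return reg64[rm];
}
```

In the `Code:` line above, the marked bytes are `<48> f7 f1`, so the sketch reports `rcx`. The 4.4.0 oops in the bug description contains the same sequence, and its register dump also shows RCX as 0.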
--
https://bugs.launchpad.net/bugs/1568729
Title:
divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault
Status in linux package in Ubuntu:
In Progress
Status in linux source package in Xenial:
In Progress
Bug description:
While running qemu 2.5 on a Trusty host with kernel 4.4.0-15.31~14.04.1, the host system crashed (load > 200) three times in the last three days, always with this stack trace:
Apr 9 19:01:09 cnode9.0 kernel: [197071.195577] divide error: 0000 [#1] SMP
Apr 9 19:01:09 cnode9.0 kernel: [197071.195633] Modules linked in: vhost_net vhost macvtap macvlan arc4 md4 nls_utf8 cifs nfnetlink_queue nfnetlink xt_CHECKSUM xt_nat iptable_nat nf_nat_ipv4 xt_NFQUEUE xt_CLASSIFY ip6table_mangle sch_sfq sch_htb veth dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag ebtable_filter ebtables nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_mangle xt_CT iptable_raw xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables x_tables dummy bridge stp llc ipmi_ssif ipmi_devintf intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dcdbas irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev input_leds nf_nat_ftp sb_edac nf_conntrack_ftp edac_core cdc_ether nf_nat_pptp usbnet nf_conntrack_pptp mii nf_nat_proto_gre lpc_ich nf_nat_sip ioatdma nf_nat nf_conntrack_sip nfsd ipmi_si 8250_fintek nf_conntrack_proto_gre ipmi_msghandler acpi_pad wmi shpchp nf_conntrack acpi_power_meter mac_hid auth_rpcgss nfs_acl bonding nfs lp lockd parport grace sunrpc fscache tcp_htcp xfs btrfs hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ixgbe raid6_pq libcrc32c igb vxlan raid1 i2c_algo_bit ip6_udp_tunnel dca udp_tunnel ahci raid0 ptp libahci megaraid_sas multipath pps_core mdio linear fjes
Apr 9 19:01:09 cnode9.0 kernel: [197071.197014] CPU: 13 PID: 3147726 Comm: ceph-osd Not tainted 4.4.0-15-generic #31~14.04.1-Ubuntu
Apr 9 19:01:09 cnode9.0 kernel: [197071.197085] Hardware name: Dell Inc. PowerEdge R720/0XH7F2, BIOS 2.5.2 01/28/2015
Apr 9 19:01:09 cnode9.0 kernel: [197071.197154] task: ffff88252be1ee00 ti: ffff8824fc0d4000 task.ti: ffff8824fc0d4000
Apr 9 19:01:09 cnode9.0 kernel: [197071.197221] RIP: 0010:[<ffffffff810afec8>] [<ffffffff810afec8>] task_numa_find_cpu+0x238/0x700
Apr 9 19:01:09 cnode9.0 kernel: [197071.197300] RSP: 0000:ffff8824fc0d7ba8 EFLAGS: 00010257
Apr 9 19:01:09 cnode9.0 kernel: [197071.197340] RAX: 0000000000000000 RBX: ffff8824fc0d7c48 RCX: 0000000000000000
Apr 9 19:01:09 cnode9.0 kernel: [197071.197406] RDX: 0000000000000000 RSI: ffff88479f180000 RDI: ffff884782a47600
Apr 9 19:01:09 cnode9.0 kernel: [197071.197473] RBP: ffff8824fc0d7c10 R08: 0000000102eea157 R09: 00000000000001a8
Apr 9 19:01:09 cnode9.0 kernel: [197071.197540] R10: 000000000002404b R11: 000000000000023f R12: ffff882380930000
Apr 9 19:01:09 cnode9.0 kernel: [197071.197606] R13: 0000000000000008 R14: 000000000000008c R15: 0000000000000124
Apr 9 19:01:09 cnode9.0 kernel: [197071.197673] FS: 00007f19aab5b700(0000) GS:ffff88479f180000(0000) knlGS:0000000000000000
Apr 9 19:01:09 cnode9.0 kernel: [197071.197741] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 9 19:01:09 cnode9.0 kernel: [197071.197782] CR2: 0000000025469600 CR3: 00000023846bc000 CR4: 00000000000426e0
Apr 9 19:01:09 cnode9.0 kernel: [197071.197848] Stack:
Apr 9 19:01:09 cnode9.0 kernel: [197071.197880] ffffffff817425fb ffff8829af3e9e00 00000000000000f6 ffff88252be1ee00
Apr 9 19:01:09 cnode9.0 kernel: [197071.197965] 000000000000008d 0000000000000225 0000000000016d40 000000000000008d
Apr 9 19:01:09 cnode9.0 kernel: [197071.198047] ffff88252be1ee00 00000000000001ad ffff8824fc0d7c48 00000000000000e1
Apr 9 19:01:09 cnode9.0 kernel: [197071.198132] Call Trace:
Apr 9 19:01:09 cnode9.0 kernel: [197071.198172] [<ffffffff817425fb>] ? tcp_schedule_loss_probe+0x12b/0x1b0
Apr 9 19:01:09 cnode9.0 kernel: [197071.198219] [<ffffffff810b0830>] task_numa_migrate+0x4a0/0x930
Apr 9 19:01:09 cnode9.0 kernel: [197071.198264] [<ffffffff816d2957>] ? release_sock+0x117/0x160
Apr 9 19:01:09 cnode9.0 kernel: [197071.198306] [<ffffffff810b0d39>] numa_migrate_preferred+0x79/0x80
Apr 9 19:01:09 cnode9.0 kernel: [197071.198350] [<ffffffff810b557d>] task_numa_fault+0x91d/0xcc0
Apr 9 19:01:09 cnode9.0 kernel: [197071.198395] [<ffffffff811d35ae>] ? mpol_misplaced+0x14e/0x190
Apr 9 19:01:09 cnode9.0 kernel: [197071.198439] [<ffffffff811b06b8>] handle_pte_fault+0x5a8/0x14c0
Apr 9 19:01:09 cnode9.0 kernel: [197071.198485] [<ffffffff810f8531>] ? futex_wake+0x81/0x150
Apr 9 19:01:09 cnode9.0 kernel: [197071.198526] [<ffffffff810b0de4>] ? set_next_entity+0xa4/0x700
Apr 9 19:01:09 cnode9.0 kernel: [197071.198569] [<ffffffff810fab44>] ? do_futex+0xf4/0x4d0
Apr 9 19:01:09 cnode9.0 kernel: [197071.198610] [<ffffffff811b2440>] handle_mm_fault+0x250/0x540
Apr 9 19:01:09 cnode9.0 kernel: [197071.198654] [<ffffffff81067d19>] __do_page_fault+0x199/0x430
Apr 9 19:01:09 cnode9.0 kernel: [197071.198696] [<ffffffff81067fd2>] do_page_fault+0x22/0x30
Apr 9 19:01:09 cnode9.0 kernel: [197071.198740] [<ffffffff817ef878>] page_fault+0x28/0x30
Apr 9 19:01:09 cnode9.0 kernel: [197071.198775] Code: 4d b0 4c 89 f7 e8 29 d5 ff ff 48 8b 4d b0 49 8b 86 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4e 78 4c 8b 73 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c1 48 29 c1 4c 03 4b 48 4c 39 7d d0
Apr 9 19:01:09 cnode9.0 kernel: [197071.199217] RIP [<ffffffff810afec8>] task_numa_find_cpu+0x238/0x700
Apr 9 19:01:09 cnode9.0 kernel: [197071.199264] RSP <ffff8824fc0d7ba8>
Apr 9 19:01:09 cnode9.0 kernel: [197071.199900] ---[ end trace e938a840610a79f7 ]---
This appears to be the same bug as the one reported upstream in
http://lkml.iu.edu/hypermail/linux/kernel/1603.2/01659.html
According to that thread, the faulting sequence is:
27: 48 83 c1 01 add $0x1,%rcx
2b:* 48 f7 f1 div %rcx <-- trapping instruction
This points to the CONFIG_FAIR_GROUP_SCHED version of task_h_load(), which divides by the cfs_rq load average plus 1:
update_cfs_rq_h_load(cfs_rq);
return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
cfs_rq_load_avg(cfs_rq) + 1);
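So the cfs_rq load average is -1, i.e. the maximum value of an unsigned long. Adding 1 wraps it around to 0, and the div traps with a division by zero. RCX is 0 at the trapping instruction in both register dumps above.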
The LKML reporter's fix was to apply the kernel/sched/fair.c patches up to 4.5. No specific patch was identified.
Please backport these patches to the Xenial kernel and to the lts-xenial kernel in Trusty.