kernel-packages team mailing list archive
Message #150198
[Bug 1484919] Re: Kernel oops associated with BIRD/netlink
The Ubuntu 3.19.0-35.40 kernel currently has the 3.19.8-ckt8 stable
updates, so a quick test would be to install Vivid, then apply the
latest updates and check if the bug exists. However, that will not tell
us if it's already fixed in ckt10.
We know this bug is fixed in the Wily (4.2) kernel. However, we do not
yet know the exact commit that fixes the bug, or if the fix was also
sent to upstream stable. Testing the latest upstream 3.19 kernel will
tell us whether we need to perform a lengthy reverse bisect. If the
bug is fixed in the latest upstream 3.19 kernel, it means the fix is
already on its way into Vivid through the normal stable update process.
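
For reference, a reverse bisect searches for the first commit where the
bug is gone, rather than the first commit where it appears. The idea can
be sketched roughly as below. This is only an illustration: the names
first_fixed and is_fixed are hypothetical, the history is modelled as a
simple ordered list, and it assumes the oldest commit is broken, the
newest is fixed, and the fix is never reverted in between. In practice
git bisect performs this search over the real commit history.

```python
def first_fixed(commits, is_fixed):
    """Return the first commit for which is_fixed(commit) is True.

    Assumptions (hypothetical model, not the real kernel history):
    commits is ordered oldest to newest, commits[0] still has the bug,
    commits[-1] has the fix, and every commit after the fix is also fixed.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return commits[hi]


# Toy usage: commits are numbered 0..99 and the fix landed in commit 37.
history = list(range(100))
fixing_commit = first_fixed(history, lambda c: c >= 37)
```

Each step requires building and testing one kernel, which is why the
search is lengthy even though it only needs about log2(N) steps.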
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1484919
Title:
Kernel oops associated with BIRD/netlink
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Vivid:
In Progress
Status in linux source package in Wily:
Fix Released
Status in linux source package in Xenial:
Fix Released
Bug description:
Scale testing our product, which uses the BIRD BGP daemon, on Google's
GCE cloud, we see frequent (40% of hosts) Kernel Oopses and reboots on
kernel 3.19.0-25-generic #26~14.04.1-Ubuntu with BIRD running. This
is the standard GCE-provided Ubuntu image.
If we replace the image with a stock Ubuntu one (kernel
3.13.0-61-generic #100-Ubuntu), installed from ISO, then we do not see
the issue.
If we stop BIRD then we no longer see the issue.
I suspect that this is an issue with the way BIRD is using netlink,
triggering a kernel bug. It seems to happen more at scale, when BIRD
is doing more with netlink and we have thousands of routes in place.
Here's a sample kernel oops:
[ 266.033276] BUG: unable to handle kernel paging request at 000000190000003c
[ 266.035142] IP: [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
[ 266.036009] PGD b9e5e067 PUD 0
[ 266.036009] Oops: 0000 [#1] SMP
[ 266.036009] Modules linked in: bridge stp llc dummy xt_mac xt_mark nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_set ip_set_hash_ip ip_set nfnetlink ebtable_nat ebtables xt_nat ipip tunnel4 ip_tunnel ipt_REJECT nf_reject_ipv4 xt_conntrack xt_CHECKSUM xt_tcpudp iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_crypt ppdev dm_multipath scsi_dh 8250_fintek parport_pc i2c_piix4 mac_hid serio_raw parport crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
[ 266.036009] CPU: 2 PID: 3456 Comm: bird Tainted: G C 3.19.0-25-generic #26~14.04.1-Ubuntu
[ 266.036009] Hardware name: Google Google, BIOS Google 01/01/2011
[ 266.036009] task: ffff8801210775c0 ti: ffff880036a08000 task.ti: ffff880036a08000
[ 266.036009] RIP: 0010:[<ffffffff811d1f0b>] [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
[ 266.036009] RSP: 0018:ffff880036a0b7f8 EFLAGS: 00010246
[ 266.036009] RAX: 0000000000000000 RBX: 00000000000102d0 RCX: 000000000008b0f6
[ 266.036009] RDX: 000000000008b0f5 RSI: 0000000000000000 RDI: 00000000000171c0
[ 266.036009] RBP: ffff880036a0b848 R08: ffff8801263171c0 R09: ffff880121c01600
[ 266.036009] R10: 0000000000000000 R11: ffff880121c01600 R12: 00000000000102d0
[ 266.036009] R13: 0000000000000180 R14: 00000000ffffffff R15: 000000190000003c
[ 266.036009] FS: 00007fd470753740(0000) GS:ffff880126300000(0000) knlGS:0000000000000000
[ 266.036009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 266.036009] CR2: 000000190000003c CR3: 00000000bb207000 CR4: 00000000001406e0
[ 266.036009] Stack:
[ 266.036009] 0000000100000180 00000000000000c3 ffff880121c01600 ffffffff816992ea
[ 266.036009] 0000000000000001 ffff8800b1486a00 0000000000000000 00000000000000d0
[ 266.036009] 0000000000000180 00000000ffffffff ffff880036a0b898 ffffffff81697261
[ 266.036009] Call Trace:
[ 266.036009] [<ffffffff816992ea>] ? pskb_expand_head+0x6a/0x260
[ 266.036009] [<ffffffff81697261>] __kmalloc_reserve.isra.27+0x31/0x90
[ 266.036009] [<ffffffff816992ea>] pskb_expand_head+0x6a/0x260
[ 266.036009] [<ffffffff816d6d13>] netlink_trim+0xa3/0xe0
[ 266.036009] [<ffffffff816d984e>] netlink_unicast+0x3e/0x200
[ 266.036009] [<ffffffff816da323>] nlmsg_notify+0x93/0xb0
[ 266.036009] [<ffffffff816b8d3e>] rtnl_notify+0x2e/0x40
[ 266.036009] [<ffffffff81727525>] rtmsg_fib+0x115/0x160
[ 266.036009] [<ffffffff8172a09d>] ? trie_rebalance+0x10d/0x130
[ 266.036009] [<ffffffff8172a34a>] fib_table_insert+0x1da/0x8e0
[ 266.036009] [<ffffffff817242a8>] inet_rtm_newroute+0x48/0x60
[ 266.036009] [<ffffffff816b97c5>] rtnetlink_rcv_msg+0x95/0x250
[ 266.036009] [<ffffffff813bb4a6>] ? rhashtable_lookup_compare+0x36/0x70
[ 266.036009] [<ffffffff816d631e>] ? __netlink_lookup+0x3e/0x50
[ 266.036009] [<ffffffff816b9730>] ? rtnetlink_rcv+0x40/0x40
[ 266.036009] [<ffffffff816da271>] netlink_rcv_skb+0xc1/0xe0
[ 266.036009] [<ffffffff816b971c>] rtnetlink_rcv+0x2c/0x40
[ 266.036009] [<ffffffff816d9906>] netlink_unicast+0xf6/0x200
[ 266.036009] [<ffffffff816d9d1c>] netlink_sendmsg+0x30c/0x680
[ 266.036009] [<ffffffff81351610>] ? aa_sk_perm.isra.4+0x70/0x150
[ 266.036009] [<ffffffff8168f2ec>] do_sock_sendmsg+0x8c/0x100
[ 266.036009] [<ffffffff81209a13>] ? __fdget+0x13/0x20
[ 266.036009] [<ffffffff8168f547>] SYSC_sendto+0x157/0x200
[ 266.036009] [<ffffffff81690252>] ? __sys_recvmsg+0x42/0x80
[ 266.036009] [<ffffffff8168fd2e>] SyS_sendto+0xe/0x10
[ 266.036009] [<ffffffff817b668d>] system_call_fastpath+0x16/0x1b
[ 266.036009] Code: fb 41 8b 53 18 0f 1f 44 00 00 48 83 c4 28 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 40 00 49 63 41 20 48 8d 4a 01 49 8b 39 <49> 8b 1c 07 4c 89 f8 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 53 ff
[ 266.036009] RIP [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
[ 266.036009] RSP <ffff880036a0b7f8>
[ 266.036009] CR2: 000000190000003c
[ 266.131166] ---[ end trace 246ae06038901786 ]---
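
When comparing oopses from many hosts, it can help to reduce each one to
its reliable call chain. In a trace like the one above, frames marked
with "?" are addresses found on the stack that the unwinder could not
confirm as part of the call chain. A minimal sketch of such a filter
follows; the function name reliable_frames and the regular expression
are illustrative assumptions, not part of any existing tool.

```python
import re

# Matches e.g. "[<ffffffff816d6d13>] netlink_trim+0xa3/0xe0", with an
# optional "? " marker before the symbol name.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(oops_text):
    """Return symbol names from the Call Trace section, skipping '?' frames."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME.search(line)
        if match is None:
            break  # first non-frame line (e.g. "Code:") ends the trace
        if match.group(1) is None:  # no "?" marker, so keep the frame
            frames.append(match.group(2))
    return frames


# Excerpt of the trace above, used as sample input.
SAMPLE = """\
[ 266.036009] Call Trace:
[ 266.036009] [<ffffffff816992ea>] ? pskb_expand_head+0x6a/0x260
[ 266.036009] [<ffffffff81697261>] __kmalloc_reserve.isra.27+0x31/0x90
[ 266.036009] [<ffffffff816992ea>] pskb_expand_head+0x6a/0x260
[ 266.036009] [<ffffffff816d6d13>] netlink_trim+0xa3/0xe0
[ 266.036009] Code: fb 41 8b 53
"""

frames = reliable_frames(SAMPLE)
```

Applied to the full oops, this leaves the path from the sendto system
call through rtnetlink and fib_table_insert down to the failing
allocation in netlink_trim.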
Running our product on CoreOS, we see similar, but less frequent
crashes. Their kernel is 4.1-based:
https://github.com/coreos/bugs/issues/435
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1484919/+subscriptions