kernel-packages team mailing list archive

[Bug 1484919] Re: Kernel oops associated with BIRD/netlink

 

Hi Lance,

Sorry for the delay.  It sounds like this bug is a regression between
Trusty (3.13) and Vivid (3.19), and it may already be fixed in Wily,
which is based on the 4.2 kernel.

Can you confirm that the latest Wily kernel does not exhibit this bug?
If it does not, we can perform a "reverse" bisect to identify the commit
that fixes it, then have that commit SRU'd to Vivid.
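
For anyone unfamiliar with the idea, a reverse bisect is an ordinary
binary search over the commit history, except that it looks for the first
commit where the bug is gone rather than the first where it appears. The
Python below is only a sketch of that search: the commit names, the
position of the fix and the is_fixed() check are all made up for
illustration, and in practice each check means building and booting a
test kernel.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_fixed() is true.

    Assumes commits are ordered oldest to newest, the first one still
    shows the bug, the last one does not, and the history flips from
    broken to fixed exactly once.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix is later than mid
    return commits[hi]


# Hypothetical linear history standing in for the 3.19..4.2 range.
history = [f"c{i}" for i in range(16)]
fix_index = 11  # made-up position of the fixing commit

result = first_fixed(history, lambda c: history.index(c) >= fix_index)
print(result)  # c11
```

With 16 candidate commits this needs four test kernels, which is why the
first step is confirming that the newest kernel really is good.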

The latest Wily kernel can be downloaded from:
https://launchpad.net/~ubuntu-security/+archive/ubuntu/ppa/+build/8272199

Thanks in advance!

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux (Ubuntu)
       Status: Confirmed => In Progress

** Also affects: linux (Ubuntu Vivid)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Xenial)
   Importance: High
     Assignee: Joseph Salisbury (jsalisbury)
       Status: In Progress

** Also affects: linux (Ubuntu Wily)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Wily)
       Status: New => In Progress

** Changed in: linux (Ubuntu Vivid)
       Status: New => In Progress

** Changed in: linux (Ubuntu Wily)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Vivid)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Wily)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux (Ubuntu Vivid)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux (Ubuntu Xenial)
       Status: In Progress => Incomplete

** Changed in: linux (Ubuntu Wily)
       Status: In Progress => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1484919

Title:
  Kernel oops associated with BIRD/netlink

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Vivid:
  In Progress
Status in linux source package in Wily:
  Incomplete
Status in linux source package in Xenial:
  Incomplete

Bug description:
  While scale testing our product, which uses the BIRD BGP daemon, on
  Google's GCE cloud, we see frequent kernel oopses and reboots (on about
  40% of hosts) on kernel 3.19.0-25-generic #26~14.04.1-Ubuntu with BIRD
  running.  This is the standard GCE-provided Ubuntu image.

  If we replace the image with a stock Ubuntu one (kernel
  3.13.0-61-generic #100-Ubuntu), installed from ISO, then we do not see
  the issue.

  If we stop BIRD then we no longer see the issue.

  I suspect that the way BIRD uses netlink is triggering a kernel bug.
  It seems to happen more at scale, when BIRD is doing more with netlink
  and we have thousands of routes in place.

  Here's a sample kernel oops:

  [  266.033276] BUG: unable to handle kernel paging request at 000000190000003c
  [  266.035142] IP: [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
  [  266.036009] PGD b9e5e067 PUD 0 
  [  266.036009] Oops: 0000 [#1] SMP 
  [  266.036009] Modules linked in: bridge stp llc dummy xt_mac xt_mark nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_set ip_set_hash_ip ip_set nfnetlink ebtable_nat ebtables xt_nat ipip tunnel4 ip_tunnel ipt_REJECT nf_reject_ipv4 xt_conntrack xt_CHECKSUM xt_tcpudp iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_crypt ppdev dm_multipath scsi_dh 8250_fintek parport_pc i2c_piix4 mac_hid serio_raw parport crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [  266.036009] CPU: 2 PID: 3456 Comm: bird Tainted: G         C     3.19.0-25-generic #26~14.04.1-Ubuntu
  [  266.036009] Hardware name: Google Google, BIOS Google 01/01/2011
  [  266.036009] task: ffff8801210775c0 ti: ffff880036a08000 task.ti: ffff880036a08000
  [  266.036009] RIP: 0010:[<ffffffff811d1f0b>]  [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
  [  266.036009] RSP: 0018:ffff880036a0b7f8  EFLAGS: 00010246
  [  266.036009] RAX: 0000000000000000 RBX: 00000000000102d0 RCX: 000000000008b0f6
  [  266.036009] RDX: 000000000008b0f5 RSI: 0000000000000000 RDI: 00000000000171c0
  [  266.036009] RBP: ffff880036a0b848 R08: ffff8801263171c0 R09: ffff880121c01600
  [  266.036009] R10: 0000000000000000 R11: ffff880121c01600 R12: 00000000000102d0
  [  266.036009] R13: 0000000000000180 R14: 00000000ffffffff R15: 000000190000003c
  [  266.036009] FS:  00007fd470753740(0000) GS:ffff880126300000(0000) knlGS:0000000000000000
  [  266.036009] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  266.036009] CR2: 000000190000003c CR3: 00000000bb207000 CR4: 00000000001406e0
  [  266.036009] Stack:
  [  266.036009]  0000000100000180 00000000000000c3 ffff880121c01600 ffffffff816992ea
  [  266.036009]  0000000000000001 ffff8800b1486a00 0000000000000000 00000000000000d0
  [  266.036009]  0000000000000180 00000000ffffffff ffff880036a0b898 ffffffff81697261
  [  266.036009] Call Trace:
  [  266.036009]  [<ffffffff816992ea>] ? pskb_expand_head+0x6a/0x260
  [  266.036009]  [<ffffffff81697261>] __kmalloc_reserve.isra.27+0x31/0x90
  [  266.036009]  [<ffffffff816992ea>] pskb_expand_head+0x6a/0x260
  [  266.036009]  [<ffffffff816d6d13>] netlink_trim+0xa3/0xe0
  [  266.036009]  [<ffffffff816d984e>] netlink_unicast+0x3e/0x200
  [  266.036009]  [<ffffffff816da323>] nlmsg_notify+0x93/0xb0
  [  266.036009]  [<ffffffff816b8d3e>] rtnl_notify+0x2e/0x40
  [  266.036009]  [<ffffffff81727525>] rtmsg_fib+0x115/0x160
  [  266.036009]  [<ffffffff8172a09d>] ? trie_rebalance+0x10d/0x130
  [  266.036009]  [<ffffffff8172a34a>] fib_table_insert+0x1da/0x8e0
  [  266.036009]  [<ffffffff817242a8>] inet_rtm_newroute+0x48/0x60
  [  266.036009]  [<ffffffff816b97c5>] rtnetlink_rcv_msg+0x95/0x250
  [  266.036009]  [<ffffffff813bb4a6>] ? rhashtable_lookup_compare+0x36/0x70
  [  266.036009]  [<ffffffff816d631e>] ? __netlink_lookup+0x3e/0x50
  [  266.036009]  [<ffffffff816b9730>] ? rtnetlink_rcv+0x40/0x40
  [  266.036009]  [<ffffffff816da271>] netlink_rcv_skb+0xc1/0xe0
  [  266.036009]  [<ffffffff816b971c>] rtnetlink_rcv+0x2c/0x40
  [  266.036009]  [<ffffffff816d9906>] netlink_unicast+0xf6/0x200
  [  266.036009]  [<ffffffff816d9d1c>] netlink_sendmsg+0x30c/0x680
  [  266.036009]  [<ffffffff81351610>] ? aa_sk_perm.isra.4+0x70/0x150
  [  266.036009]  [<ffffffff8168f2ec>] do_sock_sendmsg+0x8c/0x100
  [  266.036009]  [<ffffffff81209a13>] ? __fdget+0x13/0x20
  [  266.036009]  [<ffffffff8168f547>] SYSC_sendto+0x157/0x200
  [  266.036009]  [<ffffffff81690252>] ? __sys_recvmsg+0x42/0x80
  [  266.036009]  [<ffffffff8168fd2e>] SyS_sendto+0xe/0x10
  [  266.036009]  [<ffffffff817b668d>] system_call_fastpath+0x16/0x1b
  [  266.036009] Code: fb 41 8b 53 18 0f 1f 44 00 00 48 83 c4 28 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 40 00 49 63 41 20 48 8d 4a 01 49 8b 39 <49> 8b 1c 07 4c 89 f8 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 53 ff 
  [  266.036009] RIP  [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
  [  266.036009]  RSP <ffff880036a0b7f8>
  [  266.036009] CR2: 000000190000003c
  [  266.131166] ---[ end trace 246ae06038901786 ]---
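
In the register dump above, the faulting address in CR2 is the same value
as R15, and it does not have the ffff88... form of the kernel pointers in
the neighbouring registers. The Python below is a rough sketch of how that
comparison could be automated across many collected oopses. The excerpt is
copied from the trace above; the regular expression and the helper name
are illustrative assumptions, not part of any existing tool.

```python
import re

# Three lines copied from the oops above.
OOPS_EXCERPT = """\
[  266.036009] RBP: ffff880036a0b848 R08: ffff8801263171c0 R09: ffff880121c01600
[  266.036009] R13: 0000000000000180 R14: 00000000ffffffff R15: 000000190000003c
[  266.036009] CR2: 000000190000003c CR3: 00000000bb207000 CR4: 00000000001406e0
"""

# Matches "NAME: 16 hex digits" pairs such as "R15: 000000190000003c".
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def parse_registers(text):
    """Return a dict mapping register name to its integer value."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


regs = parse_registers(OOPS_EXCERPT)
fault = regs["CR2"]

# Registers (other than CR2 itself) that hold the faulting address.
holders = sorted(n for n, v in regs.items() if v == fault and n != "CR2")
print(holders)  # ['R15']

# True only if the address lies in the upper (kernel) half of the
# x86-64 address space.
in_kernel_half = fault >= 0xFFFF800000000000
print(in_kernel_half)  # False
```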

  Running our product on CoreOS, we see similar but less frequent
  crashes.  Their kernel is 4.1-based:
  https://github.com/coreos/bugs/issues/435

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1484919/+subscriptions

