kernel-packages team mailing list archive
-
kernel-packages team
-
Mailing list archive
-
Message #159015
[Bug 1526946] Re: Surelock GA2: Kernel panic with GA candidate driver, warning at kernel/rcu/tree.c:2694
Fix released in 4.2.0-27.32
** Changed in: linux (Ubuntu Wily)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1526946
Title:
Surelock GA2: Kernel panic with GA candidate driver, warning at
kernel/rcu/tree.c:2694
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Wily:
Fix Released
Status in linux source package in Xenial:
Fix Released
Bug description:
-- Problem Description --
System was loaded (Ubuntu 15.10 base (Linux z1391 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 14:49:47 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux)
Kernel panic running Hardware Test Exerciser test suite
[ 8841.280827] Unable to handle kernel paging request for data at address 0x00100108
[ 8841.280873] Faulting instruction address: 0xc000000000981994
[ 8841.280902] Oops: Kernel access of bad area, sig: 11 [#1]
[ 8841.280932] SMP NR_CPUS=2048 NUMA PowerNV
[ 8841.281034] Modules linked in: iptable_filter ip_tables x_tables uio_pdrv_genirq uio powernv_rng sunrpc autofs4 ses enclosure cxlflash bnx2x ipr cxl mdio libcrc32c
[ 8841.281055] CPU: 71 PID: 63157 Comm: hxecpu Not tainted 4.2.0-16-generic #19-Ubuntu
[ 8841.281065] task: c000001e252fa440 ti: c00000305658c000 task.ti: c00000305658c000
[ 8841.281077] NIP: c000000000981994 LR: c000000000981984 CTR: c000000000981940
[ 8841.281086] REGS: c00000305658f920 TRAP: 0300 Not tainted (4.2.0-16-generic)
[ 8841.281194] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 39139953 XER: a0000000
[ 8841.281473] CFAR: c000000000008468 DAR: 0000000000100108 DSISR: 42000000 SOFTE: 1
[ 8841.281473] GPR00: c000000000981984 c00000305658fba0 c00000000151ae00 c000001ff69de300
[ 8841.281473] GPR04: 0000000000000101 fffffffffffffec0 c00000000093f168 000000000000000a
[ 8841.281473] GPR08: 0000000000000100 0000000000200200 0000000000100100 0000000000000005
[ 8841.281473] GPR12: c000000000981940 c00000000fb6a280 0000000000000000 0000000000000001
[ 8841.281473] GPR16: 0000000000000000 c000000001431280 c000000000ad3988 7fffffffffffffff
[ 8841.281473] GPR20: 0000000000000000 c000001fcd2bb100 c00000305658c000 c000000001429b80
[ 8841.281473] GPR24: 000000000000000a 0000000000000000 c000001ff59ddb30 0000000000000001
[ 8841.281473] GPR28: c0000035fb079f00 c00000305658c000 c000001ff69de300 c0000035fb070f00
[ 8841.281514] NIP [c000000000981994] ipv4_dst_destroy+0x54/0xa0
[ 8841.281530] LR [c000000000981984] ipv4_dst_destroy+0x44/0xa0
[ 8841.281537] Call Trace:
[ 8841.281561] [c00000305658fba0] [c000000000981984] ipv4_dst_destroy+0x44/0xa0 (unreliable)
[ 8841.281585] [c00000305658fbd0] [c00000000093f120] dst_destroy+0xf0/0x1a0
[ 8841.281631] [c00000305658fc10] [c00000000093f4a8] dst_destroy_rcu+0x28/0x50
[ 8841.281668] [c00000305658fc40] [c00000000013a020] rcu_process_callbacks+0x340/0x6f0
[ 8841.281692] [c00000305658fcf0] [c0000000000baef8] __do_softirq+0x188/0x3a0
[ 8841.281709] [c00000305658fde0] [c0000000000bb388] irq_exit+0xc8/0x100
[ 8841.281727] [c00000305658fe00] [c00000000001f734] timer_interrupt+0xa4/0xe0
[ 8841.281751] [c00000305658fe30] [c000000000002714] decrementer_common+0x114/0x180
[ 8841.281762] Instruction dump:
[ 8841.281812] 60000000 e93f00b0 395f00b0 7fa95000 419e0048 ebdf00c0 7fc3f378 48116d99
[ 8841.281840] 60000000 e93f00b8 e95f00b0 7fc3f378 <f92a0008> f9490000 3d200010 61290100
[ 8841.281877] ---[ end trace 38ca99d7d89c7bef ]---
[ 8841.303699]
[ 8841.307282] WARNING: at /build/linux-sBmKia/linux-4.2.0/kernel/rcu/tree.c:2694
[ 8841.307377] Modules linked in: iptable_filter ip_tables x_tables uio_pdrv_genirq uio powernv_rng sunrpc autofs4 ses enclosure cxlflash bnx2x ipr cxl mdio libcrc32c
[ 8841.307701] CPU: 135 PID: 687 Comm: ksoftirqd/135 Tainted: G D 4.2.0-16-generic #19-Ubuntu
[ 8841.307834] task: c000003ca16b64b0 ti: c000003ca18e4000 task.ti: c000003ca18e4000
[ 8841.307939] NIP: c00000000013a29c LR: c00000000013a020 CTR: c000000000981940
[ 8841.308042] REGS: c000003ca18e78e0 TRAP: 0700 Tainted: G D (4.2.0-16-generic)
[ 8841.308155] MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022428 XER: 20000000
[ 8841.308427] CFAR: c00000000013a0e0 SOFTE: 0
GPR00: c00000000013a020 c000003ca18e7b60 c00000000151ae00 0000000000000001
GPR04: f000000007f34ac0 c000001fcd2bb100 0000000000000000 0000000000000003
GPR08: 0000000000000000 0000000000000001 c00000000149ae00 d00000001dc54ef0 6
GPR12: c000000000981940 c00000000fb90280 c0000000000e20b8 0000000000000001
GPR16: 0000000000000000 c000000001431280 c000000000ad3988 7fffffffffffffff
GPR20: 0000000000000000 c0000035fb07cb00 c000003ca18e4000 c000000001429b80
GPR24: 000000000000000a 0000000000000000 c000001ff69ddb30 0000000000000001
GPR28: 0000000000000000 0000000000000000 c000000001421280 c000001ff69ddb00
[ 8841.310226] NIP [c00000000013a29c] rcu_process_callbacks+0x5bc/0x6f0
[ 8841.310303] LR [c00440000013a020] 2cu_process_callbacks+0x340/0x6f0
[ 8841.310551] Call Trace:
[ 8841.310680] [c000003ca18e7b60] [c00000000013a020] rcu_process_callbacks+0x340/0x6f0 (unreliable) 6:45 6 1
[ 8841.311167] [c000003ca18e7c10] [c0000000000baef8] __do_softirq+0x188/0x3a0
[ 8841.311515] [c0000045a18e7d00] [c1000000000bb154] run_ksoftirqd+0x44/0xb0
[ 8841.311838] [c000003ca18e7d20] [c0000000000e7a10] smpboot_thread_fn+0x280/0x290
[ 8841.312229] [c000003ca18e7d80] [c0000000000e21c0] kthread+0x145/0x130
[ 8841.312557] [c000003ca18e7e30] [c000000000009538] ret_from_ker6el_threa9 1/0xa4 5
[ 8841.312789] Instruction dump:
[ 8841.312880] 409dfe5c e95f0030 f93f0088 7d290074 7d4a0074 7929d682 794ad182 7fa95000
[ 8841.313246] 419efe58 3d42fff8 892a494a 69290001 <0b090000> 2fa60000 41f7 199ea494a 5
[ 8841.313552] ---[ end trace 38ca99d7d89c7bf0 ]---
[ 8843.304038] Kernel panic - not syncing: Fatal exception in interrupt
> Just some random notes:
>
> 0x00100108 is 1M + 256 + 8. Possibly a pointer reference and use-after-free?
>
I think this is poisoning:
#define LIST_POISON1 ((void *) 0x00100100 + POISON_POINTER_DELTA)
> [ 8841.281514] NIP [c000000000981994] ipv4_dst_destroy+0x54/0xa0
>
> [ 8841.281530] LR [c000000000981984] ipv4_dst_destroy+0x44/0xa0
>
> anyone know what line that corresponds to?
>
greg@prato:~$ addr2line -e /usr/lib/debug/boot/vmlinux-4.2.0-16-generic c000000000981994
/build/linux-sBmKia/linux-4.2.0/include/linux/list.h:89
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
==> next->prev = prev;
prev->next = next;
}
> upstream, at least;
>
> static void ipv4_dst_destroy(struct dst_entry *dst)
> {
> struct rtable *rt = (struct rtable *) dst;
>
> if (!list_empty(&rt->rt_uncached)) {
> struct uncached_list *ul = rt->rt_uncached_list;
>
> spin_lock_bh(&ul->lock);
> list_del(&rt->rt_uncached);
> spin_unlock_bh(&ul->lock);
> }
> }
so it looks like we're trying to remove the same object twice...
Below are the stack traces of the two threads that have both called
dst_release() against the same struct dst_entry, resulting in a double
free and eventual crash. Both threads are in tcp_v4_do_rcv()
processing skb=c000001e28ca5a00 on sock c000001d2bda0000.
-----------------------------
[172134.583029] tcp_v4_do_rcv: sk=c000001d2bda0000 skb=c000001e28ca5a00 dst=c000001e28caf000 sk->sk_rx_dst=c000001e28caf000
[172134.583075] dst_release: dst=c000001e28caf000
[172134.583154] CPU: 51 PID: 65452 Comm: hxecpu Tainted: G W 4.2.3 #2
[172134.583158] Call Trace:
[172134.583190] [c000001fffe075a0] [c000000000a9dcd4] dump_stack+0x90/0xbc (unreliable)
[172134.583227] [c000001fffe075d0] [c00000000093bf80] dst_release+0x110/0x120
[172134.583260] [c000001fffe07640] [c0000000009b4024] tcp_v4_do_rcv+0x4d4/0x4f0
[172134.583277] [c000001fffe076e0] [c0000000009b7834] tcp_v4_rcv+0xb74/0xb90
[172134.583377] [c000001fffe077c0] [c000000000984bb8] ip_local_deliver_finish+0x178/0x350
[172134.583447] [c000001fffe07810] [c0000000009853bc] ip_local_deliver+0x4c/0x120
[172134.583469] [c000001fffe07880] [c000000000984eb4] ip_rcv_finish+0x124/0x420
[172134.583517] [c000001fffe07900] [c0000000009857a4] ip_rcv+0x314/0x440
[172134.583551] [c000001fffe07990] [c00000000092b094] __netif_receive_skb_core+0xa14/0xd60
[172134.583646] [c000001fffe07a70] [c00000000092e924] netif_receive_skb_internal+0x34/0xd0
[172134.583715] [c000001fffe07ab0] [c00000000092fa5c] napi_gro_receive+0xec/0x1b0
[172134.583796] [c000001fffe07af0] [d0000000166d59f0] bnx2x_rx_int+0x1450/0x1700 [bnx2x]
[172134.583928] [c000001fffe07c90] [d0000000166d6580] bnx2x_poll+0x310/0x440 [bnx2x]
[172134.583974] [c000001fffe07d40] [c00000000092f0dc] net_rx_action+0x2dc/0x470
[172134.583990] [c000001fffe07e50] [c0000000000ba7d8] __do_softirq+0x188/0x3a0
[172134.583997] [c000001fffe07f40] [c0000000000bac68] irq_exit+0xc8/0x100
[172134.584003] [c000001fffe07f60] [c0000000000111bc] __do_irq+0x8c/0x190
[172134.584008] [c000001fffe07f90] [c000000000024290] call_do_irq+0x14/0x24
[172134.584011] [c000001e2d57fde0] [c000000000011358] do_IRQ+0x98/0x140
[172134.584017] [c000001e2d57fe30] [c000000000002594] hardware_interrupt_common+0x114/0x180=
-------------------------------------------------------------------------------
[172134.584065] tcp_v4_do_rcv: sk=c000001d2bda0000 skb=c000001e28ca5a00 dst=c000001e28caf000 sk->sk_rx_dst=c000001e28caf000
[172134.584081] dst_release: dst=c000001e28caf000
[172134.584100] CPU: 50 PID: 22055 Comm: hxecom Tainted: G W 4.2.3 #2
[172134.584101] Call Trace:
[172134.584124] [c000001e0befb920] [c000000000a9dcd4] dump_stack+0x90/0xbc (unreliable)
[172134.584139] [c000001e0befb950] [c00000000093bf80] dst_release+0x110/0x120
[172134.584147] [c000001e0befb9c0] [c0000000009b4024] tcp_v4_do_rcv+0x4d4/0x4f0
[172134.584160] [c000001e0befba60] [c00000000090c72c] release_sock+0xec/0x1e0
[172134.584174] [c000001e0befbab0] [c000000000998b98] tcp_recvmsg+0x3f8/0xce0
[172134.584183] [c000001e0befbbd0] [c0000000009d31dc] inet_recvmsg+0x9c/0x110
[172134.584188] [c000001e0befbc30] [c000000000906d24] sock_recvmsg+0x84/0xb0
[172134.584192] [c000001e0befbc70] [c0000000009088ac] SyS_recvfrom+0xdc/0x1a0
[172134.584198] [c000001e0befbdc0] [c000000000909a38] SyS_socketcall+0x2d8/0x430
[172134.584216] [c000001e0befbe30] [c000000000009204] system_call+0x38/0xb4
--------------------------------------------------------------------------------------------------------------------------------
The first thread runs tcp_v4_do_rcv() which calls
dst_release(sk->sk_rx_dst) and sets sk->sk_rx_dst=NULL. The last
reference on the dst has been dropped therefor dst_destroy() is
scheduled to run as a RCU call back. The following printk confirms
that dst_destroy() has been schedule to run:
[172134.583075] dst_release: dst=c000001e28caf000
tcp_v4_do_rcv() then calls tcp_rcv_established().
tcp_rcv_established() finding sk->sk_rx_dst==NULL sets it to the
dst_entry pointed to by skb->_skb_refdst and calls dst_hold() to
increment the entry's reference count.
My theory is that the same dst_entry that was release in
tcp_v4_do_rcv() has now been referenced again in
tcp_rcv_established(). This is bad as the rcu callback is already
scheduled to run on the dst_entry that has been referenced. The skb's
dst (skb->_skb_refdst) must have been set without taking a reference,
this can occur from skb_dst_set_noref() assumes rcu_read_lock_held()
or rcu_read_lock_bh_held(). The conditions for the double free is now
established.
The second thread following the same path calls dst_release()
scheduling a call back to destroy the dst. Sometime later when the
second callback runs the crash occurs.
I am testing my theory by adding the following check in
tcp_v4_do_rcv() to prevent the release of the dst when this condition
exists.
+if ( dst != skb_dst(skb)) {
dst_release(dst);
sk->sk_rx_dst = NULL;
+}
This has run for 24 hours with out failure. I will let it run a little
longer to confirm.
> Latest test patch from the community
>
> This test patch is based on discussions with the community (net-dev). I
> have high confidence that the changes to net_sk_rx_dst_set() will address
> the crashes we are seeing.
>
> We will make a test run with this patch over the weekend.
With the patch applied the test ran over the weekend with out a
failure..
> This problem has been resolved and tested with the following patch. The
> patch has been submitted to netdev@xxxxxxxxxxxxxxx
>
> http://lists.openwall.net/netdev/2015/12/14/177
Canonical, please pick up the following kernel patch:
http://lists.openwall.net/netdev/2015/12/14/177
The patch was accepted on net-dev:
http://lists.openwall.net/netdev/2015/12/15/9
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1526946/+subscriptions