kernel-packages team mailing list archive
-
kernel-packages team
-
Mailing list archive
-
Message #156166
[Bug 1526946] Re: Surelock GA2: Kernel panic with GA candidate driver, warning at kernel/rcu/tree.c:2694
on comment number #2, it says that the kernel is at proposed, but the
bug has not marked as fixed release. It is marked as Fix committed. It
seems that the patches were not integrated in the -proposed kernel
version.
I am wondering if comment#2 was a mistake, somehow.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1526946
Title:
Surelock GA2: Kernel panic with GA candidate driver, warning at
kernel/rcu/tree.c:2694
Status in linux package in Ubuntu:
Fix Committed
Status in linux source package in Wily:
Fix Committed
Status in linux source package in Xenial:
Fix Committed
Bug description:
-- Problem Description --
System was loaded (Ubuntu 15.10 base (Linux z1391 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 14:49:47 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux)
Kernel panic running Hardware Test Exerciser test suite
[ 8841.280827] Unable to handle kernel paging request for data at address 0x00100108
[ 8841.280873] Faulting instruction address: 0xc000000000981994
[ 8841.280902] Oops: Kernel access of bad area, sig: 11 [#1]
[ 8841.280932] SMP NR_CPUS=2048 NUMA PowerNV
[ 8841.281034] Modules linked in: iptable_filter ip_tables x_tables uio_pdrv_genirq uio powernv_rng sunrpc autofs4 ses enclosure cxlflash bnx2x ipr cxl mdio libcrc32c
[ 8841.281055] CPU: 71 PID: 63157 Comm: hxecpu Not tainted 4.2.0-16-generic #19-Ubuntu
[ 8841.281065] task: c000001e252fa440 ti: c00000305658c000 task.ti: c00000305658c000
[ 8841.281077] NIP: c000000000981994 LR: c000000000981984 CTR: c000000000981940
[ 8841.281086] REGS: c00000305658f920 TRAP: 0300 Not tainted (4.2.0-16-generic)
[ 8841.281194] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 39139953 XER: a0000000
[ 8841.281473] CFAR: c000000000008468 DAR: 0000000000100108 DSISR: 42000000 SOFTE: 1
[ 8841.281473] GPR00: c000000000981984 c00000305658fba0 c00000000151ae00 c000001ff69de300
[ 8841.281473] GPR04: 0000000000000101 fffffffffffffec0 c00000000093f168 000000000000000a
[ 8841.281473] GPR08: 0000000000000100 0000000000200200 0000000000100100 0000000000000005
[ 8841.281473] GPR12: c000000000981940 c00000000fb6a280 0000000000000000 0000000000000001
[ 8841.281473] GPR16: 0000000000000000 c000000001431280 c000000000ad3988 7fffffffffffffff
[ 8841.281473] GPR20: 0000000000000000 c000001fcd2bb100 c00000305658c000 c000000001429b80
[ 8841.281473] GPR24: 000000000000000a 0000000000000000 c000001ff59ddb30 0000000000000001
[ 8841.281473] GPR28: c0000035fb079f00 c00000305658c000 c000001ff69de300 c0000035fb070f00
[ 8841.281514] NIP [c000000000981994] ipv4_dst_destroy+0x54/0xa0
[ 8841.281530] LR [c000000000981984] ipv4_dst_destroy+0x44/0xa0
[ 8841.281537] Call Trace:
[ 8841.281561] [c00000305658fba0] [c000000000981984] ipv4_dst_destroy+0x44/0xa0 (unreliable)
[ 8841.281585] [c00000305658fbd0] [c00000000093f120] dst_destroy+0xf0/0x1a0
[ 8841.281631] [c00000305658fc10] [c00000000093f4a8] dst_destroy_rcu+0x28/0x50
[ 8841.281668] [c00000305658fc40] [c00000000013a020] rcu_process_callbacks+0x340/0x6f0
[ 8841.281692] [c00000305658fcf0] [c0000000000baef8] __do_softirq+0x188/0x3a0
[ 8841.281709] [c00000305658fde0] [c0000000000bb388] irq_exit+0xc8/0x100
[ 8841.281727] [c00000305658fe00] [c00000000001f734] timer_interrupt+0xa4/0xe0
[ 8841.281751] [c00000305658fe30] [c000000000002714] decrementer_common+0x114/0x180
[ 8841.281762] Instruction dump:
[ 8841.281812] 60000000 e93f00b0 395f00b0 7fa95000 419e0048 ebdf00c0 7fc3f378 48116d99
[ 8841.281840] 60000000 e93f00b8 e95f00b0 7fc3f378 <f92a0008> f9490000 3d200010 61290100
[ 8841.281877] ---[ end trace 38ca99d7d89c7bef ]---
[ 8841.303699]
[ 8841.307282] WARNING: at /build/linux-sBmKia/linux-4.2.0/kernel/rcu/tree.c:2694
[ 8841.307377] Modules linked in: iptable_filter ip_tables x_tables uio_pdrv_genirq uio powernv_rng sunrpc autofs4 ses enclosure cxlflash bnx2x ipr cxl mdio libcrc32c
[ 8841.307701] CPU: 135 PID: 687 Comm: ksoftirqd/135 Tainted: G D 4.2.0-16-generic #19-Ubuntu
[ 8841.307834] task: c000003ca16b64b0 ti: c000003ca18e4000 task.ti: c000003ca18e4000
[ 8841.307939] NIP: c00000000013a29c LR: c00000000013a020 CTR: c000000000981940
[ 8841.308042] REGS: c000003ca18e78e0 TRAP: 0700 Tainted: G D (4.2.0-16-generic)
[ 8841.308155] MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022428 XER: 20000000
[ 8841.308427] CFAR: c00000000013a0e0 SOFTE: 0
GPR00: c00000000013a020 c000003ca18e7b60 c00000000151ae00 0000000000000001
GPR04: f000000007f34ac0 c000001fcd2bb100 0000000000000000 0000000000000003
GPR08: 0000000000000000 0000000000000001 c00000000149ae00 d00000001dc54ef0 6
GPR12: c000000000981940 c00000000fb90280 c0000000000e20b8 0000000000000001
GPR16: 0000000000000000 c000000001431280 c000000000ad3988 7fffffffffffffff
GPR20: 0000000000000000 c0000035fb07cb00 c000003ca18e4000 c000000001429b80
GPR24: 000000000000000a 0000000000000000 c000001ff69ddb30 0000000000000001
GPR28: 0000000000000000 0000000000000000 c000000001421280 c000001ff69ddb00
[ 8841.310226] NIP [c00000000013a29c] rcu_process_callbacks+0x5bc/0x6f0
[ 8841.310303] LR [c00440000013a020] 2cu_process_callbacks+0x340/0x6f0
[ 8841.310551] Call Trace:
[ 8841.310680] [c000003ca18e7b60] [c00000000013a020] rcu_process_callbacks+0x340/0x6f0 (unreliable) 6:45 6 1
[ 8841.311167] [c000003ca18e7c10] [c0000000000baef8] __do_softirq+0x188/0x3a0
[ 8841.311515] [c0000045a18e7d00] [c1000000000bb154] run_ksoftirqd+0x44/0xb0
[ 8841.311838] [c000003ca18e7d20] [c0000000000e7a10] smpboot_thread_fn+0x280/0x290
[ 8841.312229] [c000003ca18e7d80] [c0000000000e21c0] kthread+0x145/0x130
[ 8841.312557] [c000003ca18e7e30] [c000000000009538] ret_from_ker6el_threa9 1/0xa4 5
[ 8841.312789] Instruction dump:
[ 8841.312880] 409dfe5c e95f0030 f93f0088 7d290074 7d4a0074 7929d682 794ad182 7fa95000
[ 8841.313246] 419efe58 3d42fff8 892a494a 69290001 <0b090000> 2fa60000 41f7 199ea494a 5
[ 8841.313552] ---[ end trace 38ca99d7d89c7bf0 ]---
[ 8843.304038] Kernel panic - not syncing: Fatal exception in interrupt
> Just some random notes:
>
> 0x00100108 is 1M + 256 + 8. Possibly a pointer reference and use-after-free?
>
I think this is poisoning:
#define LIST_POISON1 ((void *) 0x00100100 + POISON_POINTER_DELTA)
> [ 8841.281514] NIP [c000000000981994] ipv4_dst_destroy+0x54/0xa0
>
> [ 8841.281530] LR [c000000000981984] ipv4_dst_destroy+0x44/0xa0
>
> anyone know what line that corresponds to?
>
greg@prato:~$ addr2line -e /usr/lib/debug/boot/vmlinux-4.2.0-16-generic c000000000981994
/build/linux-sBmKia/linux-4.2.0/include/linux/list.h:89
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
==> next->prev = prev;
prev->next = next;
}
> upstream, at least;
>
> static void ipv4_dst_destroy(struct dst_entry *dst)
> {
> struct rtable *rt = (struct rtable *) dst;
>
> if (!list_empty(&rt->rt_uncached)) {
> struct uncached_list *ul = rt->rt_uncached_list;
>
> spin_lock_bh(&ul->lock);
> list_del(&rt->rt_uncached);
> spin_unlock_bh(&ul->lock);
> }
> }
so it looks like we're trying to remove the same object twice...
Below are the stack traces of the two threads that have both called
dst_release() against the same struct dst_entry, resulting in a double
free and eventual crash. Both threads are in tcp_v4_do_rcv()
processing skb=c000001e28ca5a00 on sock c000001d2bda0000.
-----------------------------
[172134.583029] tcp_v4_do_rcv: sk=c000001d2bda0000 skb=c000001e28ca5a00 dst=c000001e28caf000 sk->sk_rx_dst=c000001e28caf000
[172134.583075] dst_release: dst=c000001e28caf000
[172134.583154] CPU: 51 PID: 65452 Comm: hxecpu Tainted: G W 4.2.3 #2
[172134.583158] Call Trace:
[172134.583190] [c000001fffe075a0] [c000000000a9dcd4] dump_stack+0x90/0xbc (unreliable)
[172134.583227] [c000001fffe075d0] [c00000000093bf80] dst_release+0x110/0x120
[172134.583260] [c000001fffe07640] [c0000000009b4024] tcp_v4_do_rcv+0x4d4/0x4f0
[172134.583277] [c000001fffe076e0] [c0000000009b7834] tcp_v4_rcv+0xb74/0xb90
[172134.583377] [c000001fffe077c0] [c000000000984bb8] ip_local_deliver_finish+0x178/0x350
[172134.583447] [c000001fffe07810] [c0000000009853bc] ip_local_deliver+0x4c/0x120
[172134.583469] [c000001fffe07880] [c000000000984eb4] ip_rcv_finish+0x124/0x420
[172134.583517] [c000001fffe07900] [c0000000009857a4] ip_rcv+0x314/0x440
[172134.583551] [c000001fffe07990] [c00000000092b094] __netif_receive_skb_core+0xa14/0xd60
[172134.583646] [c000001fffe07a70] [c00000000092e924] netif_receive_skb_internal+0x34/0xd0
[172134.583715] [c000001fffe07ab0] [c00000000092fa5c] napi_gro_receive+0xec/0x1b0
[172134.583796] [c000001fffe07af0] [d0000000166d59f0] bnx2x_rx_int+0x1450/0x1700 [bnx2x]
[172134.583928] [c000001fffe07c90] [d0000000166d6580] bnx2x_poll+0x310/0x440 [bnx2x]
[172134.583974] [c000001fffe07d40] [c00000000092f0dc] net_rx_action+0x2dc/0x470
[172134.583990] [c000001fffe07e50] [c0000000000ba7d8] __do_softirq+0x188/0x3a0
[172134.583997] [c000001fffe07f40] [c0000000000bac68] irq_exit+0xc8/0x100
[172134.584003] [c000001fffe07f60] [c0000000000111bc] __do_irq+0x8c/0x190
[172134.584008] [c000001fffe07f90] [c000000000024290] call_do_irq+0x14/0x24
[172134.584011] [c000001e2d57fde0] [c000000000011358] do_IRQ+0x98/0x140
[172134.584017] [c000001e2d57fe30] [c000000000002594] hardware_interrupt_common+0x114/0x180=
-------------------------------------------------------------------------------
[172134.584065] tcp_v4_do_rcv: sk=c000001d2bda0000 skb=c000001e28ca5a00 dst=c000001e28caf000 sk->sk_rx_dst=c000001e28caf000
[172134.584081] dst_release: dst=c000001e28caf000
[172134.584100] CPU: 50 PID: 22055 Comm: hxecom Tainted: G W 4.2.3 #2
[172134.584101] Call Trace:
[172134.584124] [c000001e0befb920] [c000000000a9dcd4] dump_stack+0x90/0xbc (unreliable)
[172134.584139] [c000001e0befb950] [c00000000093bf80] dst_release+0x110/0x120
[172134.584147] [c000001e0befb9c0] [c0000000009b4024] tcp_v4_do_rcv+0x4d4/0x4f0
[172134.584160] [c000001e0befba60] [c00000000090c72c] release_sock+0xec/0x1e0
[172134.584174] [c000001e0befbab0] [c000000000998b98] tcp_recvmsg+0x3f8/0xce0
[172134.584183] [c000001e0befbbd0] [c0000000009d31dc] inet_recvmsg+0x9c/0x110
[172134.584188] [c000001e0befbc30] [c000000000906d24] sock_recvmsg+0x84/0xb0
[172134.584192] [c000001e0befbc70] [c0000000009088ac] SyS_recvfrom+0xdc/0x1a0
[172134.584198] [c000001e0befbdc0] [c000000000909a38] SyS_socketcall+0x2d8/0x430
[172134.584216] [c000001e0befbe30] [c000000000009204] system_call+0x38/0xb4
--------------------------------------------------------------------------------------------------------------------------------
The first thread runs tcp_v4_do_rcv() which calls
dst_release(sk->sk_rx_dst) and sets sk->sk_rx_dst=NULL. The last
reference on the dst has been dropped therefor dst_destroy() is
scheduled to run as a RCU call back. The following printk confirms
that dst_destroy() has been schedule to run:
[172134.583075] dst_release: dst=c000001e28caf000
tcp_v4_do_rcv() then calls tcp_rcv_established().
tcp_rcv_established() finding sk->sk_rx_dst==NULL sets it to the
dst_entry pointed to by skb->_skb_refdst and calls dst_hold() to
increment the entry's reference count.
My theory is that the same dst_entry that was release in
tcp_v4_do_rcv() has now been referenced again in
tcp_rcv_established(). This is bad as the rcu callback is already
scheduled to run on the dst_entry that has been referenced. The skb's
dst (skb->_skb_refdst) must have been set without taking a reference,
this can occur from skb_dst_set_noref() assumes rcu_read_lock_held()
or rcu_read_lock_bh_held(). The conditions for the double free is now
established.
The second thread following the same path calls dst_release()
scheduling a call back to destroy the dst. Sometime later when the
second callback runs the crash occurs.
I am testing my theory by adding the following check in
tcp_v4_do_rcv() to prevent the release of the dst when this condition
exists.
+if ( dst != skb_dst(skb)) {
dst_release(dst);
sk->sk_rx_dst = NULL;
+}
This has run for 24 hours with out failure. I will let it run a little
longer to confirm.
> Latest test patch from the community
>
> This test patch is based on discussions with the community (net-dev). I
> have high confidence that the changes to net_sk_rx_dst_set() will address
> the crashes we are seeing.
>
> We will make a test run with this patch over the weekend.
With the patch applied the test ran over the weekend with out a
failure..
> This problem has been resolved and tested with the following patch. The
> patch has been submitted to netdev@xxxxxxxxxxxxxxx
>
> http://lists.openwall.net/netdev/2015/12/14/177
Canonical, please pick up the following kernel patch:
http://lists.openwall.net/netdev/2015/12/14/177
The patch was accepted on net-dev:
http://lists.openwall.net/netdev/2015/12/15/9
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1526946/+subscriptions