kernel-packages team mailing list archive
Message #136443
[Bug 1497048] Re: Trusty - Null pointer dereference at queue_userspace_packet+0x1f/0x2d0 [openvswitch]
** Also affects: linux (Ubuntu Trusty)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu)
Status: In Progress => Invalid
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1497048
Title:
Trusty - Null pointer dereference at queue_userspace_packet+0x1f/0x2d0
[openvswitch]
Status in linux package in Ubuntu:
Invalid
Status in linux source package in Trusty:
New
Bug description:
[Impact]
* With certain complicated network configurations, such as those that
occur in OpenStack clouds, the kernel crashes with the stack trace below.
* We have observed kernel panics when an openvswitch bridge is
populated with virtual devices (veth, for example) that have expansive
feature sets that include NETIF_F_GSO_GRE.
The failure occurs when foreign GRE-encapsulated traffic
(explicitly not including the initial packets of a connection) arrives at
the system (likely via a switch flood event). The packets are accumulated
by GRO and passed to OVS receive processing. Because the connection
is not in the OVS kernel datapath table, the call path is:
ovs_dp_upcall ->
queue_gso_packets ->
__skb_gso_segment(skb, NETIF_F_SG, false)
Without commit 1e16aa3ddf863c6b9f37eddf52503230a62dedb3, __skb_gso_segment()
returns NULL, because the device's features (including NETIF_F_GSO_GRE) are
used in place of the NETIF_F_SG feature supplied to the call. The kernel
then panics when queue_userspace_packet() dereferences the NULL pointer.
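The feature-flag mix-up described above can be modelled in a few lines of userspace C. This is a simplified, hypothetical sketch, not kernel code: F_SG and F_GSO_GRE stand in for NETIF_F_SG and NETIF_F_GSO_GRE with made-up values, and struct pkt, split() and gso_segment() are invented for illustration. It assumes that, as in the kernel, a NULL return from segmentation means no segments were produced because the effective feature set already covers the packet's GSO type. The use_dev_features switch models the pre-fix behaviour in which the device's features replace the NETIF_F_SG passed by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag values; they only mimic the roles of the kernel flags. */
typedef uint32_t features_t;
#define F_SG      (1u << 0)   /* stands in for NETIF_F_SG */
#define F_GSO_GRE (1u << 1)   /* stands in for NETIF_F_GSO_GRE */

struct pkt {
    struct pkt *next;      /* next segment in the list */
    features_t gso_type;   /* offload the packet needs, e.g. F_GSO_GRE */
    unsigned int len;      /* total payload length */
    unsigned int mss;      /* segment size, must be non-zero */
};

/* Software segmentation: split p into mss-sized pieces (linked list). */
static struct pkt *split(const struct pkt *p)
{
    struct pkt *head = NULL, **tail = &head;
    if (p->mss == 0)
        abort();
    for (unsigned int off = 0; off < p->len; off += p->mss) {
        struct pkt *s = calloc(1, sizeof(*s));
        if (!s)
            abort();
        s->len = (p->len - off < p->mss) ? p->len - off : p->mss;
        s->mss = p->mss;
        *tail = s;
        tail = &s->next;
    }
    return head;
}

/*
 * Model of the segmentation step. Returns NULL when the effective features
 * claim to handle the packet's GSO type, i.e. nothing is segmented.
 * use_dev_features != 0 models the bug: device features win over the
 * features the caller asked for.
 */
static struct pkt *gso_segment(const struct pkt *p, features_t caller,
                               features_t dev, int use_dev_features)
{
    features_t effective = use_dev_features ? dev : caller;
    if (p->gso_type & effective)
        return NULL;
    return split(p);
}

static unsigned int count(const struct pkt *s)
{
    unsigned int n = 0;
    for (; s; s = s->next)
        n++;
    return n;
}

static void free_list(struct pkt *s)
{
    while (s) {
        struct pkt *next = s->next;
        free(s);
        s = next;
    }
}
```

With the caller's features honoured, a 4000-byte GRO packet with a 1400-byte segment size yields three segments; with veth-like device features (F_SG | F_GSO_GRE) substituted, the same call returns NULL, which is the value the upcall path then hands on.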
[Test Case]
* We have no easy reproduction procedure.
[Regression Potential]
* Both patches are taken from upstream, but have been neither accepted nor rejected as stable patches.
Stable threads:
http://marc.info/?l=linux-netdev&m=143631594021618&w=2
http://marc.info/?l=linux-netdev&m=143951671004053&w=2
* This patch has now been in place for 50 days, without a related
incident, in a large cloud where the issue used to occur frequently.
[Other Info]
* Commit 330966e501ffe282d7184fde4518d5e0c24bc7f8 is included as well, because it avoids possible NULL dereferences in similar areas of code. We would therefore like to see both patches included.
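A guard of the kind described here, one that stops a NULL segment list before it is dereferenced, can be sketched in userspace C. This is a hypothetical model, not the text of either commit: err_ptr(), ptr_err(), is_err() and is_err_or_null() re-create the kernel's ERR_PTR/PTR_ERR/IS_ERR/IS_ERR_OR_NULL idiom in simplified form, and queue_segments() and queue_one() are invented stand-ins for queue_gso_packets() and queue_userspace_packet(). The do/while loop assumes the caller iterates segments in that shape, so its body runs at least once; a caller that only rejects error pointers would let NULL reach the first dereference. Returning -EINVAL for the NULL case is a choice made for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace re-creation of the kernel's error-pointer idiom. */
#define MAX_ERRNO 4095
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
    /* error pointers occupy the top MAX_ERRNO addresses */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static inline int is_err_or_null(const void *p) { return !p || is_err(p); }

struct seg {
    struct seg *next;
    int id;
};

/* Hypothetical stand-in for queue_userspace_packet(): it reads the segment
 * it is given, so a NULL s would fault here. */
static int queue_one(const struct seg *s, int *id_sum)
{
    *id_sum += s->id;
    return 0;
}

/* Hypothetical stand-in for queue_gso_packets(): reject both error
 * pointers and NULL before entering the do/while loop. */
static int queue_segments(struct seg *segs, int *id_sum)
{
    if (is_err_or_null(segs))
        return segs ? (int)ptr_err(segs) : -EINVAL;

    struct seg *s = segs;
    do {
        int err = queue_one(s, id_sum);
        if (err)
            return err;
    } while ((s = s->next) != NULL);
    return 0;
}
```

For a three-segment list the loop visits every segment; for NULL or an error pointer it returns before any segment is touched.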
________________________________________________________________________
[415165.417433] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a3
[415165.417759] IP: [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
[415165.418073] PGD 0
[415165.418161] Oops: 0000 [#1] SMP
[415165.418299] Modules linked in: l2tp_eth l2tp_netlink l2tp_core vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch gre vxlan ip_tunnel dm_crypt gpio_ich dm_multipath bridge scsi_dh stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel joydev kvm shpchp sb_edac ipmi_si edac_core acpi_power_meter lpc_ich mac_hid xfs btrfs xor raid6_pq libcrc32c ses enclosure hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
[415165.421570] aesni_intel ixgbe igb aes_x86_64 lrw dca gf128mul glue_helper ptp ablk_helper usbhid cryptd megaraid_sas pps_core hid mdio i2c_algo_bit wmi
[415165.427942] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.13.0-53-generic #89-Ubuntu
[415165.440183] Hardware name: Cisco Systems Inc UCSC-C240-M3S/UCSC-C240-M3S, BIOS C240M3.2.0.1a.0.042820140036 04/28/2014
[415165.452693] task: ffff882012d01800 ti: ffff882012cfc000 task.ti: ffff882012cfc000
[415165.465847] RIP: 0010:[<ffffffffa015e24f>] [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
[415165.480003] RSP: 0018:ffff88203fce3b88 EFLAGS: 00010296
[415165.487411] RAX: 0000000000000000 RBX: ffff88203fce3ce8 RCX: ffff88203fce3ce8
[415165.502430] RDX: 0000000000000000 RSI: 000000000000000e RDI: ffffffff81cdab00
[415165.517448] RBP: ffff88203fce3bc8 R08: 0000000000000001 R09: 0000000000000000
[415165.532701] R10: 0000000000410000 R11: 000000000f9365e3 R12: ffff88203fce3ce8
[415165.548698] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000e
[415165.564653] FS: 0000000000000000(0000) GS:ffff88203fce0000(0000) knlGS:0000000000000000
[415165.580681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[415165.588725] CR2: 00000000000000a3 CR3: 0000000001c0e000 CR4: 00000000000427e0
[415165.604495] Stack:
[415165.612127] ffffffff81d1ca68 ffff881fbd6c6c00 0000000000000009 0000000000000000
[415165.627360] ffff88203fce3ce8 0000000000000000 000000000000000e 0000000000000000
[415165.642642] ffff88203fce3cb8 ffffffffa015e5a1 0000000000000010 ffffffff81cdab00
[415165.657955] Call Trace:
[415165.665405] <IRQ>
[415165.665500]
[415165.672684] [<ffffffffa015e5a1>] queue_gso_packets+0xa1/0x1f0 [openvswitch]
[415165.680015] [<ffffffffa015de7b>] ? ovs_execute_actions+0x2b/0x30 [openvswitch]
[415165.694425] [<ffffffffa01607f5>] ovs_dp_upcall+0xe5/0xf0 [openvswitch]
[415165.701807] [<ffffffffa016090f>] ovs_dp_process_received_packet+0x10f/0x120 [openvswitch]
[415165.716228] [<ffffffffa0166aca>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[415165.723591] [<ffffffffa0167391>] netdev_frame_hook+0xc1/0x120 [openvswitch]
[415165.730799] [<ffffffff81626892>] __netif_receive_skb_core+0x262/0x840
[415165.737909] [<ffffffff81626e88>] __netif_receive_skb+0x18/0x60
[415165.744824] [<ffffffff81627a1e>] process_backlog+0xae/0x1a0
[415165.751644] [<ffffffff81627272>] net_rx_action+0x152/0x250
[415165.758248] [<ffffffff8106cc6c>] __do_softirq+0xec/0x2c0
[415165.764694] [<ffffffff8106d1b5>] irq_exit+0x105/0x110
[415165.770968] [<ffffffff81735c26>] do_IRQ+0x56/0xc0
[415165.777058] [<ffffffff8172b32d>] common_interrupt+0x6d/0x6d
[415165.783041] <EOI>
[415165.783127]
[415165.788840] [<ffffffff815d523f>] ? cpuidle_enter_state+0x4f/0xc0
[415165.794659] [<ffffffff815d5369>] cpuidle_idle_call+0xb9/0x1f0
[415165.800468] [<ffffffff8101d34e>] arch_cpu_idle+0xe/0x30
[415165.806126] [<ffffffff810bf0a5>] cpu_startup_entry+0xc5/0x290
[415165.811862] [<ffffffff810414dd>] start_secondary+0x21d/0x2d0
[415165.817479] Code: 32 74 04 48 89 71 08 5b 5d c3 66 90 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 49 89 d6 41 55 41 54 53 48 89 cb 48 83 ec 18 <f6> 82 a3 00 00 00 10 48 89 7d c8 48 c7 45 d0 00 00 00 00 0f 85
[415165.834611] RIP [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
[415165.845643] RSP <ffff88203fce3b88>
[415165.851171] CR2: 00000000000000a3
_________________________________________________________________________________________
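The register dump ties the oops to a NULL pointer. In the Code: line, the byte in angle brackets (<f6>) marks the faulting instruction: the seven bytes f6 82 a3 00 00 00 10 encode testb $0x10,0xa3(%rdx). With RDX = 0 in the dump, the effective address is 0xa3, which matches CR2 and the "NULL pointer dereference at 00000000000000a3" message, i.e. a read of a field at offset 0xa3 from a NULL pointer. Under the x86-64 System V calling convention RDX carries a function's third argument; we assume, without having checked the 3.13 source here, that this is the skb pointer. The minimal decoder below is a hypothetical helper that handles only this one form (opcode F6 /0 with a register base and 32-bit displacement, no SIB or REX prefix).

```c
#include <stddef.h>
#include <stdint.h>

/* x86 register numbering used by the ModRM rm field. */
static const char *const reg64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

struct test_m8 {
    unsigned base;   /* index into reg64 */
    int32_t disp;    /* signed 32-bit displacement */
    uint8_t imm;     /* immediate bit mask */
};

/* Decode "F6 /0 ib" (TEST r/m8, imm8) with mod=10: [base + disp32]. */
static int decode_test_m8(const uint8_t *b, size_t n, struct test_m8 *out)
{
    if (n < 7 || b[0] != 0xf6)
        return -1;
    uint8_t modrm = b[1];
    unsigned mod = modrm >> 6, reg = (modrm >> 3) & 7, rm = modrm & 7;
    if (mod != 2 || reg != 0 || rm == 4)   /* rm == 4 would need a SIB byte */
        return -1;
    out->base = rm;
    /* displacement is stored little-endian */
    out->disp = (int32_t)((uint32_t)b[2] | (uint32_t)b[3] << 8 |
                          (uint32_t)b[4] << 16 | (uint32_t)b[5] << 24);
    out->imm = b[6];
    return 0;
}

/* Effective address = base register value + sign-extended displacement. */
static uint64_t effective_address(uint64_t base_value, const struct test_m8 *t)
{
    return base_value + (uint64_t)(int64_t)t->disp;
}
```

Feeding it the bytes from the Code: line gives base register rdx, displacement 0xa3 and mask 0x10, and effective_address(0, &t) reproduces the CR2 value.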
After analysis, we provided a 3.13 kernel patched with commits 1e16aa3ddf863c6b9f37eddf52503230a62dedb3 and
330966e501ffe282d7184fde4518d5e0c24bc7f8. As a result, the crash, which previously occurred fairly consistently, no longer occurs.
We attempted to push the patches through the stable process here:
http://marc.info/?l=linux-netdev&m=143631594021618&w=2
and again here:
http://marc.info/?l=linux-netdev&m=143951671004053&w=2
Unfortunately, upstream stable has not yet accepted them.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1497048/+subscriptions