kernel-packages team mailing list archive

[Bug 1497048] Re: Trusty - Null pointer dereference at queue_userspace_packet+0x1f/0x2d0 [openvswitch]

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Mathew Hodson <mathew.hodson@xxxxxxxxx>
Date: Thu, 15 Oct 2015 09:32:51 -0000
Reply-to: Bug 1497048 <1497048@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Changed in: linux (Ubuntu Trusty)
    Milestone: None => trusty-updates

** Changed in: linux (Ubuntu)
    Milestone: trusty-updates => None

** Changed in: linux (Ubuntu Trusty)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1497048

Title:
  Trusty - Null pointer dereference at queue_userspace_packet+0x1f/0x2d0
  [openvswitch]

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Trusty:
  Fix Committed

Bug description:
  [Impact]

   * With certain complicated network configurations, such as occur in
  OpenStack clouds, the kernel crashes with the stack trace below.

   * We have observed kernel panics when an openvswitch bridge is
  populated with virtual devices (veth, for example) that have expansive
  feature sets that include NETIF_F_GSO_GRE.

  The failure occurs when foreign GRE encapsulated traffic
  (explicitly not including the initial packets of a connection) arrives at
  the system (likely via a switch flood event).  The packets are GRO
  accumulated, and passed to the OVS receive processing.  As the connection
  is not in the OVS kernel datapath table, the call path is:

  ovs_dp_upcall ->
  	queue_gso_packets ->
  		__skb_gso_segment(skb, NETIF_F_SG, false)

  Without 1e16aa3ddf863c6b9f37eddf52503230a62dedb3, __skb_gso_segment
  returns NULL, as the features from the device (including _GSO_GRE) are
  used in place of the _SG feature supplied to the call.  The kernel
  panics on a subsequent dereference of the NULL pointer in
  queue_userspace_packet().
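
  The failure mode above can be modelled in user space. The following is
  a minimal sketch, not the kernel code: fake_segment, queue_one and
  queue_segments are hypothetical stand-ins for __skb_gso_segment,
  queue_userspace_packet and queue_gso_packets, and the error-pointer
  helpers are simplified re-implementations of the kernel idiom. It shows
  a caller that treats both an error pointer and a plain NULL as a failure
  before dereferencing the result. Mapping NULL to -EINVAL is an
  assumption made for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space versions of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    /* The top MAX_ERRNO addresses are reserved for encoded errno values. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for one segment of a segmented packet. */
struct seg {
    struct seg *next;
    unsigned char flags;
};

enum seg_mode { SEG_OK, SEG_NULL, SEG_ERR };

/* Hypothetical stand-in for __skb_gso_segment(): returns a list,
 * NULL, or an encoded error pointer depending on mode. */
static struct seg *fake_segment(struct seg *storage, enum seg_mode mode)
{
    switch (mode) {
    case SEG_NULL:
        return NULL;
    case SEG_ERR:
        return err_ptr(-EINVAL);
    default:
        storage->next = NULL;
        storage->flags = 0;
        return storage;
    }
}

/* Hypothetical stand-in for queue_userspace_packet(): it reads a field
 * of the segment, so a NULL argument would fault here. */
static int queue_one(const struct seg *s)
{
    return (s->flags & 0x10) ? -EINVAL : 0;
}

/* Returns the number of segments queued, or a negative errno value. */
static int queue_segments(enum seg_mode mode)
{
    struct seg storage;
    struct seg *segs = fake_segment(&storage, mode);
    int queued = 0;

    if (is_err(segs))
        return (int)ptr_err(segs);
    if (segs == NULL)
        return -EINVAL; /* assumption: "no segments" is an error */

    for (struct seg *s = segs; s != NULL; s = s->next) {
        int err = queue_one(s);
        if (err)
            return err;
        queued++;
    }
    return queued;
}
```

  Calling queue_segments(SEG_NULL) exercises the path that crashed in the
  trace below; with the check in place it returns an error instead of
  dereferencing NULL.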

  
  [Test Case]

   * We have no easy reproduction procedure.

  [Regression Potential]

   * Both patches are pulled from upstream, but have been neither accepted nor rejected as stable patches.
  Stable threads:
  http://marc.info/?l=linux-netdev&m=143631594021618&w=2
  http://marc.info/?l=linux-netdev&m=143951671004053&w=2

   * This patch has been in place for 50 days, without related incident,
  in a large cloud where the issue used to occur frequently.

  [Other Info]
   
   * 330966e501ffe282d7184fde4518d5e0c24bc7f8 is included as well, as it obviously avoids possible NULL dereferences in similar areas of code.  As such we'd like to see both patches included.
  ________________________________________________________________________
  [415165.417433] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a3
  [415165.417759] IP: [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
  [415165.418073] PGD 0
  [415165.418161] Oops: 0000 [#1] SMP
  [415165.418299] Modules linked in: l2tp_eth l2tp_netlink l2tp_core vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch gre vxlan ip_tunnel dm_crypt gpio_ich dm_multipath bridge scsi_dh stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel joydev kvm shpchp sb_edac ipmi_si edac_core acpi_power_meter lpc_ich mac_hid xfs btrfs xor raid6_pq libcrc32c ses enclosure hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
  [415165.421570]  aesni_intel ixgbe igb aes_x86_64 lrw dca gf128mul glue_helper ptp ablk_helper usbhid cryptd megaraid_sas pps_core hid mdio i2c_algo_bit wmi
  [415165.427942] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.13.0-53-generic #89-Ubuntu
  [415165.440183] Hardware name: Cisco Systems Inc UCSC-C240-M3S/UCSC-C240-M3S, BIOS C240M3.2.0.1a.0.042820140036 04/28/2014
  [415165.452693] task: ffff882012d01800 ti: ffff882012cfc000 task.ti: ffff882012cfc000
  [415165.465847] RIP: 0010:[<ffffffffa015e24f>]  [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
  [415165.480003] RSP: 0018:ffff88203fce3b88  EFLAGS: 00010296
  [415165.487411] RAX: 0000000000000000 RBX: ffff88203fce3ce8 RCX: ffff88203fce3ce8
  [415165.502430] RDX: 0000000000000000 RSI: 000000000000000e RDI: ffffffff81cdab00
  [415165.517448] RBP: ffff88203fce3bc8 R08: 0000000000000001 R09: 0000000000000000
  [415165.532701] R10: 0000000000410000 R11: 000000000f9365e3 R12: ffff88203fce3ce8
  [415165.548698] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000e
  [415165.564653] FS:  0000000000000000(0000) GS:ffff88203fce0000(0000) knlGS:0000000000000000
  [415165.580681] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [415165.588725] CR2: 00000000000000a3 CR3: 0000000001c0e000 CR4: 00000000000427e0
  [415165.604495] Stack:
  [415165.612127]  ffffffff81d1ca68 ffff881fbd6c6c00 0000000000000009 0000000000000000
  [415165.627360]  ffff88203fce3ce8 0000000000000000 000000000000000e 0000000000000000
  [415165.642642]  ffff88203fce3cb8 ffffffffa015e5a1 0000000000000010 ffffffff81cdab00
  [415165.657955] Call Trace:
  [415165.665405]  <IRQ>
  [415165.665500]
  [415165.672684]  [<ffffffffa015e5a1>] queue_gso_packets+0xa1/0x1f0 [openvswitch]
  [415165.680015]  [<ffffffffa015de7b>] ? ovs_execute_actions+0x2b/0x30 [openvswitch]
  [415165.694425]  [<ffffffffa01607f5>] ovs_dp_upcall+0xe5/0xf0 [openvswitch]
  [415165.701807]  [<ffffffffa016090f>] ovs_dp_process_received_packet+0x10f/0x120 [openvswitch]
  [415165.716228]  [<ffffffffa0166aca>] ovs_vport_receive+0x2a/0x30 [openvswitch]
  [415165.723591]  [<ffffffffa0167391>] netdev_frame_hook+0xc1/0x120 [openvswitch]
  [415165.730799]  [<ffffffff81626892>] __netif_receive_skb_core+0x262/0x840
  [415165.737909]  [<ffffffff81626e88>] __netif_receive_skb+0x18/0x60
  [415165.744824]  [<ffffffff81627a1e>] process_backlog+0xae/0x1a0
  [415165.751644]  [<ffffffff81627272>] net_rx_action+0x152/0x250
  [415165.758248]  [<ffffffff8106cc6c>] __do_softirq+0xec/0x2c0
  [415165.764694]  [<ffffffff8106d1b5>] irq_exit+0x105/0x110
  [415165.770968]  [<ffffffff81735c26>] do_IRQ+0x56/0xc0
  [415165.777058]  [<ffffffff8172b32d>] common_interrupt+0x6d/0x6d
  [415165.783041]  <EOI>
  [415165.783127]
  [415165.788840]  [<ffffffff815d523f>] ? cpuidle_enter_state+0x4f/0xc0
  [415165.794659]  [<ffffffff815d5369>] cpuidle_idle_call+0xb9/0x1f0
  [415165.800468]  [<ffffffff8101d34e>] arch_cpu_idle+0xe/0x30
  [415165.806126]  [<ffffffff810bf0a5>] cpu_startup_entry+0xc5/0x290
  [415165.811862]  [<ffffffff810414dd>] start_secondary+0x21d/0x2d0
  [415165.817479] Code: 32 74 04 48 89 71 08 5b 5d c3 66 90 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 49 89 d6 41 55 41 54 53 48 89 cb 48 83 ec 18 <f6> 82 a3 00 00 00 10 48 89 7d c8 48 c7 45 d0 00 00 00 00 0f 85
  [415165.834611] RIP  [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch]
  [415165.845643]  RSP <ffff88203fce3b88>
  [415165.851171] CR2: 00000000000000a3
  _________________________________________________________________________________________
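
  The faulting address in the trace is consistent with a small field
  offset applied to a NULL base: CR2 is 0xa3, RDX is 0, and the bytes at
  the fault marker in the Code line (f6 82 a3 00 00 00 10) decode as a
  byte test against 0x10 at offset 0xa3 from RDX. A minimal sketch of
  that arithmetic follows. The structure fake_pkt is hypothetical: its
  layout is invented for illustration and is not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen so that flags sits at offset 0xa3. */
struct fake_pkt {
    unsigned char pad[0xa3];
    unsigned char flags;
};

/* Address the CPU would touch when reading ->flags through base. */
static uintptr_t fault_address(uintptr_t base)
{
    return base + offsetof(struct fake_pkt, flags);
}
```

  With a base of 0, fault_address returns 0xa3, which matches the CR2
  value reported above.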

  After analysis we provided a 3.13 kernel patched with commit 1e16aa3ddf863c6b9f37eddf52503230a62dedb3 and
  330966e501ffe282d7184fde4518d5e0c24bc7f8.  As a result the fairly consistent crash is no longer occurring.

  We attempted to push the patch through the stable process here
  http://marc.info/?l=linux-netdev&m=143631594021618&w=2
  and again
  http://marc.info/?l=linux-netdev&m=143951671004053&w=2
  Unfortunately, upstream stable has yet to accept these patches.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1497048/+subscriptions
References

[Bug 1497048] [NEW] Trusty - Null pointer dereference at queue_userspace_packet+0x1f/0x2d0 [openvswitch]
From: Dave Chiluk, 2015-09-17