kernel-packages team mailing list archive

[Bug 1344323] [NEW] Trusty kernel network performance regression

 

Public bug reported:

SRU Justification:

Impact:

Reduced TCP/IP receive performance for network devices that do not split
packet headers into the skb linear area (e.g., mlx4).  The trusty kernel
has incorporated

commit eff44f9cc9a02aad53d568d3ae5020b6792ae4f6
Author: Jerry Chu <hkchu@xxxxxxxxxx>
Date:   Wed Dec 11 20:53:45 2013 -0800

    net-gro: Prepare GRO stack for the upcoming tunneling support

which modifies the GRO frag0 optimization but, in some cases, results in
a call to __pskb_pull_tail for every packet received via the GRO path.
This reduces TCP receive performance (or, more accurately, increases the
CPU load of TCP receive processing, which reduces throughput for
CPU-limited workloads).
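
To illustrate why losing the frag0 fast path costs CPU, here is a toy
user-space model in Python.  It is not kernel code: ToySkb, pull_tail,
header and count_pulls are names invented for this sketch, and the
packet layout (20-byte IPv4 header, 20-byte TCP header, 1448-byte
payload, all in the first fragment) is an assumption.  It only shows
that a device which leaves the linear area empty forces a copy for each
header read unless headers can be read in place from frag0.

```python
from dataclasses import dataclass, field


@dataclass
class ToySkb:
    """Toy stand-in for an skb: a linear area plus one page fragment (frag0)."""
    frag0: bytes
    linear: bytearray = field(default_factory=bytearray)
    pulls: int = 0  # number of modelled pull-tail copies

    def pull_tail(self, nbytes: int) -> None:
        """Slow-path model: copy nbytes from frag0 into the linear area."""
        self.linear += self.frag0[:nbytes]
        self.frag0 = self.frag0[nbytes:]
        self.pulls += 1

    def header(self, offset: int, length: int, use_frag0: bool) -> bytes:
        """Return the header bytes at [offset, offset + length)."""
        end = offset + length
        # Fast path: the linear area is still empty and frag0 covers the
        # header, so read it in place without copying.
        if use_frag0 and not self.linear and len(self.frag0) >= end:
            return self.frag0[offset:end]
        # Slow path: make sure the linear area holds the header first.
        missing = end - len(self.linear)
        if missing > 0:
            self.pull_tail(missing)
        return bytes(self.linear[offset:end])


def count_pulls(npackets: int, use_frag0: bool) -> int:
    """Count modelled pull-tail copies needed to parse npackets packets."""
    total = 0
    for _ in range(npackets):
        # Assumed layout: IPv4 header + TCP header + payload, all in frag0.
        skb = ToySkb(frag0=bytes(20) + bytes(20) + b"x" * 1448)
        skb.header(0, 20, use_frag0)   # IPv4 header
        skb.header(20, 20, use_frag0)  # TCP header
        total += skb.pulls
    return total


if __name__ == "__main__":
    print("pulls with frag0 fast path:   ", count_pulls(1000, True))
    print("pulls without frag0 fast path:", count_pulls(1000, False))
```

In this model the fast path needs no copies at all, while the slow path
copies once per protocol layer for every packet, which is the kind of
per-packet overhead the perf captures attribute to __pskb_pull_tail.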

Fix:

This has already been fixed in mainline by

commit a50e233c50dbc881abaa0e4070789064e8d12d70
Author: Eric Dumazet <edumazet@xxxxxxxxxx>
Date:   Sat Mar 29 21:28:21 2014 -0700

    net-gro: restore frag0 optimization

The fix has been backported to and verified on the trusty kernel using
mlx4 devices and iperf; throughput increased from 7.5 to 8.5 Gb/sec with
the patch applied, and the relevant portion of the perf captures shows
the call paths changing from:

     7.17%            iperf  [kernel.kallsyms]   [k] __pskb_pull_tail                       
                      |
                      --- __pskb_pull_tail
                         |          
                         |--48.03%-- tcp_gro_receive
                         |          tcp4_gro_receive
                         |          inet_gro_receive
                         |          dev_gro_receive
                         |          napi_gro_frags
                         |          mlx4_en_process_rx_cq
                         |          mlx4_en_poll_rx_cq
                         |          net_rx_action
                         |          __do_softirq
[...]
                         |--28.53%-- napi_gro_frags
                         |          mlx4_en_process_rx_cq
                         |          mlx4_en_poll_rx_cq
                         |          net_rx_action
                         |          __do_softirq
[...]
                         |--13.11%-- inet_gro_receive
                         |          dev_gro_receive
                         |          napi_gro_frags
                         |          mlx4_en_process_rx_cq
                         |          mlx4_en_poll_rx_cq
                         |          net_rx_action
                         |          __do_softirq

to:

     4.87%          iperf  [kernel.kallsyms]   [k] skb_gro_receive                        
                    |
                    --- skb_gro_receive
                       |          
                       |--98.13%-- tcp_gro_receive
                       |          tcp4_gro_receive
                       |          inet_gro_receive
                       |          dev_gro_receive
                       |          napi_gro_frags
                       |          mlx4_en_process_rx_cq
                       |          mlx4_en_poll_rx_cq
                       |          net_rx_action
                       |          __do_softirq

Testcase:

The fix was tested using mlx4 10Gb/sec network devices between two arm64
systems, running "iperf -s" on one end and "iperf -c" on the other.  The
unmodified kernel reported approximately 7.5 Gb/sec throughput; the
fixed kernel reported approximately 8.5 Gb/sec.
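
For scale, the reported figures can be turned into a relative gain and
a link utilisation.  This is only arithmetic on the two approximate
numbers above; relative_gain is a helper name invented for this sketch,
and the 10 Gb/sec line rate is taken from the device description.

```python
def relative_gain(before_gbps: float, after_gbps: float) -> float:
    """Fractional throughput improvement of after over before."""
    return (after_gbps - before_gbps) / before_gbps


if __name__ == "__main__":
    before, after, line_rate = 7.5, 8.5, 10.0  # Gb/sec, from the report
    print(f"throughput gain: {relative_gain(before, after):.1%}")  # 13.3%
    print(f"link utilisation: {before / line_rate:.0%} -> {after / line_rate:.0%}")
```

That is roughly a 13% improvement, moving the link from about 75% to
about 85% of line rate.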

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1344323

Title:
  Trusty kernel network performance regression

Status in “linux” package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1344323/+subscriptions

