kernel-packages team mailing list archive

[Bug 1391339] Re: Trusty kernel inbound network performance regression when GRO is enabled

 

It may look like a Xen issue, though the t1.micro instances were using
some 3.4 versions too. One had "kaos" in the version string; the other
looked more like the common variant. If Amazon does not play tricks and
apply different patches to the same Xen version on different instance
types (which I can only hope they don't), then the issue might be
caused by the networking setup in dom0. Different Xen versions likely
mean different kernels in dom0 (though they could just as well run
different dom0 kernels with the same Xen version). And of course we
have no clue whether they use standard bridging or openvswitch, and if
openvswitch, whether they use the in-kernel version or the upstream
one...
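
(For reference, a rough way to check which Xen version an instance is
running from inside the guest, assuming the standard Xen sysfs nodes
are exposed; this is a sketch, not something taken from the report:)

  cat /sys/hypervisor/version/major /sys/hypervisor/version/minor
  # "extra" is where suffixes such as "kaos" would show up
  cat /sys/hypervisor/version/extra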

So it might be a kernel issue, but not in the way you think. The fact
that guest kernels before 3.13 handled GRO=on better is just because
they did not actually use GRO (even when it was set on). So the guest
kernel did not regress; it just uncovered a problem that likely existed
before. From what we have gathered so far, my vague theory is that
something in the host kernel (counting openvswitch modules of any
origin as part of that) causes GRO turned on inside a guest to be
slower rather than faster (and also to show greater variance in the
average). Some people with better networking skills said this could
happen if the skb truesize goes wrong or a big skb ends up with many
small fragments. If we ignore the case where two instances end up on
the same host, we have a chain of the host receiving the packets
through its NIC, then routing/switching them onto the netback of the
guest, which then makes them available to netfront inside the guest.
And right now that area (host receiving and forwarding) looks
suspicious.
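
One rough way to check whether GRO is actually coalescing on the guest
side (a sketch only; interface name and port are placeholders) is to
look at the size of captured inbound segments while a download runs:

  # Is GRO enabled on the receive path?
  ethtool -k eth0 | grep generic-receive-offload
  # Inbound TCP segments reported with "length" well above the 1500-byte
  # MTU mean GRO is merging packets; lengths capped at the MTU suggest
  # it is effectively not in use.
  tcpdump -i eth0 -nn -c 100 'tcp and dst port 80'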

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1391339

Title:
  Trusty kernel inbound network performance regression when GRO is
  enabled

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  After upgrading our EC2 instances from Lucid to Trusty we noticed an
  increase in download times: Lucid instances were able to download
  twice as fast as Trusty ones. After some investigation and testing of
  older kernels (precise, raring and saucy) we confirmed that this only
  happens on the trusty kernel or newer (the utopic kernel shows the
  same result), and that disabling GRO with `ethtool -K eth0 gro off`
  fixes the problem, making download speed the same as on the Lucid
  instances again.

  The problem is easily reproducible by running Apache Bench a couple
  of times against files bigger than 100MB on a 1Gb network (EC2), over
  HTTP or HTTPS.

  The following is an example of download throughput with and without GRO:

  root@runtime-common.22 ~# ethtool -K eth0 gro off
  root@runtime-common.22 ~# for i in {1..10}; do ab -n 10 $URL | grep "Transfer rate"; done
  Transfer rate:          85183.40 [Kbytes/sec] received
  Transfer rate:          86375.80 [Kbytes/sec] received
  Transfer rate:          94720.24 [Kbytes/sec] received
  Transfer rate:          84783.82 [Kbytes/sec] received
  Transfer rate:          84933.09 [Kbytes/sec] received
  Transfer rate:          84714.04 [Kbytes/sec] received
  Transfer rate:          84795.58 [Kbytes/sec] received
  Transfer rate:          84636.54 [Kbytes/sec] received
  Transfer rate:          84924.26 [Kbytes/sec] received
  Transfer rate:          84994.10 [Kbytes/sec] received
  root@runtime-common.22 ~# ethtool -K eth0 gro on
  root@runtime-common.22 ~# for i in {1..10}; do ab -n 10 $URL | grep "Transfer rate"; done
  Transfer rate:          74193.53 [Kbytes/sec] received
  Transfer rate:          56808.91 [Kbytes/sec] received
  Transfer rate:          56011.58 [Kbytes/sec] received
  Transfer rate:          82227.74 [Kbytes/sec] received
  Transfer rate:          70806.54 [Kbytes/sec] received
  Transfer rate:          72848.10 [Kbytes/sec] received
  Transfer rate:          58451.94 [Kbytes/sec] received
  Transfer rate:          61221.33 [Kbytes/sec] received
  Transfer rate:          58620.21 [Kbytes/sec] received
  Transfer rate:          69950.03 [Kbytes/sec] received
  root@runtime-common.22 ~# 

  Similar results can be observed using iperf and netperf as well.
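
  A minimal iperf comparison might look like the following (the server
  address is a placeholder and iperf3 syntax is assumed; this is a
  sketch rather than the exact commands used):

  # On a separate server on the same network:
  iperf3 -s
  # On the Trusty guest, measure inbound throughput with GRO on, then
  # off (-R makes the server send, so the guest is the receiver):
  ethtool -K eth0 gro on
  iperf3 -c SERVER_IP -R -t 30
  ethtool -K eth0 gro off
  iperf3 -c SERVER_IP -R -t 30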

  Tested kernels: 
  Not affected: 3.8.0-44-generic (precise/raring), 3.11.0-26-generic (saucy)
  Affected: 3.13.0-39-generic (trusty), 3.16.0-24-generic (utopic)

  Let me know if I can provide any other information that might be
  helpful, such as perf traces and reports.
  Rodrigo.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1391339/+subscriptions

