
kernel-packages team mailing list archive

[Bug 1317811] Re: Dropped packets on EC2, "xen_netfront: xennet: skb rides the rocket: x slots"

 

I believe my test case is flawed, so I cannot say with certainty whether
the issue is fixed. This is the same test case I used before, for which
I posted code in a gist:
https://gist.github.com/stephank/764e3414d57bc3bcb6b3

Here's what I tried:

 - I started two new c3.large machines from ami-69e76c1e (eu-west-1 HVM
64-bit trusty with instance store)

 - I downloaded io.js 1.2.0 on machine A, together with the pub.js and
sub.js scripts from my gist.

 - I installed redis-server on machine B and reconfigured Redis to bind
to its internal IP (in 10.x.x.x).

 - The machines were initially running linux-virtual 3.13.0.45.52. I
reproduced the issue in this setup by running sub.js twice and then
pub.js once on machine A, connecting them to Redis on machine B. The
'rides the rocket' message showed up in the logs, and the subs lost their
connections.

 - I enabled trusty-proposed on both machines with a pin, selectively
upgraded linux-virtual on both, and rebooted them. The kernel on both
machines is now linux-virtual 3.13.0.46.53.

 - I ran the same test again: sub.js twice and pub.js once on machine A,
connecting to machine B. There were no 'rides the rocket' messages, but
the subs still lost their connections. I sporadically got
'net_ratelimit: x callbacks suppressed', though not on every test run.

 - I disabled scatter/gather on both machines, which also dropped their
MTU to 1500, and ran the test again several times. There were no more
'net_ratelimit' messages, but the subs still lost their connections.

 - I installed redis-server on machine A the same way, listening on the
internal IP, and ran the same test on machine A, but this time
connecting to itself on the internal IP. The test then ran indefinitely
without losing connections. (But this traffic probably doesn't touch the
driver.)
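To compare runs across kernels, it can help to count the two kernel
messages mentioned above instead of reading the log by eye. The following
is a minimal Python sketch, not part of the gist; it assumes the kernel
log is available as text lines (for example from dmesg or
/var/log/kern.log), and the function name summarize_kernel_log is
hypothetical.

```python
import re
from collections import Counter

# Match only the distinctive message text; prefixes such as timestamps or
# "xen_netfront: " differ between log sources.
ROCKET_RE = re.compile(r"skb rides the rocket: (\d+) slots")
RATELIMIT_RE = re.compile(r"net_ratelimit: (\d+) callbacks suppressed")


def summarize_kernel_log(lines):
    """Tally 'rides the rocket' drops (by slot count) and suppressed callbacks.

    Note: 'callbacks suppressed' covers any rate-limited networking message,
    not only the xen_netfront one.
    """
    rocket_by_slots = Counter()
    suppressed = 0
    for line in lines:
        m = ROCKET_RE.search(line)
        if m:
            rocket_by_slots[int(m.group(1))] += 1
            continue
        m = RATELIMIT_RE.search(line)
        if m:
            suppressed += int(m.group(1))
    return {
        "rocket_messages": sum(rocket_by_slots.values()),
        "rocket_by_slots": dict(rocket_by_slots),
        "suppressed_callbacks": suppressed,
    }
```

For example, summarize_kernel_log(open("/var/log/kern.log")). Because the
kernel rate-limits these networking messages, the 'rides the rocket'
count is only a lower bound, and a nonzero suppressed-callback total does
not say which message was hidden.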

So I'm not sure what to take away from this. I could continue by fixing
my test case so that it runs properly with scatter/gather disabled, and
then enable it again. Or I could find a way to trigger the issue with a
different test, such as redis-benchmark.

Stefan, would it be sufficient verification if your own testing now
shows it fixed?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1317811

Title:
  Dropped packets on EC2, "xen_netfront: xennet: skb rides the rocket: x
  slots"

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Committed
Status in linux source package in Utopic:
  Fix Committed

Bug description:
  Running Ubuntu 14.04 LTS on EC2, we see a lot of the following in the
  kernel log:

      xen_netfront: xennet: skb rides the rocket: 19 slots

  Each of these messages corresponds to a dropped TX packet, and the
  drops eventually cause our application's connections to break and time
  out.
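  The report does not say where the slot count in the message comes
  from. As we understand the 3.13-era xen-netfront transmit path (an
  assumption, not stated in the report), the driver needs one ring slot
  per page touched by the packet's linear data and by each fragment, and
  drops packets needing more than MAX_SKB_FRAGS + 1 slots; a fragment
  that straddles a page boundary costs an extra slot. The Python below is
  a simplified model under those assumptions (4 KiB pages,
  MAX_SKB_FRAGS = 17), not the driver's code, and all names are
  hypothetical.

```python
# Simplified model of the xen-netfront TX slot check (assumed, not from the
# report): 4 KiB pages and MAX_SKB_FRAGS = 65536 // PAGE_SIZE + 1 = 17, so
# a packet may use at most MAX_SKB_FRAGS + 1 = 18 slots before it is dropped.
PAGE_SIZE = 4096
MAX_SKB_FRAGS = 65536 // PAGE_SIZE + 1
MAX_TX_SLOTS = MAX_SKB_FRAGS + 1


def pages_spanned(offset, length):
    """Pages touched by `length` bytes starting at byte `offset`."""
    start = offset % PAGE_SIZE  # only the offset within the first page matters
    return (start + length + PAGE_SIZE - 1) // PAGE_SIZE


def tx_slots(linear, frags):
    """Slots for a packet: linear (offset, length) area plus each fragment."""
    return pages_spanned(*linear) + sum(pages_spanned(o, n) for o, n in frags)


def would_drop(linear, frags):
    """True if the modelled packet exceeds the slot limit."""
    return tx_slots(linear, frags) > MAX_TX_SLOTS
```

  With a small header and 17 page-aligned 4 KiB fragments, the packet
  fits in exactly 18 slots; shifting one fragment by half a page makes it
  span two pages and pushes the total to 19, the count seen in the log
  line above.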

  The problem appears when network load increases. We have Node.js
  processes doing pubsub with a Redis server, and these are most visibly
  affected, showing frequent connection loss. The processes talk to each
  other using the private addresses EC2 allocates to the machines.

  Notably, the default MTU on the network interface seems to have gone
  up from 1500 on 13.10 to 9000 on 14.04 LTS. Reducing the MTU back to
  1500 seems to drastically reduce dropped packets. (We can't say for
  certain whether it completely eliminates the problem.)
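  To check which MTU an instance actually came up with, the value can be
  read from sysfs (/sys/class/net/<iface>/mtu). A minimal Python sketch;
  the interface name (eth0 in the example) is an assumption, since the
  report does not name it, and the helper names are hypothetical.
  Lowering the MTU as described would typically be done as root with
  `ip link set dev eth0 mtu 1500`.

```python
from pathlib import Path

# Per-interface attributes live under this sysfs directory on Linux.
SYSFS_NET = Path("/sys/class/net")


def read_mtu(iface, sysfs_net=SYSFS_NET):
    """Return the current MTU of `iface` as reported by sysfs."""
    return int((Path(sysfs_net) / iface / "mtu").read_text().strip())


def is_jumbo(iface, sysfs_net=SYSFS_NET, standard_mtu=1500):
    """True if the interface runs with an MTU above standard Ethernet (1500)."""
    return read_mtu(iface, sysfs_net) > standard_mtu
```

  The sysfs_net parameter only exists so the helpers can be pointed at a
  different directory, for example in tests.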

  The machines we run are started from ami-896c96fe.

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-image-3.13.0-24-generic 3.13.0-24.46
  ProcVersionSignature: User Name 3.13.0-24.46-generic 3.13.9
  Uname: Linux 3.13.0-24-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  9 09:01 seq
   crw-rw---- 1 root audio 116, 33 May  9 09:01 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory: 'iw'
  Date: Fri May  9 09:11:18 2014
  Ec2AMI: ami-896c96fe
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: eu-west-1c
  Ec2InstanceType: c3.large
  Ec2Kernel: aki-52a34525
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lspci:

  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  PciMultimedia:

  ProcFB:

  ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  9 09:54 seq
   crw-rw---- 1 root audio 116, 33 May  9 09:54 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  CurrentDmesg: [   24.724129] init: plymouth-upstart-bridge main process ended, respawning
  DistroRelease: Ubuntu 14.04
  Ec2AMI: ami-896c96fe
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: eu-west-1c
  Ec2InstanceType: c3.large
  Ec2Kernel: aki-52a34525
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  Lspci:

  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  Package: linux (not installed)
  PciMultimedia:

  ProcFB:

  ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
  ProcVersionSignature: User Name 3.13.0-24.46-generic 3.13.9
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty ec2-images
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm audio cdrom dialout dip floppy netdev plugdev sudo video
  _MarkForUpload: True

  break-fix: - 97a6d1bb2b658ac85ed88205ccd1ab809899884d
  break-fix: - 11d3d2a16cc1f05c6ece69a4392e99efb85666a6

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811/+subscriptions

