kernel-packages team mailing list archive

[Bug 1317811] Re: Dropped packets on EC2, "xen_netfront: xennet: skb rides the rocket: x slots"

 

I can't comment on the driver implementation details, but I can give
some further details about our experience.

The app in question was a second-screen app for the Dutch public
broadcasting network for the Eurovision Song Contest. The app was live
for the two semi-finals on Tuesday the 6th and Thursday the 8th, as well
as the finals on Saturday the 10th. Load was lowest on Thursday, when
the Netherlands did not perform, and highest on Saturday during the
finals. We ran c3.large instances for all shows.

We first noticed the issue during the first run on Tuesday.

Shortly before the second run on Thursday we noticed the high MTU
setting as a possible cause, and changed it to 1500 on half of the
machines in our redundant setup. There was a clear difference in
connection stability between the two groups of machines.

For the third run on Saturday, we had all machines on the normal MTU of
1500, as we had adjusted our startup scripts to force the setting. We
had zero connection issues that night, and clean kernel logs, even
though this night saw the highest network load of all three.
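
For anyone wanting to apply the same workaround, here is a minimal
sketch of such a startup step. It is not the actual script used on these
machines: the interface name eth0 and the helper names are assumptions
made for illustration. It reads the current MTU from sysfs and, only if
it differs from 1500, calls the iproute2 "ip link set" command (which
needs root).

```python
import subprocess
from pathlib import Path

TARGET_MTU = 1500  # the value that gave clean kernel logs in the report


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the current MTU of iface as reported by sysfs."""
    return int(Path(sysfs_root, iface, "mtu").read_text().strip())


def mtu_command(iface, mtu=TARGET_MTU):
    """Build the iproute2 command that sets the MTU on iface."""
    return ["ip", "link", "set", "dev", iface, "mtu", str(mtu)]


def force_mtu(iface="eth0", mtu=TARGET_MTU):
    """Set the MTU only if it differs; return True if a change was made.

    "eth0" is an assumed interface name; adjust it for your instance.
    """
    if read_mtu(iface) == mtu:
        return False
    subprocess.run(mtu_command(iface, mtu), check=True)
    return True
```

Calling force_mtu() as root early in boot would be the equivalent of
forcing the setting from a startup script. The change does not persist
across reboots, so it has to run on every start.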

We have several m1.small instances running 24/7 as well, and these have
clean kernel logs, but their network load is quite low. The MTU on these
has always been untouched, and is a normal 1500, apparently by default.

In the instance type list, EC2 shows Compute Optimized instances as
having Enhanced Networking. Even though we don't qualify for it, perhaps
the networking setup is different for these instances.
https://aws.amazon.com/ec2/instance-types/

As for testing a custom kernel, we'd have to look into either deploying
it or reproducing the issue on a smaller test setup. I'd prefer the
latter, because we may be able to reproduce it between just two
instances with stress tools.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1317811

Title:
  Dropped packets on EC2, "xen_netfront: xennet: skb rides the rocket: x
  slots"

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  Running Ubuntu 14.04 LTS on EC2, we see a lot of the following in the
  kernel log:

      xen_netfront: xennet: skb rides the rocket: 19 slots

  Each of these messages corresponds to a dropped TX packet, and
  eventually causes our application's connections to break and time out.
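
To put a number on the drops, the kernel log can be scanned for this
message. The following is a rough sketch only: the function name is
hypothetical, and the pattern is taken from the log line quoted above.
It tallies matching lines per slot count.

```python
import re
from collections import Counter

# Pattern based on the message quoted in this bug report.
ROCKET_RE = re.compile(r"xennet: skb rides the rocket: (\d+) slots")


def count_rocket_drops(lines):
    """Return a Counter mapping slot count -> number of matching lines."""
    counts = Counter()
    for line in lines:
        match = ROCKET_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

It can be fed any iterable of log lines, for example an open kernel log
file or the captured output of dmesg. If the kernel rate-limits the
message, the total is only a lower bound on the number of dropped
packets.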

  The problem appears when network load increases. We have Node.js
  processes doing pubsub with a Redis server, and these are most visibly
  affected, showing frequent connection loss. The processes talk to each
  other using the private addresses EC2 allocates to the machines.

  Notably, the default MTU on the network interface seems to have gone
  up from 1500 on 13.10, to 9000 in 14.04 LTS. Reducing the MTU back to
  1500 seems to drastically reduce dropped packets. (Can't say for
  certain if it completely eliminates the problem.)

  The machines we run are started from ami-896c96fe.

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-image-3.13.0-24-generic 3.13.0-24.46
  ProcVersionSignature: User Name 3.13.0-24.46-generic 3.13.9
  Uname: Linux 3.13.0-24-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  9 09:01 seq
   crw-rw---- 1 root audio 116, 33 May  9 09:01 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory: 'iw'
  Date: Fri May  9 09:11:18 2014
  Ec2AMI: ami-896c96fe
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: eu-west-1c
  Ec2InstanceType: c3.large
  Ec2Kernel: aki-52a34525
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  PciMultimedia:
   
  ProcFB:
   
  ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  9 09:54 seq
   crw-rw---- 1 root audio 116, 33 May  9 09:54 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  CurrentDmesg: [   24.724129] init: plymouth-upstart-bridge main process ended, respawning
  DistroRelease: Ubuntu 14.04
  Ec2AMI: ami-896c96fe
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: eu-west-1c
  Ec2InstanceType: c3.large
  Ec2Kernel: aki-52a34525
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  Package: linux (not installed)
  PciMultimedia:
   
  ProcFB:
   
  ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
  ProcVersionSignature: User Name 3.13.0-24.46-generic 3.13.9
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty ec2-images
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm audio cdrom dialout dip floppy netdev plugdev sudo video
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811/+subscriptions

