kernel-packages team mailing list archive

[Bug 1510213] [NEW] Regression: Stable kernel update to 3.13.0-66 breaks UDP sockets

 

Public bug reported:

I am running the 3.13 series kernel on Ubuntu 14.04 LTS (Trusty Tahr).

A change introduced in version 3.13.0-66.108 of this kernel breaks UDP
sockets under certain circumstances: recvfrom fails with errno set to
EFAULT even though the pointers passed to it are valid.

Using bisection, I tracked this problem down to a single commit:

2dde51aa53393a531b493e3a8194e4d467e194a3 is the first bad commit
commit 2dde51aa53393a531b493e3a8194e4d467e194a3
Author: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Date:   Mon Jul 13 20:01:42 2015 +0800

    net: Fix skb csum races when peeking
    
    BugLink: http://bugs.launchpad.net/bugs/1500810
    
    [ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ]
    
    When we calculate the checksum on the recv path, we store the
    result in the skb as an optimisation in case we need the checksum
    again down the line.
    
    This is in fact bogus for the MSG_PEEK case as this is done without
    any locking.  So multiple threads can peek and then store the result
    to the same skb, potentially resulting in bogus skb states.
    
    This patch fixes this by only storing the result if the skb is not
    shared.  This preserves the optimisations for the few cases where
    it can be done safely due to locking or other reasons, e.g., SIOCINQ.
    
    Signed-off-by: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
    Acked-by: Eric Dumazet <edumazet@xxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
    Signed-off-by: Kamal Mostafa <kamal@xxxxxxxxxxxxx>
    Signed-off-by: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>

:040000 040000 423debc59ddbc7424283e647e609289fd40dc494
2511e80df4c30a7309737f6b3cee0260269a0ef7 M      net

Steps to reproduce the problem: Install freeradius and have a RADIUS
client connect to the server. After a short time, freeradius spins at
100% CPU, alternating between select and recvfrom calls. Every recvfrom
call fails with EFAULT.
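
The spinning described above is what a receive loop does when it retries
after every failed recvfrom. As a minimal sketch (this is not freeradius
code; recv_error_is_transient is a hypothetical helper introduced only for
illustration), a loop could separate errors worth retrying from errors
such as EFAULT that will not go away on their own:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper, not taken from freeradius or from this report:
 * returns true if a failed recvfrom() can reasonably be retried. */
bool recv_error_is_transient(int err) {
  switch (err) {
    case EAGAIN:        /* no data available right now */
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:   /* same meaning as EAGAIN on platforms where they differ */
#endif
    case EINTR:         /* interrupted by a signal */
      return true;
    default:            /* e.g. EFAULT: retrying unchanged will fail again */
      return false;
  }
}
```

A caller would check recv_error_is_transient(errno) right after recvfrom
returns -1 and log or back off instead of looping when it returns false.
This only limits the CPU spin; it does not work around the kernel problem.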

As an alternative to freeradius, you can use the following minimal
program, which also exhibits this problem:

#include <stdio.h>

#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

/* Create a UDP socket with IP_PKTINFO enabled and bind it to the given port.
 * Returns the socket descriptor, or -1 on failure. */
int prepare_socket(int port) {
  int sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0) {
    printf("Could not create socket.\n");
    return -1;
  }
  int opt = 1;
  if (setsockopt(sock, SOL_IP, IP_PKTINFO, &opt, sizeof(opt)) < 0) {
    printf("setsockopt failed.\n");
    close(sock);
    return -1;
  }
  struct sockaddr_in bind_addr;
  memset(&bind_addr, 0, sizeof(bind_addr));
  bind_addr.sin_family = AF_INET;
  bind_addr.sin_port = htons(port);
  bind_addr.sin_addr.s_addr = htonl(INADDR_ANY);
  int rc = bind(sock, (struct sockaddr *) &bind_addr, sizeof(bind_addr));
  if (rc < 0) {
    printf("Could not bind socket.\n");
    close(sock);
    return -1;
  }
  return sock;
}

int main(int argc, char **argv) {
  /* 1812 is the RADIUS authentication port. */
  int sock = prepare_socket(1812);
  if (sock < 0) {
    return 1;
  }
  for (;;) {
    unsigned char buffer[4];
    struct sockaddr src;
    socklen_t src_len = sizeof(src);
    /* Peek at the first four bytes; the datagram stays in the queue. */
    ssize_t received_len = recvfrom(sock, buffer, sizeof(buffer), MSG_PEEK, &src, &src_len);
    if (received_len < 0) {
      if (errno == EAGAIN) {
        printf("EAGAIN\n");
        continue;
      }
      /* On the affected kernel this is where EFAULT is reported. */
      printf("recvfrom failed.\n");
      perror(NULL);
      return 1;
    }
    if (received_len == 4) {
      /* Read again without MSG_PEEK to remove the datagram from the queue. */
      src_len = sizeof(src);
      received_len = recvfrom(sock, buffer, sizeof(buffer), 0, &src, &src_len);
      if (received_len != 4) {
        printf("Strange received length.\n");
        return 1;
      }
    }
  }
  /* Never reached */
  return 0;
}

I did not find out how to craft traffic that triggers the bug myself.
However, the traffic from a RADIUS client (a WiFi AP in my case)
reliably triggers it after a few seconds.
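
For anyone who wants to experiment with synthetic traffic, here is a
minimal sketch of a loopback harness. It sends one datagram to a UDP
socket and peeks at it with a 4-byte buffer, as the program above does.
The function name peek_roundtrip, the 512-byte payload limit and the
one-second timeout are assumptions made for illustration. Nothing in this
report establishes that loopback traffic triggers the EFAULT, so treat
this as a starting point only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <netinet/in.h>

/* Hypothetical helper: send payload_len bytes to a loopback UDP socket,
 * then peek with a 4-byte buffer. Returns the recvfrom() result, or
 * -errno if any step fails. */
long peek_roundtrip(size_t payload_len) {
  unsigned char payload[512];
  assert(payload_len <= sizeof(payload));
  memset(payload, 0xAB, sizeof(payload));

  int sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0) {
    return -errno;
  }

  /* Do not block forever if the datagram never arrives. */
  struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
  setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

  /* Bind to 127.0.0.1 on a port chosen by the kernel. */
  struct sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = 0;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  socklen_t addr_len = sizeof(addr);

  long result;
  if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
      getsockname(sock, (struct sockaddr *) &addr, &addr_len) < 0 ||
      sendto(sock, payload, payload_len, 0,
             (struct sockaddr *) &addr, sizeof(addr)) < 0) {
    result = -errno;
  } else {
    unsigned char buffer[4];
    struct sockaddr_in src;
    socklen_t src_len = sizeof(src);
    ssize_t n = recvfrom(sock, buffer, sizeof(buffer), MSG_PEEK,
                         (struct sockaddr *) &src, &src_len);
    result = (n < 0) ? -errno : (long) n;
  }
  close(sock);
  return result;
}
```

Calling peek_roundtrip(64) on a kernel that behaves correctly returns 4,
because the peek copies only as many bytes as fit in the buffer. A return
value of -EFAULT would match the failure described in this report.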

As this is perfectly legal code and the problem only appears with the
commit identified above, I think that this is a regression and that the
commit should be reverted in the stable kernel tree.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-66-generic 3.13.0-66.108
ProcVersionSignature: Ubuntu 3.13.0-66.108-generic 3.13.11-ckt27
Uname: Linux 3.13.0-66-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116,  1 Oct 25 19:23 seq
 crw-rw---- 1 root audio 116, 33 Oct 25 19:23 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.16
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Mon Oct 26 18:58:42 2015
HibernationDevice: RESUME=/dev/mapper/vg0-swap
InstallationDate: Installed on 2015-01-02 (296 days ago)
InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
IwConfig:
 eth0      no wireless extensions.
 
 lo        no wireless extensions.
 
 virbr0    no wireless extensions.
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:
 
ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-66-generic root=/dev/mapper/vg0-root ro
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-66-generic N/A
 linux-backports-modules-3.13.0-66-generic  N/A
 linux-firmware                             1.127.15
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/01/2011
dmi.bios.vendor: Bochs
dmi.bios.version: Bochs
dmi.chassis.type: 1
dmi.chassis.vendor: Bochs
dmi.modalias: dmi:bvnBochs:bvrBochs:bd01/01/2011:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-trusty:cvnBochs:ct1:cvr:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-trusty
dmi.sys.vendor: QEMU

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug trusty

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1510213

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1510213/+subscriptions

