
kernel-packages team mailing list archive

[Bug 1508706] [NEW] Networking hangs on azure using hv_netvsc; bisected

 

Public bug reported:


I am running Ubuntu instances on Azure and testing basic networking
between two instances.  This involves configuring VXLAN between the two
instances, then running iperf and an rsync of the kernel tree between
them, e.g.,

ip link add vxlan0 type vxlan id 999 local 10.88.0.12 remote 10.88.0.11 dev eth0
ip l set vxlan0 up
ip addr add 242.0.0.12/8 dev vxlan0
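
The commands above configure only one side of the tunnel.  As a rough
sketch of how both sides might be set up, the helper below prints the
three commands for a given instance.  The function name
vxlan_setup_cmds is hypothetical, and the peer's overlay address
242.0.0.11/8 is an assumption made by mirroring the addresses above; it
is not stated in this report.

```shell
#!/bin/sh
# Sketch only: print the VXLAN setup commands for one side of the tunnel.
# vxlan_setup_cmds is a hypothetical helper, not part of the original report.
# Arguments: local underlay IP, remote underlay IP, overlay address/prefix.
vxlan_setup_cmds() {
    local_ip=$1
    remote_ip=$2
    overlay=$3
    echo "ip link add vxlan0 type vxlan id 999 local $local_ip remote $remote_ip dev eth0"
    echo "ip link set vxlan0 up"
    echo "ip addr add $overlay dev vxlan0"
}

# Instance from the report (10.88.0.12):
vxlan_setup_cmds 10.88.0.12 10.88.0.11 242.0.0.12/8

# Peer instance (10.88.0.11); the 242.0.0.11/8 address is an assumption:
vxlan_setup_cmds 10.88.0.11 10.88.0.12 242.0.0.11/8
```

The helper only prints the commands, so the output can be reviewed
before being run as root on each instance.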

After some time (sometimes instantly, sometimes after up to 30 minutes
of activity), networking hangs.  The hang takes one of two forms: a
complete loss of connectivity (all network traffic, including the ssh
session used to log in), or a loss of connectivity between the
instances only (the ssh session remains active).  In the latter case,
the ssh session sometimes hangs later as well.

This first appeared when testing with the Ubuntu 3.19 kernel, and I
subsequently bisected this to:

commit effa2012d207f78cbc5a8360e62d420a8860b7e9
Author: KY Srinivasan <kys@xxxxxxxxxxxxx>
Date:   Mon May 11 15:39:46 2015 -0700

    hv_netvsc: Use the xmit_more skb flag to optimize signaling the host

    BugLink: http://bugs.launchpad.net/bugs/1454892

    Based on the information given to this driver (via the xmit_more skb flag),
    we can defer signaling the host if more packets are on the way. This will help
    make the host more efficient since it can potentially process a larger batch of
    packets. Implement this optimization.

    Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
    Acked-by: Tim Gardner <tim.gardner@xxxxxxxxxxxxx>
    Acked-by: Brad Figg <brad.figg@xxxxxxxxxxxxx>
    Signed-off-by: Brad Figg <brad.figg@xxxxxxxxxxxxx>

I also tested the mainline kernel (net-next); it fails starting with the
equivalent commit:

commit 82fa3c776e5abba7ed6e4b4f4983d14731c37d6a
Author: KY Srinivasan <kys@xxxxxxxxxxxxx>
Date:   Mon May 11 15:39:46 2015 -0700

    hv_netvsc: Use the xmit_more skb flag to optimize signaling the host

For both kernel trees, I also tested the prior commit, and it did not
exhibit the failure after many hours.  For Ubuntu, this was

commit a4aeb290bd75af5e16a6144a418291476ac6140c
Author: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
Date:   Wed Mar 18 12:29:29 2015 -0700

    Drivers: hv: vmbus: Export the vmbus_sendpacket_pagebuffer_ctl()

and for mainline it was

commit 9eea92226407e7a117ef1ceef45380ebd000a0e2
Author: Alexei Starovoitov <ast@xxxxxxxxxxxx>
Date:   Mon May 11 15:19:48 2015 -0700

    pktgen: fix packet generation

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1508706

Title:
  Networking hangs on azure using hv_netvsc; bisected

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1508706/+subscriptions

