
kernel-packages team mailing list archive

[Bug 1521053] Re: Network Performance dropping between vms on different location in Azure

 

Environment
- Ubuntu trusty 14.04.3 (ubuntu-vivid kernel)
- DS2, West Europe <-> North Europe, Azure
- test app : netcat+nload, iperf

Logs
1. ===================================================================================================================
The customer provided an analysis of which kernel versions work and which do not.

Works 
ii linux-image-3.16.0-52-generic 3.16.0-52.71~14.04.1 amd64 Linux kernel image for version 3.16.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-18-generic 3.19.0-18.18~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-20-generic 3.19.0-20.20~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-21-generic 3.19.0-21.21~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-22-generic 3.19.0-22.22~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-23-generic 3.19.0-23.24~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-25-generic 3.19.0-25.26~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-26-generic 3.19.0-26.28~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 

Doesn't work
ii linux-image-3.19.0-28-generic 3.19.0-28.30~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-30-generic 3.19.0-30.34~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-31-generic 3.19.0-31.36~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP 
ii linux-image-3.19.0-32-generic 3.19.0-32.37~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
======================================================================================================================

2.====================================================================================================================
Fail ( dropping )
----------------------------------------------------------------------------------------------------------------------
After bisecting,
I found that the commit below is where the dropping starts.

commit 1826dae15f7b5d4742bd54c0392b2280cad0ef60 
Author: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> 
Date: Mon Apr 13 16:34:35 2015 -0700 

hv_netvsc: Implement partial copy into send buffer

BugLink: http://bugs.launchpad.net/bugs/1454892

If remaining space in a send buffer slot is too small for the whole message, 
we only copy the RNDIS header and PPI data into send buffer, so we can batch 
one more packet each time. It reduces the vmbus per-message overhead. 

Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx> 
Reviewed-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx> 
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> 
(cherry picked from commit aa0a34be68290aa9aa071c0691fb8b6edda38358) 
Signed-off-by: Joseph Salisbury <joseph.salisbury@xxxxxxxxxxxxx> 
Acked-by: Tim Gardner <tim.gardner@xxxxxxxxxxxxx> 
Acked-by: Brad Figg <brad.figg@xxxxxxxxxxxxx> 
Signed-off-by: Brad Figg <brad.figg@xxxxxxxxxxxxx> 
=====================================================================================================================

3. ==================================================================================================================
PASS ( no dropping )
---------------------------------------------------------------------------------------------------------------------
I tested an upstream checkout that includes the above commit.
=====================================================================================================================

4. ==================================================================================================================
After checking the differences between upstream's and ubuntu-vivid's "hv_netvsc: Implement partial copy into send buffer",
I found several commits between them:

981a1bd85a959bb3b44e07c212ebc61c62ad7cf9 hv_netvsc: use single existing drop path in netvsc_start_xmit
e88f7e078e47d4261a22e6f20a574620cbfc7a4b hv_netvsc: try linearizing big SKBs before dropping them
721514222db13498613706709409c21c105e0f4a hv_netvsc: Define a macro RNDIS_AND_PPI_SIZE
0d158852a8089099a6959ae235b20f230871982f hv_netvsc: Clean up two unused variables
59995370dbca7636c105ddadc0447fab86ad3887 hyperv: Implement netvsc_get_channels() ethool op
5ce58c2f13eaa8ca6d7e1041175433bd8cc55756 hv_netvsc: remove vmbus_are_subchannels_present() in rndis_filter_device_add()
999028cc1ccd1cd3a1c0104c6423553d3f573197 hyperv: match wait_for_completion_timeout return type
=====================================================================================================================

5. ==================================================================================================================
After several days of testing, I found one commit that improves performance:

0d158852a8089099a6959ae235b20f230871982f hv_netvsc: Clean up two unused variables
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0d158852a8089099a6959ae235b20f230871982f
=====================================================================================================================

6. ==================================================================================================================
Fail 
---------------------------------------------------------------------------------------------------------------------
Split the above commit into parts.

Remove only the assignment (see below):

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index f699236..7e83c6a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1011,7 +1011,6 @@ static void netvsc_receive(struct netvsc_device *net_device,
 	}
 
 	count = vmxferpage_packet->range_cnt;
-	netvsc_packet->device = device;
 	netvsc_packet->channel = channel;
 
 	/* Each range represents 1 RNDIS pkt that contains 1 ethernet frame */
=====================================================================================================================

7. ==================================================================================================================
Pass
---------------------------------------------------------------------------------------------------------------------
Remove the member from the header (the assignment above must of course be removed as well):

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 309adee..95a25e4 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -130,7 +130,6 @@ struct hv_netvsc_packet {
 	u32 status;
 	bool part_of_skb;
 
-	struct hv_device *device;
 	bool is_data_pkt;
 	bool xmit_more; /* from skb */
 	u16 vlan_tci;
=====================================================================================================================

8. ==================================================================================================================
Fail
---------------------------------------------------------------------------------------------------------------------
Remove only the other part of the commit:

diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index a160437..0d92efe 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -47,8 +47,6 @@ struct rndis_request {
 
 	/* Simplify allocation by having a netvsc packet inline */
 	struct hv_netvsc_packet	pkt;
-	/* Set 2 pages for rndis requests crossing page boundary */
-	struct hv_page_buffer buf[2];
 
 	struct rndis_message request_msg;
 	/*
=====================================================================================================================

9. ==================================================================================================================
Weird, so I tried adding a char member to the structures.

Fail:
- char a;       added to the hv_netvsc_packet structure from which the device member was removed (numbers 6, 7)
- char a[2];
- char a[3];     // 4 is same as pointer so i didn't test
- char a[5];
- char a[32];
Pass:
- char a;       added to the vmbus_channel structure, which is a member of the hv_netvsc_packet structure
=====================================================================================================================

** Description changed:

  [Impact]
  
- Ubuntu VM in Azure has network performance issue when check by using netcat&nload
- Normal bandwidth is 50MB/s ~ 100MB/s, but it's 0.3MB/s when dropping happens
+ Ubuntu VM in Azure has network performance issue
+ Normal bandwidth is 50MB/s ~ 100MB/s, but it's 0.3MB/s when dropping happens.
  
  [Fix]
  
  Upstream development
  0d158852a8089099a6959ae235b20f230871982f ("hv_netvsc: Clean up two unused variables")
  
- It's affected over 3.19.0-28-generic (vivid)
+ It's affected over 3.19.0-28-generic (ubuntu-vivid)
  
  With this commit, I confirmed that the problem has gone by the testing.
  
+ Test Logs
+ http://pastebin.ubuntu.com/13657083/
+ 
  [Testcase]
  
- This is only for Azure service.
+ Make 2 VMs on North Europe, West Europe each.
+ Then run below test script
  
- Make 2 vms on North Europe,  West Europe and run below test script
+ NE VM
  
- NE
+ - netcat & nload
+  while true; do netcat -l 8080 < /dev/zero; done;
+  nload -u M eth0 ( need nload pkg )
  
-  while true; do netcat -l 8080 < /dev/zero; done;
+ - iperf
+  iperf -s -f M
  
-  nload -u M eth0 ( need nload pkg )
+ WE VM
  
- WE
+ - netcat
+  for i in {1..1000}
+  do
+   timeout 30s nc NE_HOST 8080 > /dev/null
+  done
  
-  for i in {1..1000} 
-  do 
-   timeout 30s nc NE_HOST 8080 > /dev/null 
-  done
+ - iperf
+  iperf -c HOST -f M
  
- Network performance dropping can be seen frequently
+ Network performance dropping can be seen frequently in nload graph.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1521053

Title:
  Network Performance dropping between vms on different location in
  Azure

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Vivid:
  Confirmed

Bug description:
  [Impact]

  Ubuntu VMs in Azure have a network performance issue.
  Normal bandwidth is 50MB/s ~ 100MB/s, but it falls to 0.3MB/s when dropping happens.

  [Fix]

  Upstream development
  0d158852a8089099a6959ae235b20f230871982f ("hv_netvsc: Clean up two unused variables")

  Kernels from 3.19.0-28-generic (ubuntu-vivid) onward are affected.

  With this commit applied, I confirmed by testing that the problem is
  gone.

  Test Logs
  http://pastebin.ubuntu.com/13657083/

  [Testcase]

  Create 2 VMs, one in North Europe and one in West Europe.
  Then run the test scripts below.

  NE VM

  - netcat & nload
   while true; do netcat -l 8080 < /dev/zero; done;
   nload -u M eth0 ( need nload pkg )

  - iperf
   iperf -s -f M

  WE VM

  - netcat
   for i in {1..1000}
   do
    timeout 30s nc NE_HOST 8080 > /dev/null
   done

  - iperf
   iperf -c HOST -f M

  Frequent drops in network performance can be seen in the nload graph.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521053/+subscriptions

