yahoo-eng-team team mailing list archive

[Bug 1822256] Re: Ip segments lost when restart ovs-agent with openvswitch firewall

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: LIU Yulong <yulong@xxxxxxxxxxxxx>
Date: Thu, 11 Apr 2019 07:01:51 -0000
Reply-to: Bug 1822256 <1822256@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
On the tap device you cannot see the local VLAN tag, because it is stripped before the packet is sent to the device.
As your last comment (#6) shows, none of your packets carry a VLAN tag.
Here are some example packets with VLAN ID 205, captured by tcpdump:
14:53:54.915607 fe:ea:c8:20:fe:d0 > Broadcast, ethertype 802.1Q (0x8100), length 64: vlan 205, p 0, ethertype ARP, Request who-has xxx.xxx.xxx.xxx tell xxx.xxx.xxx.xxx, length 46
14:53:55.153421 fe:ea:c8:20:fe:d0 > Broadcast, ethertype 802.1Q (0x8100), length 64: vlan 205, p 0, ethertype ARP, Request who-has xxx.xxx.xxx.xxx tell xxx.xxx.xxx.xxx, length 46
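
To check a capture for tagged frames without reading every line, the VLAN ID can be pulled out of the text output. This is a minimal sketch, assuming tcpdump was run with -e so the link-level header is printed as in the lines above; the helper name vlan_id is hypothetical.

```python
import re

# Matches the 802.1Q header that "tcpdump -e" prints for tagged frames.
VLAN_RE = re.compile(r"ethertype 802\.1Q \(0x8100\).*?vlan (\d+)")

def vlan_id(line):
    """Return the VLAN ID found in one tcpdump text line, or None if untagged."""
    m = VLAN_RE.search(line)
    return int(m.group(1)) if m else None

tagged = ("14:53:54.915607 fe:ea:c8:20:fe:d0 > Broadcast, "
          "ethertype 802.1Q (0x8100), length 64: vlan 205, p 0, "
          "ethertype ARP, Request who-has xxx.xxx.xxx.xxx "
          "tell xxx.xxx.xxx.xxx, length 46")
print(vlan_id(tagged))
```

A line captured on the tap device has no 802.1Q header, so the helper returns None for it.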

If you check the OpenFlow flows, you will see 'strip_vlan' actions such as
the following:

dl_vlan=34,dl_dst=fa:16:3e:64:9d:57 actions=strip_vlan,output:2235
dl_vlan=34,dl_dst=fa:16:3e:64:9d:57 actions=load:0x8bb->NXM_NX_REG5[],load:0x22->NXM_NX_REG6[],strip_vlan,resubmit(,81)

ovs-ofctl show br-int|grep tapbad2bb56-13
2235(tapbad2bb56-13): addr:fe:16:3e:64:9d:57
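
The flows above can also be checked mechanically. The following is a minimal sketch that splits an abbreviated flow line, in the shortened form quoted above rather than full ovs-ofctl dump-flows output, into match fields and actions. The helper names parse_flow and strips_vlan are hypothetical. Note that 0x8bb is 2235, the port number of the tap device, and 0x22 is 34, the local VLAN.

```python
import re

def parse_flow(line):
    """Split an abbreviated flow line into (match fields, action list)."""
    match_part, _, action_part = line.partition(" actions=")
    matches = dict(
        field.split("=", 1) for field in match_part.split(",") if "=" in field
    )
    # Split on commas outside parentheses so "resubmit(,81)" stays whole.
    actions = re.split(r",(?![^()]*\))", action_part.strip())
    return matches, actions

def strips_vlan(line):
    """True if the flow removes the VLAN tag before forwarding."""
    _, actions = parse_flow(line)
    return "strip_vlan" in actions

flow = ("dl_vlan=34,dl_dst=fa:16:3e:64:9d:57 "
        "actions=load:0x8bb->NXM_NX_REG5[],load:0x22->NXM_NX_REG6[],"
        "strip_vlan,resubmit(,81)")
matches, actions = parse_flow(flow)
print(matches["dl_vlan"], actions, strips_vlan(flow))
```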

As for the "TCP Retransmission", it can have many causes. I would expect that even without restarting ovs-agent, you will see
"TCP Retransmission" if the network is congested.

So, I don't think this is an issue.


** Changed in: neutron
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1822256

Title:
  Ip segments lost when restart ovs-agent with openvswitch firewall

Status in neutron:
  Invalid

Bug description:
  Environment:
  linux version: Linux controller.novalocal 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  OpenStack version: Rocky
  network type: vxlan or vlan
  firewall driver: openvswitch

  1. Create 2 VMs (vm1, vm2) on different compute nodes (node-1, node-2),
  in one network, with a security group that allows all TCP traffic.

  2. Log in to vm2 and create a large file, for example:
  vm2# dd if=/dev/zero of=/mnt/test.img bs=1G count=5

  3. Log in to vm1 and scp the large file from vm2 to vm1. Once the scp transfer starts, go to step 4.
  vm1# scp vm2-ip:/mnt/test.img /mnt

  4. Log in to node-2 and restart neutron-openvswitch-agent. This refreshes all the OpenFlow flows in br-int.
  node-2# systemctl restart neutron-openvswitch-agent

  5. Log in to vm1. After several seconds, the scp process shows as
  stalled.

  After some investigation, I found that the OpenFlow refresh causes IP
  segments to be lost. When this happened, I captured packets with
  "tcpdump -i tap-xxx -w tmp.pcap", and Wireshark showed these errors:

  192.168.100.19	192.168.100.5	SSH	16478	Server: [TCP ACKed unseen segment] [TCP Previous segment not captured] , Encrypted packet (len=16412)
  192.168.100.19	192.168.100.5	SSH	8302	Server: [TCP ACKed unseen segment] , Encrypted packet (len=8236)
  192.168.100.5	192.168.100.19	TCP	66	[TCP ACKed unseen segment] [TCP Previous segment not captured] 54354 → 22 [ACK] Seq=2509 Ack=600733 Win=16522 Len=0 TSval=2847412 TSecr=2851031
  192.168.100.19	192.168.100.5	SSH	1464	Server: [TCP Spurious Retransmission] , Encrypted packet (len=1398)
  192.168.100.5	192.168.100.19	TCP	78	[TCP Dup ACK 25182#1] 54354 → 22 [ACK] Seq=326305 Ack=67089901 Win=18494 Len=0 TSval=2849742 TSecr=2853310 SLE=67073429 SRE=67074827
  192.168.100.5	192.168.100.19	TCP	110	[TCP Retransmission] 54354 → 22 [PSH, ACK] Seq=326173 Ack=67089901 Win=18494 Len=44 TSval=2849742 TSecr=2853310

  192.168.100.19	192.168.100.5	TCP	1464	[TCP Retransmission] 22 → 54354 [ACK] Seq=70971905 Ack=346105 Win=2016 Len=1398 TSval=2853361 TSecr=2849691
  192.168.100.19	192.168.100.5	TCP	1464	[TCP Retransmission] 22 → 54354 [ACK] Seq=70971905 Ack=346105 Win=2016 Len=1398 TSval=2853463 TSecr=2849691
  192.168.100.19	192.168.100.5	TCP	1464	[TCP Retransmission] 22 → 54354 [ACK] Seq=70971905 Ack=346105 Win=2016 Len=1398 TSval=2854076 TSecr=2849691

  I also checked the state of this TCP connection on both compute nodes; it is still ESTABLISHED.
  # conntrack -L | grep 192.168.100.5
  tcp      6 299 ESTABLISHED src=192.168.100.5 dst=192.168.100.19 sport=54356 dport=22 src=192.168.100.19 dst=192.168.100.5 sport=22 dport=54354 [ASSURED] mark=0 zone=4 use=1
  #  conntrack -L | grep 192.168.100.5
  tcp      6 287 ESTABLISHED src=192.168.100.5 dst=192.168.100.19 sport=54356 dport=22 src=192.168.100.19 dst=192.168.100.5 sport=22 dport=54354 [ASSURED] mark=0 zone=1 use=1
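
  The two entries above can be compared field by field with a small parser. This is a minimal sketch, assuming the TCP layout of "conntrack -L" shown above, where the fourth column is the state; the helper name parse_conntrack is hypothetical.

```python
TUPLE_KEYS = ("src", "dst", "sport", "dport")

def parse_conntrack(line):
    """Parse one TCP line of 'conntrack -L' output into a dict."""
    tokens = line.split()
    entry = {
        "proto": tokens[0],
        "timeout": int(tokens[2]),
        "state": tokens[3],
        "original": {},   # first src/dst/sport/dport group
        "reply": {},      # second src/dst/sport/dport group
        "flags": [],
    }
    for tok in tokens[4:]:
        if tok.startswith("["):
            entry["flags"].append(tok.strip("[]"))
            continue
        key, _, value = tok.partition("=")
        if key in TUPLE_KEYS:
            direction = "original" if key not in entry["original"] else "reply"
            entry[direction][key] = value
        else:
            entry[key] = value
    return entry

line = ("tcp      6 299 ESTABLISHED src=192.168.100.5 dst=192.168.100.19 "
        "sport=54356 dport=22 src=192.168.100.19 dst=192.168.100.5 "
        "sport=22 dport=54354 [ASSURED] mark=0 zone=4 use=1")
entry = parse_conntrack(line)
print(entry["state"], entry["zone"], entry["flags"])
```

  Parsing both nodes' output this way makes it easy to confirm that the state is the same on each side and that only the conntrack zone and the remaining timeout differ.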

  I have no idea why refreshing the OpenFlow flows causes IP segments to
  be lost. I hope someone has a way to solve this problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1822256/+subscriptions
References

[Bug 1822256] [NEW] Ip fragments lost when restart ovs-agent with openvswitch firewall
From: Yang Li, 2019-03-29