yahoo-eng-team team mailing list archive

[Bug 1815989] Re: OVS drops RARP packets by QEMU upon live-migration causes up to 40s ping pause in Rocky

 

** Changed in: nova/train
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1815989

Title:
  OVS drops RARP packets by QEMU upon live-migration causes up to 40s
  ping pause in Rocky

Status in neutron:
  In Progress
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  New
Status in OpenStack Compute (nova) victoria series:
  New
Status in OpenStack Compute (nova) wallaby series:
  New
Status in os-vif:
  Invalid

Bug description:
  This issue is well known, and there were previous attempts to fix it,
  like this one

  https://bugs.launchpad.net/neutron/+bug/1414559

  
  This issue still exists in Rocky and has gotten worse. In Rocky, nova compute, nova libvirt and the neutron ovs agent all run inside containers.

  So far the only simple fix I have found is to increase the number of
  RARP packets QEMU sends after live-migration from 5 to 10. For
  completeness: the nova change (not merged) proposed in the
  above-mentioned bug does not work.

  I am creating this ticket hoping to get up-to-date (for Rocky and
  onwards) expert advice on how to fix this in nova and neutron.

  
  For the record, below are the timestamps from my test: the neutron ovs agent "activating" the VM port, and the RARP packets seen by tcpdump on the compute node. 10 RARP packets are sent by the (recompiled) QEMU, 7 are seen by tcpdump, and the second-to-last packet barely made it through.

  openvswitch-agent.log:

  2019-02-14 19:00:13.568 73453 INFO
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  [req-26129036-b514-4fa0-a39f-a6b21de17bb9 - - - - -] Port
  57d0c265-d971-404d-922d-963c8263e6eb updated. Details: {'profile': {},
  'network_qos_policy_id': None, 'qos_policy_id': None,
  'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
  '1bf4b8e0-9299-485b-80b0-52e18e7b9b42', 'segmentation_id': 648,
  'fixed_ips': [

  {'subnet_id': 'b7c09e83-f16f-4d4e-a31a-e33a922c0bac', 'ip_address': '10.0.1.4'}
  ], 'device_owner': u'compute:nova', 'physical_network': u'physnet0', 'mac_address': 'fa:16:3e:de:af:47', 'device': u'57d0c265-d971-404d-922d-963c8263e6eb', 'port_security_enabled': True, 'port_id': '57d0c265-d971-404d-922d-963c8263e6eb', 'network_type': u'vlan', 'security_groups': [u'5f2175d7-c2c1-49fd-9d05-3a8de3846b9c']}
  2019-02-14 19:00:13.568 73453 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-26129036-b514-4fa0-a39f-a6b21de17bb9 - - - - -] Assigning 4 as local vlan for net-id=1bf4b8e0-9299-485b-80b0-52e18e7b9b42

   
  tcpdump for rarp packets:

  [root@overcloud-ovscompute-overcloud-0 nova]# tcpdump -i any rarp -nev
  tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes

  19:00:10.788220 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
  19:00:11.138216 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
  19:00:11.588216 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
  19:00:12.138217 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
  19:00:12.788216 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
  19:00:13.538216 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
  19:00:14.388320 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
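
  The timing relationship between the two logs above can be checked with
  a short script. This is only a sketch: the timestamps are copied from
  the agent log and the tcpdump capture, the helper names are made up
  for illustration, and it assumes both clocks are the same host clock
  on the same day.

```python
from datetime import datetime

# "Port ... updated" timestamp from openvswitch-agent.log (time part only).
AGENT_PORT_UPDATED = "19:00:13.568"

# RARP timestamps as captured by tcpdump on the compute node.
RARP_SEEN = [
    "19:00:10.788220", "19:00:11.138216", "19:00:11.588216",
    "19:00:12.138217", "19:00:12.788216", "19:00:13.538216",
    "19:00:14.388320",
]


def parse(ts):
    """Parse an HH:MM:SS.fraction timestamp (date is ignored)."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def offsets_ms(packets, reference):
    """Offset of each packet relative to the reference, in ms.

    Negative means the packet was seen before the agent log entry.
    """
    ref = parse(reference)
    return [round((parse(p) - ref).total_seconds() * 1000, 1) for p in packets]


def gaps_ms(packets):
    """Gap between consecutive captured packets, in ms."""
    times = [parse(p) for p in packets]
    return [round((b - a).total_seconds() * 1000, 1)
            for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for ts, off in zip(RARP_SEEN, offsets_ms(RARP_SEEN, AGENT_PORT_UPDATED)):
        print(f"{ts}  {off:+.1f} ms relative to agent log entry")
    print("gaps between packets (ms):", gaps_ms(RARP_SEEN))
```

  With these inputs, six of the seven captured packets precede the agent
  log entry (the sixth by roughly 30 ms) and only the last one follows
  it, by roughly 820 ms. The gaps between captured packets grow from
  about 350 ms to about 850 ms in steps of about 100 ms.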

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1815989/+subscriptions

