yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #85969
[Bug 1815989] Re: OVS drops RARP packets by QEMU upon live-migration causes up to 40s ping pause in Rocky
Reviewed: https://review.opendev.org/c/openstack/nova/+/602432
Committed: https://opendev.org/openstack/nova/commit/a62dd42c0dbb6b2ab128e558e127d76962738446
Submitter: "Zuul (22348)"
Branch: master
commit a62dd42c0dbb6b2ab128e558e127d76962738446
Author: Stephen Finucane <stephenfin@xxxxxxxxxx>
Date: Fri Apr 30 12:51:35 2021 +0100
libvirt: Delegate OVS plug to os-vif
os-vif 1.15.0 added the ability to create an OVS port during plugging
by specifying the 'create_port' attribute in the 'port_profile' field.
By delegating port creation to os-vif, we can rely on it's 'isolate_vif'
config option [1] that will temporarily configure the VLAN to 4095
(0xfff), which is reserved for implementation use [2] and is used by
neutron to as a dead VLAN [3]. By doing this, we ensure VIFs are plugged
securely, preventing guests from accessing other tenants' networks
before the neutron OVS agent can wire up the port.
This change requires a little dance as part of the live migration flow.
Since we can't be certain the destination host has a version of os-vif
that supports this feature, we need to use a sentinel to indicate when
it does. Typically we would do so with a field in
'LibvirtLiveMigrateData', such as the 'src_supports_numa_live_migration'
and 'dst_supports_numa_live_migration' fields used to indicate support
for NUMA-aware live migration. However, doing this prevents us
backporting this important fix since o.vo changes are not backportable.
Instead, we (somewhat evilly) rely on the free-form nature of the
'VIFMigrateData.profile_json' string field, which stores JSON blobs and
is included in 'LibvirtLiveMigrateData' via the 'vifs' attribute, to
transport this sentinel. This is a hack but is necessary to work around
the lack of a free-form "capabilities" style dict that would allow us do
backportable fixes to live migration features.
Note that this change has the knock on effect of modifying the XML
generated for OVS ports: when hybrid plug is false will now be of type
'ethernet' rather than 'bridge' as before. This explains the larger than
expected test damage but should not affect users.
[1] https://opendev.org/openstack/os-vif/src/tag/2.4.0/vif_plug_ovs/ovs.py#L90-L93
[2] https://en.wikipedia.org/wiki/IEEE_802.1Q#Frame_format
[3] https://answers.launchpad.net/neutron/+question/231806
Change-Id: I11fb5d3ada7f27b39c183157ea73c8b72b4e672e
Depends-On: Id12486b3127ab4ac8ad9ef2b3641da1b79a25a50
Closes-Bug: #1734320
Closes-Bug: #1815989
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1815989
Title:
OVS drops RARP packets by QEMU upon live-migration causes up to 40s
ping pause in Rocky
Status in neutron:
In Progress
Status in OpenStack Compute (nova):
Fix Released
Status in os-vif:
Invalid
Bug description:
This issue is well known, and there were previous attempts to fix it,
like this one
https://bugs.launchpad.net/neutron/+bug/1414559
This issue still exists in Rocky and gets worse. In Rocky, nova compute, nova libvirt and neutron ovs agent all run inside containers.
So far the only simply fix I have is to increase the number of RARP
packets QEMU sends after live-migration from 5 to 10. To be complete,
the nova change (not merged) proposed in the above mentioned activity
does not work.
I am creating this ticket hoping to get an up-to-date (for Rockey and
onwards) expert advise on how to fix in nova-neutron.
For the record, below are the time stamps in my test between neutron ovs agent "activating" the VM port and rarp packets seen by tcpdump on the compute. 10 RARP packets are sent by (recompiled) QEMU, 7 are seen by tcpdump, the 2nd last packet barely made through.
openvswitch-agent.log:
2019-02-14 19:00:13.568 73453 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-26129036-b514-4fa0-a39f-a6b21de17bb9 - - - - -] Port
57d0c265-d971-404d-922d-963c8263e6eb updated. Details: {'profile': {},
'network_qos_policy_id': None, 'qos_policy_id': None,
'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
'1bf4b8e0-9299-485b-80b0-52e18e7b9b42', 'segmentation_id': 648,
'fixed_ips': [
{'subnet_id': 'b7c09e83-f16f-4d4e-a31a-e33a922c0bac', 'ip_address': '10.0.1.4'}
], 'device_owner': u'compute:nova', 'physical_network': u'physnet0', 'mac_address': 'fa:16:3e:de:af:47', 'device': u'57d0c265-d971-404d-922d-963c8263e6eb', 'port_security_enabled': True, 'port_id': '57d0c265-d971-404d-922d-963c8263e6eb', 'network_type': u'vlan', 'security_groups': [u'5f2175d7-c2c1-49fd-9d05-3a8de3846b9c']}
2019-02-14 19:00:13.568 73453 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-26129036-b514-4fa0-a39f-a6b21de17bb9 - - - - -] Assigning 4 as local vlan for net-id=1bf4b8e0-9299-485b-80b0-52e18e7b9b42
tcpdump for rarp packets:
[root@overcloud-ovscompute-overcloud-0 nova]# tcpdump -i any rarp -nev
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
19:00:10.788220 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
19:00:11.138216 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
19:00:11.588216 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
19:00:12.138217 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
19:00:12.788216 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
19:00:13.538216 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
19:00:14.388320 B fa:16:3e:de:af:47 ethertype Reverse ARP (0x8035), length 62: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is fa:16:3e:de:af:47 tell fa:16:3e:de:af:47, length 46
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1815989/+subscriptions
References