yahoo-eng-team team mailing list archive
Message #63143
[Bug 1681979] [NEW] L2pop flows are lost after OVS agent restart
Public bug reported:
In the OVS agent, there is a race condition between l2pop's add_fdb_entries
notification and provision_local_vlan, which creates the vlanmanager
mapping. As a result, the unicast entries, the flooding entries, or both
may not be populated on the host. Without the flooding entries,
connectivity is lost.
The flows stay lost semi-permanently, because the l2pop mechanism driver
sends the full list of fdb entries only after a port_update_up for the
first port on an agent, or after an OVS agent restart (where we either hit
the same race condition again or end up with only partially fixed flows).
Test setup: legacy testbed with 3 nodes and 4 tenant networks.
1. The add_fdb_entries code path creates the tunnel port(s) in
fdb_add_tun, then invokes add_fdb_flow to add the broadcast (BC) and
unicast (UC) l2pop flows, but only if it can get a vlanmanager mapping:
    def fdb_add(self, context, fdb_entries):
        LOG.info("2. fdb_add received")
        for lvm, agent_ports in self.get_agent_ports(fdb_entries):
            agent_ports.pop(self.local_ip, None)
            LOG.info("2. fdb_add: agent_ports = %s", agent_ports)
            LOG.info("2. fdb_add: lvm = %s", lvm)
            if len(agent_ports):
                if not self.enable_distributed_routing:
                    with self.tun_br.deferred() as deferred_br:
                        LOG.info("2. fdb_add: about to call fdb_add_tun w/ lvm = %s", lvm)
                        self.fdb_add_tun(context, deferred_br, lvm,
                                         agent_ports, self._tunnel_port_lookup)
                else:
                    self.fdb_add_tun(context, self.tun_br, lvm,
                                     agent_ports, self._tunnel_port_lookup)
    def get_agent_ports(self, fdb_entries, local_vlan_map=None):
        """Generator to yield port info.

        For each known (i.e found in VLAN manager) network in
        fdb_entries, yield (lvm, fdb_entries[network_id]['ports']) pair.

        :param fdb_entries: l2pop fdb entries
        :param local_vlan_map: Deprecated.
        """
        lvm_getter = self._get_lvm_getter(local_vlan_map)
        for network_id, values in fdb_entries.items():
            try:
                lvm = lvm_getter(network_id, local_vlan_map)
            except vlanmanager.MappingNotFound:
                LOG.info("get_agent_ports: vlanmanager.MappingNotFound EXCEPTION! netid = %s, local_vlan_map = %s", network_id, local_vlan_map)
                continue
            agent_ports = values.get('ports')
            LOG.info("get_agent_ports: got lvm= %s", lvm)
            yield (lvm, agent_ports)
2. If the VLAN mapping isn't found, tunnel port creation is skipped, and so are the flows.
3. When we create the VLAN mapping in provision_local_vlan(),
install_flood_to_tun is skipped if no tunnel ports have been created
yet:
    def provision_local_vlan(self, net_uuid, network_type, physical_network,
                             segmentation_id):
        ...
        if network_type in constants.TUNNEL_NETWORK_TYPES:
            LOG.info("ARJUN: network_type = %s", network_type)
            if self.enable_tunneling:
                # outbound broadcast/multicast
                ofports = list(self.tun_br_ofports[network_type].values())
                LOG.info("ARJUN: provision_local_vlan: ofports = %s enable_tunneling = %s", ofports, self.enable_tunneling)
                if ofports:
                    LOG.info("ARJUN: installing FLOODING_ENTRY: lvid = %s segment_id = %s", lvid, segmentation_id)
                    self.tun_br.install_flood_to_tun(lvid,
                                                     segmentation_id,
                                                     ofports)
                # inbound from tunnels: set lvid in the right table
                # and resubmit to Table LEARN_FROM_TUN for mac learning
4. Finally, the stale-flow cleanup logic removes all old flows. At this
point br-tun is left without the flooding flows, the unicast flows, or both.
5. If step 3 always happens first for every network, we are fine. Otherwise flows are lost:
Only the unicast flows are missing (the flood flow is added) if:
- The network's vlanmanager mapping is allocated *after* its
add_fdb_entries, but some other network has already set up tunnel ports on br-tun.
Both broadcast and unicast flows are missing if:
- A network tries to add fdb flows before its vlanmanager mapping is
allocated, and no other network has created the tunnel ports/ofports on br-tun yet.
Example with 3 tenant networks:
1. add_fdb_entries for networks 1 and 2: no LVM yet, so neither flows nor tunnel ports are created
2. LVM created for network 2, but flood not installed because there are no ofports
3. LVM created for network 3
4. add_fdb_entries for network 3: here it properly finds the LVM and creates tunnel ports/flows
5. LVM created for network 1: tunnel ofports now exist, so flood is installed, but unicast is missing
After this point, network 3 would be fine, network 2 would be missing
all flows, and network 1 would have flood but not unicast.
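The example can be replayed with a small toy model. This is only a sketch of the ordering problem, not neutron code: the ToyAgent class, its fields, and the replay helper are hypothetical names invented for illustration, and the model assumes that each add_fdb_entries notification carries both the flooding and the unicast entries for its network.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for the agent state involved in the race."""
    mappings: set = field(default_factory=set)  # networks with a vlanmanager mapping
    tunnel_ports_exist: bool = False            # any tunnel ofport on br-tun yet?
    flood: set = field(default_factory=set)     # networks with a flood flow
    unicast: set = field(default_factory=set)   # networks with unicast flows

    def add_fdb_entries(self, net):
        # Like get_agent_ports(): a network without a mapping is skipped,
        # so neither tunnel ports nor flows are created for it.
        if net not in self.mappings:
            return
        self.tunnel_ports_exist = True
        self.flood.add(net)
        self.unicast.add(net)

    def provision_local_vlan(self, net):
        self.mappings.add(net)
        # Like the `if ofports:` guard: flood is installed only if some
        # tunnel port already exists on br-tun.
        if self.tunnel_ports_exist:
            self.flood.add(net)


def replay(events):
    """Apply (method_name, network) events in order; return {net: (flood, unicast)}."""
    agent = ToyAgent()
    for method_name, net in events:
        getattr(agent, method_name)(net)
    return {net: (net in agent.flood, net in agent.unicast)
            for net in sorted(agent.mappings)}


# The five steps of the example above, in order.
EXAMPLE_EVENTS = [
    ("add_fdb_entries", 1),
    ("add_fdb_entries", 2),
    ("provision_local_vlan", 2),
    ("provision_local_vlan", 3),
    ("add_fdb_entries", 3),
    ("provision_local_vlan", 1),
]

result = replay(EXAMPLE_EVENTS)
for net, (has_flood, has_unicast) in result.items():
    print("network %s: flood=%s unicast=%s" % (net, has_flood, has_unicast))
```

Reordering EXAMPLE_EVENTS so that every provision_local_vlan comes before the matching add_fdb_entries leaves all three networks with both kinds of flows in this model.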
The ordering seems to vary wildly depending on the number of tunnel ports,
the number of networks, ports per network, how ports are distributed,
network speed, etc.
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1681979