yahoo-eng-team team mailing list archive
Message #59980
[Bug 1651672] [NEW] dhcp_ready_on_ports causing race with Neutron OVS agent boot
Public bug reported:
When neutron-openvswitch-agent starts, it indirectly causes all of its
ports to transition into the BUILD state. If DHCP is enabled, a DHCP
agent receives a port.update.end notification and refreshes its
configuration. After this, a dhcp_ready_on_ports RPC call is made.
At this stage there are no provisioning blocks: none were created,
because no new ports have actually been created. However, a
PROVISIONING_COMPLETE event is still emitted, which causes the ports to
transition into the ACTIVE state. If l2pop is enabled, fdb entries are
sent at this stage.
The problem: with a large number of ports, the OVS agent is most likely
still processing ports and allocating local vlans. This causes some (or
all) of the fdb entries to be discarded, as there are no local vlans
yet. When the OVS agent reaches the point where it uses the
update_device_list RPC call to transition ports into ACTIVE, they are
already in that state and no fdb entries are emitted.
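The ordering problem described above can be illustrated with a toy model. This is only a sketch: the class and function names (ToyServer, ToyOvsAgent, simulate) are hypothetical and are not Neutron code, and local vlans are keyed per port rather than per network to keep it short. It assumes, as described above, that an fdb entry is emitted only on an actual transition into ACTIVE and is dropped by the agent if no local vlan exists yet.

```python
# Toy model of the race. All names here are hypothetical; this is not
# Neutron code, only an illustration of the ordering described above.
BUILD, ACTIVE = "BUILD", "ACTIVE"


class ToyOvsAgent:
    """Stand-in for the OVS agent on the rebooted node."""

    def __init__(self):
        self.local_vlans = set()  # ports for which a local vlan exists
        self.applied = []         # fdb entries that were programmed
        self.discarded = []       # fdb entries dropped (no local vlan)

    def allocate_local_vlan(self, port_id):
        self.local_vlans.add(port_id)

    def fdb_add(self, port_id):
        if port_id in self.local_vlans:
            self.applied.append(port_id)
        else:
            self.discarded.append(port_id)


class ToyServer:
    """Stand-in for the server-side port status handling plus l2pop."""

    def __init__(self, port_ids, agent):
        # Agent start has already pushed every port into BUILD.
        self.status = {p: BUILD for p in port_ids}
        self.agent = agent

    def _set_active(self, port_id):
        # An fdb entry is emitted only on a real status transition.
        if self.status[port_id] == ACTIVE:
            return
        self.status[port_id] = ACTIVE
        self.agent.fdb_add(port_id)

    def dhcp_ready_on_ports(self, port_ids):
        # No provisioning blocks exist, so ports go ACTIVE right away.
        for p in port_ids:
            self._set_active(p)

    def update_device_list(self, port_ids):
        for p in port_ids:
            self._set_active(p)


def simulate(n_ports, vlans_ready_before_dhcp):
    """Return (applied, discarded) fdb entry counts for one boot."""
    ports = [f"port-{i}" for i in range(n_ports)]
    agent = ToyOvsAgent()
    server = ToyServer(ports, agent)
    # The agent has only processed some ports when DHCP reports ready.
    for p in ports[:vlans_ready_before_dhcp]:
        agent.allocate_local_vlan(p)
    server.dhcp_ready_on_ports(ports)
    # The agent finishes the remaining ports and reports them as up.
    for p in ports[vlans_ready_before_dhcp:]:
        agent.allocate_local_vlan(p)
    server.update_device_list(ports)
    return len(agent.applied), len(agent.discarded)
```

In this model, calling simulate(1000, 100) leaves 900 entries discarded, and the later update_device_list call emits nothing because every port is already ACTIVE.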
Version: observed in Newton (neutron 9.0.0)
Pre-conditions:
- standalone network node with l3-agent in legacy mode
- dhcp agent running on another node
- ovsdb_interface in vsctl mode (due to performance issues with IDL)
To reproduce:
- have an L3 node with a large number of ports (we had about 1000)
- have a DHCP agent running on some other node
- issue a cold boot on the L3 node (no ports in br-int, no existing flows in br-tun). Start the OVS agent and the L3 agent at the same time
- observe incoming fdb entries before ports are actually provisioned
Expected behaviour: the DHCP agent should not cause these ports to
transition into ACTIVE. fdb entries should be emitted only when the OVS
agent issues the update_device_list call.
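One way to picture the expected behaviour is a per-port set of outstanding components, where a port leaves BUILD only once every component has reported in. The sketch below is a toy illustration under that assumption: ToyProvisioningBlocks and the "L2"/"DHCP" labels are hypothetical names, and this is not Neutron's provisioning blocks implementation nor a proposed patch.

```python
# Toy sketch of gating the ACTIVE transition on the L2 agent.
# Hypothetical names only; not Neutron's provisioning_blocks module.
class ToyProvisioningBlocks:
    def __init__(self):
        self._pending = {}  # port_id -> set of components still pending
        self.status = {}    # port_id -> "BUILD" or "ACTIVE"

    def mark_build(self, port_id, components):
        """Put a port into BUILD and record who must finish first."""
        self.status[port_id] = "BUILD"
        self._pending[port_id] = set(components)

    def provisioning_complete(self, port_id, component):
        """Clear one component; return True if the port became ACTIVE."""
        pending = self._pending.get(port_id, set())
        pending.discard(component)
        if not pending and self.status.get(port_id) == "BUILD":
            self.status[port_id] = "ACTIVE"
            return True
        return False


blocks = ToyProvisioningBlocks()
# On agent restart, a block for the L2 agent is recorded with the port.
blocks.mark_build("port-1", ["L2"])
# dhcp_ready_on_ports arrives first: the port must stay in BUILD.
dhcp_activated = blocks.provisioning_complete("port-1", "DHCP")
# update_device_list arrives later: only now does the port go ACTIVE,
# which is the point where fdb entries would be emitted.
l2_activated = blocks.provisioning_complete("port-1", "L2")
```

With this ordering, the DHCP notification alone does not activate the port; the transition (and any fdb emission tied to it) happens when the L2 side completes.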
Impact: if a network node is rebooted (due to hardware failure or some
other reason), the node is left in an inconsistent state after the
reboot. A random number of fdb entries are missing, causing disruption
to user traffic.
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1651672