yahoo-eng-team team mailing list archive
Message #95917
[Bug 2111593] [NEW] [ovn] Race between FIP NAT entry creation and OVN port status update
Public bug reported:
Sometimes the external_mac is missing in NAT entries in ovn-nb while
it's supposed to be there.
In my case a VM port (vnic-type=remote-managed) is created and, shortly afterwards, a new floating IP is created and assigned to this port. ovn-controller then plugs a VF representor into the OVS bridge and reports the port as operationally up, and the status propagates from ovn-controller -> ovn-sb -> ovn-northd -> ovn-nb -> Neutron. Eventually the OVN NB notification reaches Neutron, which calls `set_port_status_up` and `_update_dnat_entry_if_needed`.
https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1165-L1171
But the debug message in `_update_dnat_entry_if_needed` is never logged (debug=True is set, ovn_distributed_floating_ip is enabled) because the NAT entry has not been committed yet:
https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1143-L1145
    if mac and nat['external_mac'] != mac:
        LOG.debug("Setting external_mac of port %s to %s",
                  port_id, mac)
https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1127-L1128
When looking at the transaction logs for the NAT table in `ovsdb-tool
-mm show-log /var/lib/ovn/ovn_nb.db` I can see that the external-id
`neutron:fip_external_mac` is present but not the `external_mac`.
The NAT entry is committed at FIP creation time, and the presence of
`external_mac` is conditional on the LSP for the VM port already being UP.
`neutron:fip_external_mac`, in contrast, is committed unconditionally
per the code:
https://github.com/openstack/neutron/blob/e3a56a401f8fc94accb543d0f87a5b9fc2654d3d/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L907 (unconditional for neutron:fip_external_mac)
https://github.com/openstack/neutron/blob/e3a56a401f8fc94accb543d0f87a5b9fc2654d3d/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L923-L925 (only sets external_mac if distributed FIPs are enabled and the port's LSP is UP).
So if the LSP is not UP at the time of the check in
`_create_or_update_floatingip`, the NAT entry is created without the
external_mac. Meanwhile `set_port_status_up`, which runs in parallel but
before the NAT entry is committed, does not see the NAT record yet, so
`external_mac` never gets set by either function.
The outcome is that the VM is not reachable because the external_mac is
missing.
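The interleaving described above can be reproduced with a small toy model. This is only a sketch: `FakeNB`, `build_nat_row`, `on_port_up`, `racy_order` and `benign_order` are hypothetical names invented for illustration, a plain Python object stands in for the OVN NB database, and none of this is the actual Neutron code.

```python
# Toy model of the race. FakeNB and the functions below are hypothetical
# stand-ins for illustration only, not Neutron or OVN code.


class FakeNB:
    """In-memory stand-in for the NB state relevant to one floating IP."""

    def __init__(self):
        self.lsp_up = False  # whether the VM port's LSP is reported up
        self.nat = None      # the NAT row; None until it is committed


def build_nat_row(nb, mac):
    """Build the NAT row the way the report describes FIP creation."""
    # The external-id is always recorded.
    row = {"external_ids": {"neutron:fip_external_mac": mac}}
    # external_mac is only set when the LSP is already up at check time.
    if nb.lsp_up:
        row["external_mac"] = mac
    return row


def on_port_up(nb, mac):
    """Port-up handler: updates the NAT row only if it is already visible."""
    nb.lsp_up = True
    if nb.nat is not None and nb.nat.get("external_mac") != mac:
        nb.nat["external_mac"] = mac


def racy_order(mac):
    """Check LSP (down) -> port-up event (no NAT row yet) -> commit."""
    nb = FakeNB()
    row = build_nat_row(nb, mac)  # LSP still down: no external_mac
    on_port_up(nb, mac)           # NAT row not committed: nothing to update
    nb.nat = row                  # commit happens last
    return nb.nat


def benign_order(mac):
    """Commit -> port-up event: the handler sees the row and fixes it."""
    nb = FakeNB()
    nb.nat = build_nat_row(nb, mac)
    on_port_up(nb, mac)
    return nb.nat
```

In this model `racy_order` leaves a row that carries `neutron:fip_external_mac` in `external_ids` but no `external_mac`, matching what shows up in the NB transaction log, while `benign_order` ends with `external_mac` set.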
To fix that, Neutron could also check the LSP status after committing
the NAT entry and update the external_mac accordingly.
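A minimal sketch of that idea, under the same toy-model assumptions: `FakeNB`, `on_port_up` and `create_fip` are hypothetical names, the `between` hook exists only to simulate a port-up event landing between the LSP check and the commit, and a real change in Neutron would have to go through the OVN NB transaction machinery rather than a dict.

```python
# Sketch of "re-check the LSP status after committing the NAT entry".
# All names here are hypothetical; this is not the Neutron implementation.


class FakeNB:
    """In-memory stand-in for the NB state relevant to one floating IP."""

    def __init__(self):
        self.lsp_up = False  # whether the VM port's LSP is reported up
        self.nat = None      # the NAT row; None until it is committed


def on_port_up(nb, mac):
    """Port-up handler: updates the NAT row only if it is already visible."""
    nb.lsp_up = True
    if nb.nat is not None and nb.nat.get("external_mac") != mac:
        nb.nat["external_mac"] = mac


def create_fip(nb, mac, between=lambda: None, recheck=True):
    """Create the NAT row, optionally re-checking the LSP after the commit."""
    row = {"external_ids": {"neutron:fip_external_mac": mac}}
    if nb.lsp_up:
        row["external_mac"] = mac

    between()     # simulated concurrent event between check and commit
    nb.nat = row  # commit the NAT row

    # Proposed extra step: look at the LSP again now that the row exists.
    if recheck and nb.lsp_up and nb.nat.get("external_mac") != mac:
        nb.nat["external_mac"] = mac
    return nb.nat
```

With `recheck=False` and a port-up event injected through `between`, the row ends up without `external_mac`; with `recheck=True` the same interleaving yields a row with `external_mac` set. The update is idempotent, so it does no harm if the port-up handler also sets the value.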
Discovered in Neutron 2024.1 but affects the current versions as well.
** Affects: neutron
Importance: Undecided
Status: In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2111593