[Bug 2051172] [NEW] [ovn-octavia-provider] Deleting or updating Load Balancer members in one pool can affect other pools or the same pool if the same IP is used
Public bug reported:
* High level description:
We have discovered that when using Octavia OVN Load Balancers with two or more pools that share member IPs (but use different ports), removing a member from one pool removes that IP from the "ip_port_mappings" of the OVN Load Balancer even if another pool is still using it. Additionally, if you update the pool members to point to the same member IPs but a different port, the "ip_port_mappings" entries for those IPs are also cleared. The update behavior occurs because the provider performs the additions first, followed by the deletions, so if the IP addresses are the same, the final deletions clear out the IPs.
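To make the failure mode concrete, here is a minimal, self-contained Python sketch (illustrative only, not the provider's actual code) of a mapping keyed by member IP alone, which is how the OVN Load_Balancer ip_port_mappings column is keyed (MEMBER_IP=LSP_NAME_MEMBER:HEALTH_SRC, per the comment in the patch below):
```
# Sketch: ip_port_mappings is keyed by member IP only, so two pools
# sharing an IP share a single entry.
ip_port_mappings = {}

def add_member(ip, lsp_name, health_src):
    # Adding a member from any pool writes the same key.
    ip_port_mappings[ip] = f"{lsp_name}:{health_src}"

def delete_member(ip):
    # Deleting a member from one pool removes the entry for every pool.
    ip_port_mappings.pop(ip, None)

# Two pools, same backend IP, different ports (the port is not in the key).
add_member("10.0.0.5", "lsp-member-a", "172.16.0.10")  # pool 1, port 80
add_member("10.0.0.5", "lsp-member-a", "172.16.0.10")  # pool 2, port 443

delete_member("10.0.0.5")  # member removed from pool 1 only...
print(ip_port_mappings)    # {} -> pool 2 has lost its mapping too
```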
We experienced this problem when using Magnum with OVN load balancers:
Magnum in some cases updates the load balancers to ensure proper
configuration, and this resulted in the load balancers not functioning
even though the final Octavia configuration was correct (since the
ip_port_mappings table had been cleared, the LBs were not operational).
This behavior was likely introduced by the fix for bug:
https://bugs.launchpad.net/neutron/+bug/2007835
which resulted in commit:
https://opendev.org/openstack/ovn-octavia-provider/commit/e40e0d62ac089afd14c03102d80990e792211af3
This commit updates the OVS/OVN databases in a more efficient way
(only adding or deleting what is needed). However, since it no longer
recomputes all ip_port_mappings on every update, it is possible to hit
this scenario.
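Because the update path is incremental rather than a full recompute, the additions-then-deletions ordering described above is what wipes the entry. A simplified Python sketch (function and variable names are mine, not the provider's):
```
# Simplified sketch of a member update: additions are applied first,
# then deletions, against the same IP-keyed mapping.
def apply_member_update(mappings, additions, deletions):
    for ip, value in additions:
        mappings[ip] = value    # new member (same IP, new port)
    for ip in deletions:
        mappings.pop(ip, None)  # old member -- runs last

mappings = {}
# Update a member from 10.0.0.5:80 to 10.0.0.5:8080 -- the port changes,
# but the mapping key (the IP) does not.
apply_member_update(mappings,
                    additions=[("10.0.0.5", "lsp-member-a:172.16.0.10")],
                    deletions=["10.0.0.5"])
print(mappings)  # {} -- the deletion also removed the newly added mapping
```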
The "ovn-octavia-provider/ovn_octavia_provider/helper.py" code already
includes a solution to this scenario in the "_clean_ip_port_mappings"
method. In this case, the method will verify if the ip address is in
use by any other pools before removing the ip address from the
"ip_port_mappings" table. The same logic can also be applied to the
"_update_ip_port_mappings" to prevent this issue from occurring in the
pool member update or delete cases. I have tried making such a change,
and it works well for us.
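The shape of that check, as a standalone Python sketch with hypothetical data (I am assuming member tuples of the form (ip, port, subnet, member_id), consistent with how the patch below indexes the result of _extract_member_info):
```
NO_MONITOR = "NO_MONITOR"  # stand-in for the octavia-lib constant

def safe_to_delete(backend_ip, pool_key, pools, member_statuses):
    """Return True only if no monitored member of another pool uses the IP."""
    for key, members in pools.items():
        if key == pool_key:
            continue  # only *other* pools are examined (see caveat below)
        for ip, _port, _subnet, member_id in members:
            if (ip == backend_ip and
                    member_statuses.get(member_id, "") != NO_MONITOR):
                return False  # another health-monitored member needs the IP
    return True

pools = {
    "pool_1": [("10.0.0.5", 80, "subnet-a", "member-1")],
    "pool_2": [("10.0.0.5", 443, "subnet-a", "member-2")],
}
statuses = {"member-1": "ONLINE", "member-2": "ONLINE"}
print(safe_to_delete("10.0.0.5", "pool_1", pools, statuses))  # False
```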
However, it should be noted that the logic in
"_clean_ip_port_mappings" only looks at other pools, not at the current
pool. Therefore, if pool members were updated from one port to another
port with the same IP, my patch applying the same logic to
"_update_ip_port_mappings" may not cover all scenarios.
Therefore, it may be optimal to make the checks more thorough, as
sketched below. If the maintainers of this project agree with the logic
of my proposed updates, I could work to submit an official patch for
this issue.
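A hypothetical sketch of such a more thorough check (names and signature are mine, untested against the provider), which also keeps the mapping when the current pool still has another member with the same IP:
```
def safe_to_delete_thorough(backend_ip, removed_member_id, pools):
    """Keep the mapping if any remaining member, in any pool (including
    the current one), still uses backend_ip."""
    for members in pools.values():
        for ip, _port, _subnet, member_id in members:
            if member_id == removed_member_id:
                continue  # ignore the member being removed itself
            if ip == backend_ip:
                return False  # a remaining member still needs the mapping
    return True

# Port-only update inside one pool: member-1 (port 80) is replaced by
# member-3 (port 8080) with the same IP; the mapping must survive.
pools = {"pool_1": [("10.0.0.5", 8080, "subnet-a", "member-3")]}
print(safe_to_delete_thorough("10.0.0.5", "member-1", pools))  # False
```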
* Version:
** OpenStack version
OpenStack Antelope (2023.1)
** Linux distro, kernel.
Ubuntu 22.04
** DevStack or other _deployment_ mechanism?
Kolla Ansible
Proposed patch for the issue:
```
--- helper.py.new_orig 2024-01-21 03:05:27.000000000 +0000
+++ helper.py.proposed1 2024-01-24 19:12:17.843028336 +0000
@@ -2441,7 +2441,7 @@
self._execute_commands(commands)
return True
- def _update_ip_port_mappings(self, ovn_lb, backend_ip, port_name, src_ip,
+ def _update_ip_port_mappings(self, ovn_lb, backend_ip, port_name, src_ip, pool_key,
delete=False):
# ip_port_mappings:${MEMBER_IP}=${LSP_NAME_MEMBER}:${HEALTH_SRC}
@@ -2449,10 +2449,42 @@
# MEMBER_IP: IP of member_lsp
# LSP_NAME_MEMBER: Logical switch port
# HEALTH_SRC: source IP of hm_port
-
if delete:
- self.ovn_nbdb_api.lb_del_ip_port_mapping(ovn_lb.uuid,
- backend_ip).execute()
+ # NOTE(jrhine): This is basically the same code as in the
+ # _clean_ip_port_mappings function, but for a single member IP.
+ # Just like in that function, before removing a member from the
+ # ip_port_mappings list we need to ensure that the member IP is not
+ # being used by another pool, to avoid accidentally removing a
+ # mapping that is still in use. We can use neutron:member_status to
+ # search for any other members with the same address.
+ other_members = []
+ for k, v in ovn_lb.external_ids.items():
+ if ovn_const.LB_EXT_IDS_POOL_PREFIX in k and k != pool_key:
+ other_members.extend(self._extract_member_info(
+ ovn_lb.external_ids[k]))
+
+ member_statuses = ovn_lb.external_ids.get(
+ ovn_const.OVN_MEMBER_STATUS_KEY)
+
+ try:
+ member_statuses = jsonutils.loads(member_statuses)
+ except TypeError:
+ LOG.debug("no member status on external_ids: %s",
+ str(member_statuses))
+ member_statuses = {}
+
+ execute_delete = True
+ for member_id in [item[3] for item in other_members
+ if item[0] == backend_ip]:
+ if member_statuses.get(
+ member_id, '') != constants.NO_MONITOR:
+ # same address is being monitored by another HM
+ execute_delete = False
+ LOG.debug(f"Member {member_id} is still using address {backend_ip}, not deleting its ip_port_mapping")
+ if execute_delete:
+ LOG.debug(f"Verified address {backend_ip} is not in use by members of other pools, deleting ip_port_mapping")
+ self.ovn_nbdb_api.lb_del_ip_port_mapping(ovn_lb.uuid,
+ backend_ip).execute()
else:
self.ovn_nbdb_api.lb_add_ip_port_mapping(ovn_lb.uuid,
backend_ip,
@@ -2542,7 +2574,7 @@
'pool': pool_key})
return None
self._update_ip_port_mappings(ovn_lb, backend_ip,
- member_lsp.name, hm_source_ip,
+ member_lsp.name, hm_source_ip, pool_key,
delete)
return constants.ONLINE
```
** Affects: neutron
Importance: Undecided
Status: New
** Tags: ovn-octavia-provider
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2051172