
yahoo-eng-team team mailing list archive

[Bug 1965772] [NEW] ovn-octavia-provider does not report status correctly to octavia

 

Public bug reported:

Hi all,

The OVN Octavia provider does not report status correctly to Octavia due
to a few bugs in the health monitoring implementation:

1) https://opendev.org/openstack/ovn-octavia-provider/src/commit/d6adbcef86e32bc7befbd5890a2bc79256b7a8e2/ovn_octavia_provider/helper.py#L2374 :
In _get_lb_on_hm_event, the request to the OVN NB API (db_find_rows) is incorrect:
        lbs = self.ovn_nbdb_api.db_find_rows(
            'Load_Balancer', (('ip_port_mappings', '=', mappings),
                              ('protocol', '=', row.protocol))).execute()

Should be:
        lbs = self.ovn_nbdb_api.db_find_rows(
            'Load_Balancer', ('ip_port_mappings', '=', mappings),
            ('protocol', '=', row.protocol[0])).execute()

Note the removed outer parentheses (each condition tuple is now passed as a
separate argument instead of being nested inside a single tuple) and that the
protocol string is taken from the first element of the protocol[] list.
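
For reference, here is a minimal sketch (not the provider's current code) of
how the protocol value could be unwrapped defensively before building the
query; the empty-list guard is an assumption added for illustration:

        # Sketch: OVSDB optional columns are surfaced as lists, so row.protocol
        # is unwrapped before being used in a query condition (the empty-list
        # guard is an assumption, not code from the provider).
        protocol = row.protocol[0] if row.protocol else None
        conditions = [('ip_port_mappings', '=', mappings)]
        if protocol:
            conditions.append(('protocol', '=', protocol))
        # Each condition tuple is its own positional argument to db_find_rows.
        lbs = self.ovn_nbdb_api.db_find_rows('Load_Balancer',
                                             *conditions).execute()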

2) https://opendev.org/openstack/ovn-octavia-provider/src/commit/d6adbcef86e32bc7befbd5890a2bc79256b7a8e2/ovn_octavia_provider/helper.py#L2426 :

There is confusion with the Pool object returned by pool =
self._octavia_driver_lib.get_pool(pool_id): this object does not contain any
operating_status attribute, and given the current state of octavia-lib it
seems that it is possible to set and update the status for a
listener/pool/member, but not possible to retrieve the current status.

See https://opendev.org/openstack/octavia-lib/src/branch/master/octavia_lib/api/drivers/data_models.py
for the current Pool data model.
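
For what it's worth, a small sketch demonstrating the missing attribute (the
standalone DriverLibrary instantiation and the pool_id variable are
illustrative only):

        # Sketch: the Pool data model carries configuration fields only,
        # so probing for a runtime operating status yields nothing.
        from octavia_lib.api.drivers import driver_lib

        odl = driver_lib.DriverLibrary()
        pool = odl.get_pool(pool_id)             # pool_id resolved from the HM event
        getattr(pool, 'operating_status', None)  # -> None: no such attribute on Pool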

As a result, the computation done by _get_new_operating_statuses cannot use
the current operating status to set a new one. It is still possible to set an
operating status for the members by setting them to "OFFLINE" separately when
an HM update event is fired.

3) The Load_Balancer_Health_Check NB entry creates the Service_Monitor SB
entries, but there is no way to link the created Service_Monitor entries back
to the original NB entry. As a result, health monitor events received from the
SB and processed by the Octavia driver agent cannot be accurately matched with
the correct Octavia health monitor entry. If, for example, two load balancer
entries use the same pool members and the same ports, only the first LB
returned by db_find_rows would be updated (assuming bug #2 is fixed). Having
two load balancers with the same members is a perfectly valid case, for
instance one load balancer for public traffic (using a VIP from a public pool)
and another for internal/admin traffic (using a VIP from another pool, with a
source range whitelist). The code that selects only the first LB in that case
is the same code referenced in bug #1.
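
To illustrate the point, a sketch of how the event handler could fan out to
every matching LB instead of only the first one, once #1 is fixed (the
_handle_member_status_for_lb helper is hypothetical, not an existing method of
the provider):

        # Sketch: iterate over every Load_Balancer row matching the
        # ip_port_mappings/protocol pair, so that e.g. both the public and the
        # internal/admin LB get their member statuses updated.
        lbs = self.ovn_nbdb_api.db_find_rows(
            'Load_Balancer', ('ip_port_mappings', '=', mappings),
            ('protocol', '=', row.protocol[0])).execute()
        for lb in lbs:
            # hypothetical per-LB handler: map this NB row back to the right
            # Octavia health monitor / member statuses and report them
            self._handle_member_status_for_lb(lb, row)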

** Affects: neutron
     Importance: Undecided
         Status: New
