
yahoo-eng-team team mailing list archive

[Bug 1997982] Re: after restart of a ovn-controller the agent is still down

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/865697
Committed: https://opendev.org/openstack/neutron/commit/4cc611d319d0afe1ee04df6e4419014f1133df09
Submitter: "Zuul (22348)"
Branch:    master

commit 4cc611d319d0afe1ee04df6e4419014f1133df09
Author: Felix Huettner <felix.huettner@mail.schwarz>
Date:   Fri Nov 25 16:39:31 2022 +0100

    Fix handling the restart of ovn-controllers
    
    The previous `getattr(old, 'nb_cfg', False)` would evaluate to `False`
    if the `old` row either did not contain a `nb_cfg` value or if the value
    was 0.
    
    As 0 is the value set on startup of the ovn-controller, this causes the
    neutron-api to ignore any event an ovn-controller sends directly after
    startup. In turn, this causes us to miss the information that the agent
    is synchronized, so the agent appears down until something bumps the
    `nb_cfg` value globally.
    
    Closes-Bug: #1997982
    
    Change-Id: Icec8fee93e64b871999f38674e305238e9705fd4
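
The pitfall described in the commit message can be reproduced with plain Python. The sketch below is an illustration, not the code from the linked commit: it uses `types.SimpleNamespace` as a hypothetical stand-in for the OVSDB `old` row and a hypothetical `describe` helper. `getattr(old, 'nb_cfg', False)` returns the stored value whenever the attribute exists, so a real value of 0 is just as falsy as the `False` default. A sentinel default keeps "absent" and "present but 0" apart.

```python
from types import SimpleNamespace

# Hypothetical sentinel: a fresh object can never equal a real column value.
_MISSING = object()


def describe(old):
    """Return (original-pattern result, whether nb_cfg is present on old)."""
    # Original pattern: "absent" and "present with value 0" are both falsy.
    buggy = bool(getattr(old, 'nb_cfg', False))
    # Sentinel default: only an absent attribute yields _MISSING.
    present = getattr(old, 'nb_cfg', _MISSING) is not _MISSING
    return buggy, present


print(describe(SimpleNamespace()))          # nb_cfg absent       -> (False, False)
print(describe(SimpleNamespace(nb_cfg=0)))  # nb_cfg present, 0   -> (False, True)
print(describe(SimpleNamespace(nb_cfg=3)))  # nb_cfg present, 3   -> (True, True)
```

The middle case is the one the bug hits: the column is present, but its value 0 makes the original condition fail.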


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1997982

Title:
  after restart of a ovn-controller the agent is still down

Status in neutron:
  Fix Released

Bug description:
  Assume a Neutron setup with the ML2/OVN plugin.
  Further assume that no changes are made through the user API while the issue is being reproduced, so that nb_cfg is the same at the start and at the end:

  1. Take any ovn-controller and run `openstack network agent show` for its agent; this should report "up" and a valid "heartbeat_timestamp"
  2. Restart the ovn-controller
  3. The openstack output now says "down", with the Unix epoch (timestamp 0) as the heartbeat
  4. Make any change that causes nb_cfg to increase
  5. The agent is now "up" with a proper timestamp

  The issue is caused by
  https://opendev.org/openstack/neutron/src/commit/0384b3193b11eb6cc849c4511d2e539d42b6d3f9/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L339

  In step 2 the southbound database emits two events:
  1. When the ovn-controller first starts: the addition of its Chassis_Private row, with nb_cfg and nb_cfg_timestamp both set to 0.
  2. When the ovn-controller has finished syncing: an update setting nb_cfg to the value in SB_GLOBAL and nb_cfg_timestamp to the current timestamp.

  However, the second event is currently filtered out by `match_fn`,
  because `old.nb_cfg` is `0` at this point. In the condition, `0` is
  evaluated as `False`, so the event is ignored.
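
  The two-event sequence can be simulated to show where the second event gets lost. This is a sketch under stated assumptions, not Neutron's actual event class: `buggy_match_fn` and `fixed_match_fn` are hypothetical module-level stand-ins for the `match_fn` method, the strings 'create'/'update' stand in for the event types, the rows are `SimpleNamespace` objects, and the nb_cfg and timestamp values are made up. It also assumes that, as in ovsdbapp update events, `old` carries only the columns that changed.

```python
from types import SimpleNamespace

SB_GLOBAL_NB_CFG = 7  # hypothetical nb_cfg currently stored in SB_GLOBAL

# Event 1: the Chassis_Private row is created; a create has no "old" row.
create_row = SimpleNamespace(nb_cfg=0, nb_cfg_timestamp=0)

# Event 2: ovn-controller finished syncing; "old" holds the changed columns'
# previous values (both 0 right after startup). Timestamp is a made-up value.
update_row = SimpleNamespace(nb_cfg=SB_GLOBAL_NB_CFG, nb_cfg_timestamp=123456)
update_old = SimpleNamespace(nb_cfg=0, nb_cfg_timestamp=0)


def buggy_match_fn(event, row, old=None):
    # Mirrors the condition described in the bug: a previous nb_cfg of 0 is falsy.
    return event == 'update' and bool(getattr(old, 'nb_cfg', False))


def fixed_match_fn(event, row, old=None):
    # Hypothetical variant: match whenever nb_cfg is among the changed columns.
    return event == 'update' and hasattr(old, 'nb_cfg')


events = [('create', create_row, None), ('update', update_row, update_old)]
for event, row, old in events:
    print(event, buggy_match_fn(event, row, old), fixed_match_fn(event, row, old))
# create False False
# update False True
```

  With the original condition the update carrying the synchronized nb_cfg never matches, which is why the agent stays down until nb_cfg is bumped again.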

  This issue might be the same as
  https://bugs.launchpad.net/neutron/+bug/1955503

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1997982/+subscriptions


