yahoo-eng-team team mailing list archive
Message #90811
[Bug 1997982] Re: after restart of a ovn-controller the agent is still down
Reviewed: https://review.opendev.org/c/openstack/neutron/+/865697
Committed: https://opendev.org/openstack/neutron/commit/4cc611d319d0afe1ee04df6e4419014f1133df09
Submitter: "Zuul (22348)"
Branch: master
commit 4cc611d319d0afe1ee04df6e4419014f1133df09
Author: Felix Huettner <felix.huettner@mail.schwarz>
Date: Fri Nov 25 16:39:31 2022 +0100
Fix handling the restart of ovn-controllers
The previous `getattr(old, 'nb_cfg', False)` would evaluate to `False`
if the `old` row either did not contain a `nb_cfg` value or if the value
was 0.
As 0 is the value set on startup of the ovn-controller, this causes the
neutron-api to ignore any event an ovn-controller sends directly after
startup. In turn, we miss the information that the agent is
synchronized, so the agent appears as down until something bumps the
`nb_cfg` value globally.
Closes-Bug: #1997982
Change-Id: Icec8fee93e64b871999f38674e305238e9705fd4
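The conflation described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the neutron code: `SimpleNamespace` objects stand in for OVSDB rows, and `presence_check` is a hypothetical alternative that only tests whether the column is present. It is an assumption for illustration and not necessarily the check the commit introduced.

```python
from types import SimpleNamespace


def old_check(old):
    # Pattern quoted in the commit message: the result is falsy both when
    # `nb_cfg` is missing from `old` and when it is present with value 0.
    return bool(getattr(old, 'nb_cfg', False))


def presence_check(old):
    # Hypothetical alternative (assumption): only asks whether the column
    # is present on `old`, regardless of its value.
    return hasattr(old, 'nb_cfg')


# Stand-ins for the `old` row passed along with an update event.
just_started = SimpleNamespace(nb_cfg=0)   # ovn-controller right after startup
other_update = SimpleNamespace()           # update that did not touch nb_cfg
synced = SimpleNamespace(nb_cfg=7)         # previously synchronized chassis

for name, old in [('just_started', just_started),
                  ('other_update', other_update),
                  ('synced', synced)]:
    print(name, old_check(old), presence_check(old))
```

With the old pattern, `just_started` and `other_update` are indistinguishable, which is exactly the case the commit message describes.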
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1997982
Title:
after restart of a ovn-controller the agent is still down
Status in neutron:
Fix Released
Bug description:
Assume a neutron setup with the ml2 ovn plugin.
Further assume that no changes are made through the user API while reproducing this issue, so that nb_cfg is the same at the start and at the end:
1. Take any ovn-controller and run `openstack network agent show` on it; this should report "up" and a valid "heartbeat_timestamp"
2. Restart the ovn-controller
3. The openstack output should now say "down", with the Unix timestamp 0 as heartbeat
4. Make any change that causes nb_cfg to increase
5. The agent is now up with a proper timestamp
The issue is caused by
https://opendev.org/openstack/neutron/src/commit/0384b3193b11eb6cc849c4511d2e539d42b6d3f9/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L339
In step 2 the southbound database emits two events:
1. When the ovn-controller first starts: the addition of the Chassis_Private row, where nb_cfg and nb_cfg_timestamp are both 0
2. When the ovn-controller has finished syncing: an update that sets nb_cfg to the value in SB_GLOBAL and nb_cfg_timestamp to the current timestamp
However, the second event is currently filtered out by the `match_fn`, as
`old.nb_cfg` is `0` at this point. In the condition, `0` evaluates
to `False`, so the event is ignored.
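To make the event sequence concrete, here is a small replay of the two southbound events through a simplified filter. This is a sketch under assumptions: the event constants, `replay`, and both predicates are hypothetical names and not the neutron implementation, the real `match_fn` contains further conditions, and the nb_cfg and timestamp values are made up.

```python
from types import SimpleNamespace

CREATE, UPDATE = 'create', 'update'  # hypothetical event names


def truthy_match(event, row, old=None):
    # Simplified form of the filter described above: an update passes
    # only if the previous nb_cfg value is truthy.
    return event == CREATE or bool(getattr(old, 'nb_cfg', False))


def presence_match(event, row, old=None):
    # Assumed alternative: an update passes whenever nb_cfg changed,
    # i.e. whenever `old` carries the column at all.
    return event == CREATE or hasattr(old, 'nb_cfg')


def replay(match, events):
    """Return the nb_cfg of the last event that passed the filter."""
    last_seen = None
    for event, row, old in events:
        if match(event, row, old):
            last_seen = row.nb_cfg
    return last_seen


events = [
    # 1. Chassis_Private row is added with nb_cfg and nb_cfg_timestamp 0.
    (CREATE, SimpleNamespace(nb_cfg=0, nb_cfg_timestamp=0), None),
    # 2. ovn-controller finished syncing; `old` holds the previous values.
    (UPDATE,
     SimpleNamespace(nb_cfg=42, nb_cfg_timestamp=1669390771000),
     SimpleNamespace(nb_cfg=0, nb_cfg_timestamp=0)),
]

print(replay(truthy_match, events))    # second event is dropped
print(replay(presence_match, events))  # second event is processed
```

In the first replay the handler only ever sees the startup row with nb_cfg 0, which matches the "down" state observed in step 3 until a later change bumps nb_cfg.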
This issue might be the same as
https://bugs.launchpad.net/neutron/+bug/1955503