[Bug 2094840] [NEW] OVN driver doesn't receive port-binding UP update from OVN controller; Nova eventually times out while building VM
Public bug reported:
I'm seeing this in the gate, in this run:
https://zuul.opendev.org/t/openstack/build/8987272b4be843ea9dffb266ca559006/logs
The symptom of the test failure is that the VM is never built. This happens
because the port is not UP (Nova hasn't received the vif event for the state
change). When we check the ovn-controller log, we see the relevant port
being set to up:
```
2025-01-13T22:30:10.704Z|00067|binding|INFO|Claiming lport 75ffc908-b9c5-421f-b1b9-117b158e860d for this chassis.
2025-01-13T22:30:10.704Z|00068|binding|INFO|75ffc908-b9c5-421f-b1b9-117b158e860d: Claiming fa:16:3e:11:92:5d 10.1.0.13
2025-01-13T22:30:10.704Z|00069|binding|INFO|Claiming lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 for this chassis.
2025-01-13T22:30:10.704Z|00070|binding|INFO|c469b2c4-6961-4953-a2c6-0c106c80b5c8: Claiming fa:16:3e:d5:d9:4e 10.1.0.7
2025-01-13T22:30:10.730Z|00071|binding|INFO|Setting lport 75ffc908-b9c5-421f-b1b9-117b158e860d ovn-installed in OVS
2025-01-13T22:30:10.730Z|00072|binding|INFO|Setting lport 75ffc908-b9c5-421f-b1b9-117b158e860d up in Southbound
2025-01-13T22:30:10.730Z|00073|binding|INFO|Setting lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 ovn-installed in OVS
2025-01-13T22:30:10.730Z|00074|binding|INFO|Setting lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 up in Southbound
```
The port in question is 75ffc908-b9c5-421f-b1b9-117b158e860d.
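For context on the mechanism that apparently never fires here: the OVN mech driver registers a row event on the SB Port_Binding table and, when the 'up' column flips to true, marks the Neutron port ACTIVE, which is what lets Nova receive the vif event and continue building the VM. A minimal sketch of such a handler using the ovsdbapp RowEvent interface (the class name and the driver hook are illustrative, not the exact neutron code):
```
from ovsdbapp import event as ovsdb_event


class PortBindingUpEvent(ovsdb_event.RowEvent):
    """Fire when the SB Port_Binding 'up' column flips to true."""

    def __init__(self, driver):
        # Watch "update" events on the Port_Binding table, no row conditions.
        super().__init__((self.ROW_UPDATE,), 'Port_Binding', None)
        self.driver = driver
        self.event_name = 'PortBindingUpEvent'

    def match_fn(self, event, row, old=None):
        # Only react when 'up' changed in this update and is now truthy
        # ('up' is an optional boolean column, so python-ovs may expose
        # it as a one-element list).
        return hasattr(old, 'up') and bool(row.up)

    def run(self, event, row, old):
        # row.logical_port is the Neutron port UUID; marking the port
        # ACTIVE is what ultimately triggers the vif event towards Nova
        # (the driver hook name here is hypothetical).
        self.driver.set_port_status_up(row.logical_port)
```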
In the neutron-api log, the ovsdb monitor receives the updates to the SB
Port_Binding table:
Jan 13 22:30:10.766618 np0039557494 devstack@neutron-api.service[62570]: DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [None req-0d900a20-58b1-4662-9217-856061d78672 None None] Hash Ring: Node d6133697-439f-4059-b5e7-152917a32dd1 (host: np0039557494) handling event "update" for row cfeb99a7-95f6-4f8c-bb64-cd628734b141 (table: Port_Binding) {{(pid=62570) notify /opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:852}}
Jan 13 22:30:10.832940 np0039557494 devstack@neutron-api.service[62570]: DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [None req-0d900a20-58b1-4662-9217-856061d78672 None None] Hash Ring: Node d6133697-439f-4059-b5e7-152917a32dd1 (host: np0039557494) handling event "update" for row cfeb99a7-95f6-4f8c-bb64-cd628734b141 (table: Port_Binding) {{(pid=62570) notify /opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:852}}
Note, however, that both events refer to the same row id, while in the
ovn-controller log snippet two different ports are being claimed / set to
up at the same time.
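The row id and table name in these log lines are taken straight from the python-ovs Row object handed to the IDL's notify hook, which neutron overrides to dispatch events through the hash ring. A simplified sketch of that layer, assuming the standard python-ovs Idl.notify signature (not the actual neutron implementation):
```
import ovs.db.idl


class MonitoringIdl(ovs.db.idl.Idl):
    def notify(self, event, row, updates=None):
        # python-ovs calls this hook once per changed row; 'row' is the
        # current row object and 'updates' carries the previous values of
        # the columns that changed in this transaction.
        print('handling event "%s" for row %s (table: %s)'
              % (event, row.uuid, row._table.name))
```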
Either ovn-controller failed to update the 'up' field in the SB DB; or
ovsdb-server incorrectly sent duplicate updates to its watchers; or the Idl
watcher somehow mixed up the row IDs.
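One way to narrow this down would be to read the Port_Binding rows for both logical ports straight from the SB DB and compare the row UUIDs and 'up' values with what ovsdb-monitor logged. A hypothetical check script using ovsdbapp (the SB socket path is an assumption for a devstack node):
```
from ovsdbapp.backend.ovs_idl import connection
from ovsdbapp.schema.ovn_southbound import impl_idl

# Assumed devstack socket path for the OVN Southbound DB.
SB_CONNECTION = 'unix:/var/run/ovn/ovnsb_db.sock'

idl = connection.OvsdbIdl.from_server(SB_CONNECTION, 'OVN_Southbound')
sb = impl_idl.OvnSbApiIdlImpl(connection.Connection(idl, timeout=10))

# The two logical ports claimed together in the ovn-controller log.
for lport in ('75ffc908-b9c5-421f-b1b9-117b158e860d',
              'c469b2c4-6961-4953-a2c6-0c106c80b5c8'):
    rows = sb.db_find('Port_Binding',
                      ('logical_port', '=', lport)).execute(check_error=True)
    for r in rows:
        # Compare the row UUID and 'up' value with the ovsdb-monitor log.
        print(lport, r.get('_uuid'), r.get('up'))
```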
---
In the gate, OVN is quite old (22.03) and the OVS components are at 2.17.9
(also very old). I wonder whether there were some race conditions that have
since been fixed in later python-ovs, ovsdb-server or ovn-controller
releases...
** Affects: neutron
Importance: Undecided
Status: New
** Tags: gate-failure ovn
** Tags added: gate-failure ovn
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2094840