
yahoo-eng-team team mailing list archive

[Bug 1988199] [NEW] [OVN][live-migration] Nova port binding request and "LogicalSwitchPortUpdateUpEvent" race condition

 

Public bug reported:

Related Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2120409

Summary: after a live-migration, the VM port status is DOWN.

During a live-migration, the following events happen in the Neutron server:
1) We receive a port update. Because the "migrating_to" field is in the port binding, the OVN mech driver forces a port update from DOWN to UP. This (1) sets the port status to UP and (2) sends the vif-plugged event to Nova. That will trigger the port creation (layer 1) in the destination node.

2) Then we receive the "LogicalSwitchPortUpdateDownEvent", because the
source port was deleted. That sets the port status to DOWN.

3) At the same time we receive the "LogicalSwitchPortUpdateUpEvent",
because the port in the destination host has been created. This last
event won't manually set the port status to UP. Instead it will remove
any port provisioning block [1].
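
The provisioning-block mechanism can be illustrated with a toy sketch (the class and method names below are hypothetical, not Neutron's actual provisioning-blocks API): the port may only transition to UP once every registered block has been cleared, and, as step 3.1 shows, only if the port is already bound to a host.

```python
# Toy model of a provisioning-block mechanism. All names here are
# illustrative sketches, not Neutron's real implementation.

class PortProvisioning:
    def __init__(self, port_id):
        self.port_id = port_id
        self.blocks = set()      # entities that must report completion
        self.status = "DOWN"

    def add_block(self, entity):
        self.blocks.add(entity)

    def complete(self, entity, host_bound):
        """Clear one block; transition to UP only when no blocks remain
        AND the port is bound to a host (the failing check in 3.1)."""
        self.blocks.discard(entity)
        if self.blocks:
            return self.status   # still waiting on other entities
        if not host_bound:
            # Mirrors the "cannot update to ACTIVE because it is not
            # bound" debug message: the completion event is dropped.
            return self.status
        self.status = "UP"
        return self.status


port = PortProvisioning("943db0db")
port.add_block("L2")
# The OVN event clears the block, but Nova has not bound the port yet:
print(port.complete("L2", host_bound=False))  # DOWN
```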

3.1) If the port provisioning is considered complete (the
"provisioning_complete" event), it is processed in
"Ml2Plugin._port_provisioned". The problem we are hitting here is that
the port has no host (the port is still not bound):

2022-08-26 10:08:23.373 17 DEBUG neutron.plugins.ml2.plugin
[req-2b13d263-5748-46e2-9fdf-33df50634607 - - - - -] Port
943db0db-773f-45e9-8b68-0ebcc1840207 cannot update to ACTIVE because it
is not bound. _port_provisioned /usr/lib/python3.9/site-
packages/neutron/plugins/ml2/plugin.py:339

4) Right after, the Nova port binding request is received and the port
is bound: https://paste.opendev.org/show/bIUoJkiStCIe8TBb0573/

This is basically the issue we have here: there is a race condition
between (1) the Nova port binding request and (2) the
"LogicalSwitchPortUpdateUpEvent" that is received when the OVS port is
created on a chassis.
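
The race can be reproduced with a toy event replay (all names are illustrative): the final port status depends only on whether the Nova binding lands before or after the up event is processed.

```python
def final_status(events):
    """Replay a sequence of events against a toy port model and
    return the resulting status. Names are illustrative only."""
    status, bound = "DOWN", False
    for ev in events:
        if ev == "nova_binding":
            bound = True
        elif ev == "lsp_up":
            # Provisioning completes, but only flips the status if
            # the port is already bound (the check in step 3.1).
            if bound:
                status = "UP"
        elif ev == "lsp_down":
            status = "DOWN"
    return status

# Binding arrives first -> the port ends UP:
print(final_status(["lsp_down", "nova_binding", "lsp_up"]))   # UP
# Binding arrives after the up event -> port stuck DOWN (the bug):
print(final_status(["lsp_down", "lsp_up", "nova_binding"]))   # DOWN
```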

Just for testing: if I add a 1-second sleep at the very first line of
"_port_provisioned", allowing the Nova port binding request (which binds
the port to a host) to arrive first, the port provisioning succeeds and
the port is set to UP. I'll find a way to fix that in the Ml2Plugin
code.
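
A less fragile alternative to a fixed sleep, sketched with hypothetical helper names (a real fix would live inside Ml2Plugin and use Neutron's own retry machinery), would be to re-read the port binding a bounded number of times before giving up:

```python
import time

def port_is_bound(port):
    # Hypothetical predicate: the port is bound once Nova's binding
    # request has attached it to a host.
    return bool(port.get("binding:host_id"))

def wait_for_binding(get_port, port_id, retries=5, delay=0.2):
    """Poll until the port is bound, instead of sleeping a fixed 1 s.
    Returns the port dict if bound in time, else None."""
    for _ in range(retries):
        port = get_port(port_id)
        if port_is_bound(port):
            return port
        time.sleep(delay)
    return None

# Simulated backend: the port becomes bound on the third read,
# mimicking the Nova binding request arriving a few ms later.
reads = iter([{}, {}, {"binding:host_id": "compute-1"}])
result = wait_for_binding(lambda pid: next(reads), "943db0db",
                          retries=5, delay=0)
```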

** Affects: neutron
     Importance: Medium
     Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

** Changed in: neutron
   Importance: Undecided => Medium



-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1988199


To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1988199/+subscriptions


