yahoo-eng-team team mailing list archive

[Bug 2069718] [NEW] [ovn] No connection to VM during live-migration

 

Public bug reported:

Problem: In environments with many hypervisors and VMs, a live migration
leaves the VM unreachable for several seconds (4-20 s).

Description:
We run a large environment with many hypervisors and VMs, so northd reconcile cycles take some time.
During live migration, even though nova has live_migration_wait_for_vif_plug=true configured, neutron sends the vif-plugged event before northd has processed the change that adds the VM's port to the destination hypervisor, with the multi-chassis feature enabled.
Nova then starts the live migration in libvirt, and the migration completes before the southbound database and the destination's ovn-controller have received the change.
As a result, the VM is already running on the destination hypervisor while its port setup is still incomplete.

From what I saw, neutron generates the vif-plugged event as soon as the
transaction to the northbound ovsdb has finished [1].

Is there a way to wait till the change is propagated to southbound
ovsdb?
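
To illustrate the kind of wait being asked for, here is a minimal Python sketch that polls until the destination chassis shows up for a port, or gives up after a timeout. This is not neutron's implementation and not a proposed patch. `wait_for_sb_binding` and `lookup_chassis` are hypothetical names; `lookup_chassis` is assumed to return the set of chassis names that the southbound Port_Binding row for the port currently references, and how that lookup is done is left open.

```python
import time
from typing import Callable, Set


def wait_for_sb_binding(
    lookup_chassis: Callable[[str], Set[str]],
    port_id: str,
    dest_chassis: str,
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until dest_chassis is reported for port_id, or the timeout expires.

    lookup_chassis is a hypothetical helper (an assumption of this sketch):
    it should return the chassis names currently associated with the port
    in the southbound database.
    """
    deadline = clock() + timeout
    while True:
        # Check first, so an already propagated change returns immediately.
        if dest_chassis in lookup_chassis(port_id):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With northd stopped, as in the reproduction steps, such a wait would run into the timeout and return False, so a caller would have to decide whether to fail or to continue the migration in that case.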

Version:
neutron-server 21.2.1 zed / unmaintained/zed
ml2 plugin: ovn
at neutron: ovsdb-client (Open vSwitch) 3.3.0
Nova zed / unmaintained/zed
nova.conf: live_migration_wait_for_vif_plug=true ([2])
Hypervisor OS: Ubuntu 22.04 with newer kernel (but that shouldn't be relevant here)

Steps to Reproduce:

1. Run neutron with an OVN setup and create a VM that you can ping (via a FIP or from another VM in the same private network)
2. Stop northd
3. Start a live migration
4. Wait until the live migration is done - the VM is no longer reachable
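
To put a number on the outage while running the steps above, a small probe loop can help. The following Python sketch is only an assumption about how one might measure it: it pings the VM repeatedly and reports the longest unreachable span. `ping_once`, `probe` and `longest_outage` are hypothetical helper names, and the ping flags assume Linux iputils ping.

```python
import subprocess
import time
from typing import Iterable, List, Optional, Tuple


def ping_once(host: str) -> bool:
    """Send one ICMP echo with a 1 s reply timeout (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe(host: str, duration: float, interval: float = 0.2) -> List[Tuple[float, bool]]:
    """Collect (timestamp, reachable) samples for roughly duration seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), ping_once(host)))
        time.sleep(interval)
    return samples


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Longest span in seconds from a first failed probe to the next success.

    An outage that is still ongoing at the last sample is counted up to
    that last sample.
    """
    longest = 0.0
    down_since: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None
    if down_since is not None and last_ts is not None:
        longest = max(longest, last_ts - down_since)
    return longest
```

Started shortly before step 3 from a host that can reach the VM, `longest_outage(probe(vm_address, 120))` gives a rough figure to compare against the 4-20 s mentioned in the problem statement; `vm_address` stands for the FIP or private address used for the ping test.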

[1] https://opendev.org/openstack/neutron/src/branch/unmaintained/zed/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L836
[2] https://docs.openstack.org/nova/latest/configuration/config.html#compute.live_migration_wait_for_vif_plug

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2069718

Title:
  [ovn] No connection to VM during live-migration

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2069718/+subscriptions