
yahoo-eng-team team mailing list archive

[Bug 2073254] Re: nova doesn't wait for vif-plugged in ovn usecase

 

Rewriting the bug (as we have a new idea how to solve that) and so
setting the status back to New.

** Description changed:

  Description
  ===========
- At environments using OVN as network backend, nova seems to not wait for network-vif-plugged event.
- This leads to issues in bigger environments, where ovn northd needs some time to sync or some ovn component is down. Nova than starts the libvirt migration, but network setup is not done.
+ At environments using OVN as network backend, nova doesn't wait for network-vif-plugged event (As the events wasn't reliable.)
+ This leads to issues in bigger environments, where ovn northd needs some time to sync or some ovn component is down. Nova than starts the libvirt migration, but network setup is not done. That breaks live-migration of smaller VMs (less RAM - faster copied).
  
- Neutron is using the multi-chassis approach of ovn to send packages to
- both, source and destination host. Nova should wait for Neutron to have
- this configured.
+ Neutron is using the multi-chassis approach of ovn to send packages to both, source and destination host. Also newer OVN versions perform much better regarding sending the events.
+ Nova should have an option to wait for OVN to have this configured.
+ Without that, live-migration breaks connectivity of VMs.
  
- In nova code I found that get_live_migration_plug_time_events()[1] calls (via has_live_migration_plug_time_event()) is_hybrid_plug_enabled()[2], where it checks if VIF_DETAILS_OVS_HYBRID_PLUG is set in VIF.details.
- But that's not the case for ports in OVN setup. Instead we should check, if the ports driver is ovn.
- Also while debugging at my cluster with OVN setup, I saw nova-compute passing that function.
+ I propose to add an option to let nova-compute create the tap interface, instead creating it by libvirt. This helps as we see at OVN southbound and at local ovs Interface, that the interface is claimed, but that's only done, as soon the interface is on the host. When libvirt creates the interface, this will happen at creating the libvirt domain and directly before starting the migration. So there is no possibility to wait for neutron.
+ Instead when nova-compute creates the interface, we can wait till OVN has the change propagated and implemented and start migration afterwards.
+ 
+ We should check that at nova-compute, as it has all the needed
+ information and going the way over neutron and network-vif-plugged event
+ is a much bigger change. Also the behavior at nova will be optional
+ (config option), but neutron needs to rely on this behavior, to send the
+ event at the correct time.
  
  Steps to reproduce
  ==================
  1. Run neutron with ovn setup and create a VM that you can ping (via FIP or other VM in same private network)
  2. Stop northd
  3. Start live-migration
  4. Wait till live-migration is done - VM is not reachable anymore
  
- or patch neutron to not send any network-vif-plugged events and do the
- same steps (besides 2.)
  
  Expected result
  ===============
- nova waits for network-vif-plugged event from neutron
+ nova-compute can create the interface so OVN has it claimed before the actual migration starts
+ nova-compute can wait for network backend to claim the interface.
  
  Actual result
  =============
  libvirt migration is directly started
  
  Environment
  ===========
- nova-compute --version: 26.3.0
- neutron-server 21.2.1 zed / unmaintained/zed
+ nova-compute --version: 30.1.0 / master
+ neutron-server 26.0.0.0b3.dev182 / master
  ml2 plugin: ovn
  at neutron: ovsdb-client (Open vSwitch) 3.3.0
- Nova zed / unmaintained/zed
+ Nova 30.1.0 / master
  nova.conf: live_migration_wait_for_vif_plug=true ([3])
- Hypervisor OS: Ubuntu 22.04 with newer kernel (but that shouldn't be relevant here)
- Libvirt + KVM
+ Hypervisor OS: Ubuntu 24.04.1
+ libvirtd 10.0.0 + kvm 8.2.2
  
  Proposed Change
  ===============
- As fix I'm testing an additional function at [1] that checks, if vif/port driver is ovn, than get_live_migration_plug_time_events should return True.
- 
- Related Bugs
- ============
- neutron: https://bugs.launchpad.net/neutron/+bug/2069718
- Comments from neutron patch[4] say, this should be fixed at nova (as setting OVS_HYBRID_PLUG for ovn is wrong)
- 
- Follow up will be to change neutron to send the event at the correct
- time, but first nova needs to wait for it.
+ Add config option to let create interface by nova-compute and libvirt use it
+ Based on that option wait for ovs interface to get claimed/installed by ovn-controller.
  
  
- [1] https://opendev.org/openstack/nova/src/branch/master/nova/network/model.py#L563
- [2] https://opendev.org/openstack/nova/src/branch/master/nova/network/model.py#L499
  [3] https://docs.openstack.org/nova/latest/configuration/config.html#compute.live_migration_wait_for_vif_plug
- [4] https://review.opendev.org/c/openstack/neutron/+/923962

** Summary changed:

- nova doesn't wait for vif-plugged in ovn usecase
+ nova-compute optional wait for vif-plugged in ovn usecase

** Changed in: nova
       Status: Opinion => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2073254

Title:
  nova-compute optional wait for vif-plugged in ovn usecase

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===========
  In environments that use OVN as the network backend, nova doesn't wait for the network-vif-plugged event (because the events weren't reliable).
  This causes problems in larger environments, where ovn northd needs some time to sync or some OVN component is down. Nova then starts the libvirt migration although the network setup is not done. That breaks live-migration of smaller VMs (less RAM, so they are copied faster).

  Neutron uses the multi-chassis approach of OVN to send packets to both the source and the destination host. Newer OVN versions also perform much better at sending the events.
  Nova should have an option to wait for OVN to have this configured.
  Without that, live-migration breaks the connectivity of VMs.

  I propose adding an option that lets nova-compute create the tap interface instead of having libvirt create it. This helps because OVN southbound and the local OVS Interface show when the interface is claimed, but that only happens once the interface exists on the host. When libvirt creates the interface, this happens while the libvirt domain is created, directly before the migration starts, so there is no opportunity to wait for neutron.
  When nova-compute creates the interface instead, we can wait until OVN has propagated and implemented the change and start the migration afterwards.
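
  The ordering proposed above could look roughly like the following sketch. It is not nova code: create_tap, is_claimed and start_migration are hypothetical callables, and how is_claimed decides (for example by reading the local OVS Interface record) is left open as an assumption.

```python
import time
from typing import Callable


def wait_until_claimed(
    is_claimed: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_claimed() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_claimed():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def live_migrate(
    create_tap: Callable[[], None],
    is_claimed: Callable[[], bool],
    start_migration: Callable[[], None],
    timeout: float = 30.0,
) -> None:
    """Hypothetical ordering: create the tap, wait for the claim, then migrate."""
    # The interface must exist on the host before the backend can claim it.
    create_tap()
    if not wait_until_claimed(is_claimed, timeout=timeout):
        raise TimeoutError("interface was not claimed by the network backend")
    # Only start the migration once the backend reports the claim.
    start_migration()
```

  The clock and sleep parameters are injected only so the loop can be exercised without real waiting; the timeout and interval values are placeholders.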

  We should check that in nova-compute, as it has all the needed
  information, and going through neutron and the network-vif-plugged
  event is a much bigger change. Also, the behavior in nova will be
  optional (config option), but neutron needs to rely on this behavior
  to send the event at the correct time.

  Steps to reproduce
  ==================
  1. Run neutron with ovn setup and create a VM that you can ping (via FIP or other VM in same private network)
  2. Stop northd
  3. Start live-migration
  4. Wait till live-migration is done - VM is not reachable anymore

  
  Expected result
  ===============
  nova-compute can create the interface so that OVN has claimed it before the actual migration starts.
  nova-compute can wait for the network backend to claim the interface.

  Actual result
  =============
  The libvirt migration is started immediately

  Environment
  ===========
  nova-compute --version: 30.1.0 / master
  neutron-server 26.0.0.0b3.dev182 / master
  ml2 plugin: ovn
  at neutron: ovsdb-client (Open vSwitch) 3.3.0
  Nova 30.1.0 / master
  nova.conf: live_migration_wait_for_vif_plug=true ([3])
  Hypervisor OS: Ubuntu 24.04.1
  libvirtd 10.0.0 + kvm 8.2.2

  Proposed Change
  ===============
  Add a config option that lets nova-compute create the interface and has libvirt use it.
  Based on that option, wait for the OVS interface to get claimed/installed by ovn-controller.
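
  As an illustration only, such an option could sit next to the existing live_migration_wait_for_vif_plug setting in nova.conf. The name create_tap_before_live_migration below is made up for this sketch and is not an existing nova option; the fallback values are sketch defaults, not nova's.

```python
import configparser

# Example nova.conf fragment. The second option name is hypothetical.
NOVA_CONF = """
[compute]
live_migration_wait_for_vif_plug = true
# Hypothetical option name, not an existing nova setting:
create_tap_before_live_migration = true
"""


def load_flags(text: str) -> dict:
    """Read the two booleans from a nova.conf-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "wait_for_vif_plug": parser.getboolean(
            "compute", "live_migration_wait_for_vif_plug", fallback=False
        ),
        "precreate_tap": parser.getboolean(
            "compute", "create_tap_before_live_migration", fallback=False
        ),
    }


flags = load_flags(NOVA_CONF)
# Wait for the claim only when nova-compute created the interface itself.
should_wait_for_claim = flags["precreate_tap"]
```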

  
  [3] https://docs.openstack.org/nova/latest/configuration/config.html#compute.live_migration_wait_for_vif_plug

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2073254/+subscriptions


