
yahoo-eng-team team mailing list archive

[Bug 1946729] Re: libvirt virt driver does not wait for network-vif-plugged event during hard reboot

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/813419
Committed: https://opendev.org/openstack/nova/commit/68c970ea9915a95f9828239006559b84e4ba2581
Submitter: "Zuul (22348)"
Branch:    master

commit 68c970ea9915a95f9828239006559b84e4ba2581
Author: Balazs Gibizer <balazs.gibizer@xxxxxxxx>
Date:   Mon Oct 11 14:41:37 2021 +0200

    Add a WA flag waiting for vif-plugged event during reboot
    
    The libvirt driver's power on and hard reboot operations first destroy
    the domain and unplug the vifs, then recreate the domain and replug the
    vifs. However, nova does not wait for the network-vif-plugged event
    before unpausing the domain. As a result, the domain can start running
    and request an IP via DHCP before the networking backend has finished
    plugging the vifs.
    
    So this patch adds a workaround config option to nova to wait for
    network-vif-plugged events during hard reboot the same way as nova waits
    for this event during new instance spawn.
    
    This logic cannot be enabled unconditionally, as not all neutron
    networking backends send plug time events to wait for. The logic also
    needs to be vnic_type dependent, as ml2/ovs and the in-tree sriov
    backend are often deployed together on the same compute. While ml2/ovs
    sends the plug time event, the sriov backend does not send it reliably.
    So the configuration is not just a boolean flag but a list of
    vnic_types. This way nova can wait for the plug time event of a vif
    handled by ml2/ovs even when the instance has other vifs handled by
    the sriov backend, for which no event can be expected.
    
    Change-Id: Ie904d1513b5cf76d6d5f6877545e8eb378dd5499
    Closes-Bug: #1946729
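
The per-vnic_type selection described in the commit message can be sketched roughly as follows. This is only an illustration, not nova's code: the `Vif` class, the `events_to_wait_for` function and the port IDs are hypothetical names, and the assumption is that an ml2/ovs port has vnic_type "normal" while an sriov port has vnic_type "direct".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vif:
    """Hypothetical stand-in for a port attached to an instance."""
    port_id: str
    vnic_type: str


def events_to_wait_for(vifs, wait_vnic_types):
    """Return (event name, port id) pairs for vifs whose vnic_type is listed.

    wait_vnic_types plays the role of the configured list of vnic_types.
    An empty list means no waiting at all.
    """
    allowed = set(wait_vnic_types)
    return [
        ("network-vif-plugged", vif.port_id)
        for vif in vifs
        if vif.vnic_type in allowed
    ]


# One instance with a port on each backend (assumed vnic_type values).
vifs = [Vif("port-ovs", "normal"), Vif("port-sriov", "direct")]

# Wait only for the port whose backend is expected to send the event.
selected = events_to_wait_for(vifs, ["normal"])
```

With the list set to only "normal", the sriov port is skipped, so a missing event from that backend cannot block the reboot.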


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1946729

Title:
  libvirt virt driver does not wait for network-vif-plugged event during
  hard reboot

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  During spawn, the libvirt virt driver creates the domain in libvirt,
  then pauses it, and only resumes it after the network-vif-plugged
  events are received from neutron for the ports of the instance being
  spawned. This is in place to avoid starting the guest OS before the
  networking backend has finished setting up the networking for the
  ports. Without this, a guest might start and request an IP via DHCP
  before the networking setup is finished, and therefore might not get
  an IP at all.

  In the case of a hard reboot (and start, as that is a hard reboot too)
  nova cleans up the instance from the hypervisor (except the local
  disk), including unplugging the vifs of the instance. Then nova
  recreates everything, including re-plugging the vifs. This is
  intentional, as hard reboot is considered to be an operation that is
  capable of recovering instances in bad / inconsistent states. However,
  during the hard reboot nova does not wait for the network-vif-plugged
  events before it lets the domain start running. In a mass instance
  startup scenario (e.g. after a compute host recovery) a potentially
  large number of vif unplug/plug requests hit the networking backend,
  and processing these replugs takes time. Because nova does not wait
  for the network-vif-plugged event, the guest OS can send its DHCP
  request well before the networking backend has caught up with the
  unplug/plug requests. This leads to connectivity issues in the guest.
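
The ordering that the description asks for (replug the vifs, wait for the plug events, only then let the guest run) might be sketched as below. This is a simplified model under stated assumptions, not the libvirt driver's implementation: `PlugEventWaiter`, `start_guest`, the callbacks and the timeout value are all hypothetical.

```python
import threading


class PlugEventWaiter:
    """Tracks which ports still lack a network-vif-plugged event."""

    def __init__(self, port_ids):
        self._pending = set(port_ids)
        self._lock = threading.Lock()
        self._all_plugged = threading.Event()
        if not self._pending:
            # Nothing to wait for, so the guest may start immediately.
            self._all_plugged.set()

    def notify_plugged(self, port_id):
        """Called when the networking backend reports a port as plugged."""
        with self._lock:
            self._pending.discard(port_id)
            if not self._pending:
                self._all_plugged.set()

    def wait(self, timeout):
        """Block until every port is plugged; return False on timeout."""
        return self._all_plugged.wait(timeout)


def start_guest(port_ids, plug_vifs, resume, timeout=5.0):
    """Replug vifs, wait for the plug events, then resume the guest.

    plug_vifs(waiter) stands in for the replug request; the backend is
    expected to call waiter.notify_plugged(port_id) at some later point.
    resume() stands in for letting the paused domain run.
    """
    waiter = PlugEventWaiter(port_ids)
    plug_vifs(waiter)
    if not waiter.wait(timeout):
        raise TimeoutError("timed out waiting for network-vif-plugged")
    resume()
```

In this model the guest is never resumed if an expected event does not arrive within the timeout, which is why the wait has to be limited to ports whose backend is known to send the event.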

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1946729/+subscriptions


