yahoo-eng-team team mailing list archive
Message #87392
[Bug 1946729] [NEW] libvirt virt driver does not wait for network-vif-plugged event during hard reboot
Public bug reported:
The libvirt virt driver has logic during spawn to create the domain in
libvirt, pause it, and resume it only after the network-vif-plugged
events have been received from neutron for the ports of the instance
being spawned. This is in place to avoid starting the guest OS before
the networking backend has finished setting up networking for the ports.
Without this, a guest might start and request an IP address via DHCP
before the networking setup is finished, and therefore might not get an
IP address at all.
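The create-paused / wait / resume sequence described above can be
sketched roughly as follows. This is only an illustration, not nova's
actual code: FakeDomain, start_guest, and the plugged_events mapping are
hypothetical names, and a threading.Event stands in for the
network-vif-plugged notification from neutron.

```python
import threading


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain; not the real libvirt API."""

    def __init__(self):
        self.state = "defined"

    def create_paused(self):
        self.state = "paused"

    def resume(self):
        self.state = "running"


def start_guest(domain, port_ids, plugged_events, timeout):
    """Create the domain paused and resume it only once every port is plugged.

    plugged_events maps a port id to a threading.Event that the (simulated)
    networking backend sets when it has finished wiring up that port.
    """
    domain.create_paused()
    for port_id in port_ids:
        # Event.wait returns False if the timeout expires before the event is set.
        if not plugged_events[port_id].wait(timeout):
            raise TimeoutError(f"no network-vif-plugged for port {port_id}")
    # Only now is the guest OS allowed to run and send its DHCP request.
    domain.resume()
    return domain.state
```

If an event never arrives, the domain stays paused and the caller gets a
TimeoutError; whether such a timeout should fail the operation is a
policy choice that this sketch leaves to the caller.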
In case of a hard reboot (and start, as that is a hard reboot too), nova
cleans up the instance from the hypervisor (except the local disk),
including unplugging the vifs of the instance. Then nova recreates
everything, including re-plugging the vifs. This is intentional, as hard
reboot is considered to be an operation that is capable of recovering
instances in bad / inconsistent states. However, during the hard reboot
nova does not wait for the network-vif-plugged events before it lets the
domain start running. In a mass instance startup scenario (e.g. after a
compute host recovery), a potentially large number of vif unplug/plug
requests hit the networking backend. Processing these replugs takes
time. Because nova does not wait for the network-vif-plugged event, the
guest OS can send its DHCP request well before the networking backend
has caught up with the unplug/plug requests. This leads to connectivity
issues in the guest.
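To make the mass-startup race concrete, here is a deliberately
simplified model. All of it is assumption rather than measurement: the
function name and parameters are hypothetical, the backend is assumed to
process one replug at a time at a fixed cost, every instance is assumed
to be hard rebooted at t=0, and DHCP retries in the guest are ignored.

```python
def guests_missing_dhcp(num_instances, replug_seconds, dhcp_seconds, wait_for_plug):
    """Count guests whose first DHCP request precedes their port being ready.

    Simplified model: all instances are hard rebooted at t=0, the backend
    processes one replug at a time, and each guest sends its DHCP request
    dhcp_seconds after its domain starts running.
    """
    missed = 0
    for i in range(num_instances):
        # The i-th port is ready once the backend has worked through the queue.
        port_ready_at = (i + 1) * replug_seconds
        # With waiting, the domain only starts running once its port is ready.
        domain_start_at = port_ready_at if wait_for_plug else 0.0
        dhcp_at = domain_start_at + dhcp_seconds
        if dhcp_at < port_ready_at:
            missed += 1
    return missed
```

Under these assumptions, 100 instances with 0.5 seconds per replug and a
DHCP request 10 seconds after boot leave 80 guests sending DHCP before
their port is ready when nova does not wait, and none when it does.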
** Affects: nova
Importance: Medium
Assignee: Balazs Gibizer (balazs-gibizer)
Status: In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1946729
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1946729/+subscriptions