yahoo-eng-team team mailing list archive

[Bug 1414559] Re: OVS drops RARP packets by QEMU upon live-migration - VM temporarily disconnected

 

Reviewed:  https://review.openstack.org/497457
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e6d5d404cf49e5b68b43c62e7f6d7db2771a1f4
Submitter: Zuul
Branch:    master

commit 8e6d5d404cf49e5b68b43c62e7f6d7db2771a1f4
Author: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@xxxxxxxxxx>
Date:   Thu Aug 24 09:13:09 2017 -0400

    libvirt: slow live-migration to ensure network is ready
    
    In Neutron, commit b7c303ee0a16a05c1fdb476dc7f4c7ca623a3f58 introduced
    events sent during a live migration when the VIFs are plugged on
    destination node.
    
    The Linux bridge agent mechanism driver is detecting new networks on
    the destination host only when the TAP devices are created, and these
    tap devices are only created when libvirt starts the migration. As a
    result, we must actually start the migration and then slow it as we
    wait for the neutron events.
    
    This change ensures we wait for these events.
    
    Depends-On: Icb039ae2d465e3822ab07ae4f9bc405c1362afba
    
    Closes-Bug: #1414559
    Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@xxxxxxxxxx>
    Change-Id: I407034374fe17c4795762aa32575ba72d3a46fe8
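
The approach described in the commit message (start the migration, keep it
slow, and wait for the Neutron events) can be sketched roughly as follows.
This is a minimal illustration, not nova's actual code: `FakeMigration`,
`migrate_with_network_wait`, the bandwidth values and the timeout are all
hypothetical names and numbers chosen for the example, and the choice to
continue at full speed after a timeout is an assumption of the sketch.

```python
import threading


class FakeMigration:
    """Hypothetical stand-in for a running live migration."""

    def __init__(self):
        self.history = []  # every bandwidth cap applied, in order

    def set_bandwidth(self, mib_per_s):
        self.history.append(mib_per_s)


def migrate_with_network_wait(migration, vif_plugged,
                              slow_mib=1, full_mib=1000, timeout=5.0):
    """Throttle the migration until the network event arrives.

    vif_plugged is a threading.Event that some other thread sets when
    the (hypothetical) "VIF plugged" notification is received.
    """
    # The migration is already started, so the TAP devices exist on the
    # destination; cap the bandwidth so it cannot finish too early.
    migration.set_bandwidth(slow_mib)

    # Block until the event is set or the timeout expires.
    ready = vif_plugged.wait(timeout)

    # Assumption of this sketch: lift the cap either way and let the
    # caller decide what a timeout means.
    migration.set_bandwidth(full_mib)
    return ready
```

The function returns True when the event arrived in time and False on
timeout, so the caller can choose whether a missing event is fatal.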


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1414559

Title:
  OVS drops RARP packets by QEMU upon live-migration - VM temporarily
  disconnected

Status in neutron:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  When live-migrating a VM, QEMU sends five RARP packets so that the network can re-learn the new location of the VM's MAC address.
  However, the VIF creation scheme between nova-compute and neutron-ovs-agent causes these RARPs to be dropped:
  1. nova creates a port on OVS, but without the internal tagging.
  2. At this stage, all packets that come out of the VM, or of the QEMU process it runs in, are dropped.
  3. QEMU sends five RARP packets to allow MAC learning. These packets are dropped as described in step 2.
  4. Meanwhile, neutron-ovs-agent loops every POLLING_INTERVAL and scans for new ports. Once it detects that a new port has been added, it reads the properties of the new port and assigns the correct internal tag, which allows the VM to connect.

  The flow above suggests that:
  1. RARP packets are dropped, so MAC learning takes much longer and depends on internal traffic and advertising by the VM.
  2. The VM is disconnected from the network for a mean period of POLLING_INTERVAL/2.
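
  The POLLING_INTERVAL/2 figure follows if the port is assumed to appear at
  a uniformly random moment between two agent scans. The small simulation
  below checks that arithmetic; `mean_wait` and its parameters are
  hypothetical names for this illustration and are not part of nova or
  neutron.

```python
import random


def mean_wait(polling_interval, trials=100_000, seed=0):
    """Estimate the average time a new port waits for the next scan.

    Assumes scans happen exactly every polling_interval seconds and the
    port is created at a uniformly random point inside one interval.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        created_at = rng.uniform(0.0, polling_interval)
        # The port stays untagged until the scan at the end of the interval.
        total += polling_interval - created_at
    return total / trials
```

  For a POLLING_INTERVAL of 2 seconds the estimate comes out close to
  1 second. Scan duration and tagging time are ignored here, so a real
  disconnection would be somewhat longer.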

  It seems this could be solved by direct messages between the nova VIF
  driver and neutron-ovs-agent.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1414559/+subscriptions

