
yahoo-eng-team team mailing list archive

[Bug 1671379] [NEW] The first VM of one network in one compute node cannot send RARP packets during KVM's live-migration in a neutron ML2 hierarchical port binding environment whose second mechanism driver was configured as the existing OVS driver "openvswitch"

 

Public bug reported:

Description
===========
Normally, a VM that has migrated to the destination node sends several RARP packets during KVM's live migration in my OpenStack environment.
In a neutron ML2 hierarchical port binding environment, however, I find that no RARP packets can be captured on the physical port attached to the VLAN provider network's OVS bridge on the destination node when the VM migrates there.


Steps to reproduce
==================
1. create a vxlan type network:   netA
2. create a subnet for netA:      subA
3. create a VM on the compute1 node:  vmA
4. tcpdump the physical port attached to the OVS bridge on the compute2 node:  tcpdump -i ens33 -w ens33.pcap
5. live migrate the VM to the other compute node: compute2
6. open ens33.pcap in wireshark (or check the capture with the sketch below)
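
Instead of opening the capture in Wireshark, the check can also be scripted. A minimal sketch with scapy (assuming scapy is available on the capturing host; RARP frames use EtherType 0x8035):

# Minimal sketch: count RARP frames in the ens33.pcap capture from step 4.
# Assumes scapy is installed on the host that took the capture.
from scapy.all import rdpcap, Ether

packets = rdpcap("ens33.pcap")
rarp = [p for p in packets if p.haslayer(Ether) and p[Ether].type == 0x8035]
print("RARP frames captured: %d" % len(rarp))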


Expected result
===============
Several RARP packets are captured on ens33.


Actual result
=============
No RARP packets are captured on ens33.


Environment
===========
OpenStack: Kilo 2015.1.2
OS: CentOS 7.1.1503
Libvirt: 1.2.17


Logs & Configs
==============
hierarchical port binding configuration:
controller node: 
#neutron   
/etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = vxlan,vlan
tenant_network_types = vxlan,vlan
mechanism_drivers = ml2_h3c,openvswitch
# ml2_h3c is a mechanism driver owned by New H3C Group (a provider of new IT solutions); it allocates a
# dynamic VLAN segment for the existing mechanism driver "openvswitch"

[ml2_type_vlan]
network_vlan_ranges = compute1_physicnet1:100:1000, compute2_physicnet1:100:1000
[ml2_type_vxlan]
vni_ranges=1:500


compute1 node:
#neutron   
/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
[ovs]
bridge_mappings=compute1_physicnet1:br-ens33


compute2 node:
#neutron   
/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
[ovs]
bridge_mappings=compute2_physicnet1:br-ens33



Analysis
==============
After reading the live-migration related code in nova, neutron-server and neutron-openvswitch-agent, I think this may be a bug.

The relevant process, in brief:

1. source compute node (nova-compute): compute1 node
   self.driver(libvirt).live_migration
         dom.migrateToURI2 --------------- execute the migration to the destination node
         self._live_migration_monitor ----- monitor until the migration finishes
             self._post_live_migration ---- migration finished
                 self.compute_rpcapi.post_live_migration_at_destination --------- notify the destination node


2.1. destination compute node (neutron-openvswitch-agent): compute2 node
   rpc_loop ------ detects that the VM's tapxxxx port has been plugged
      self.process_network_ports
         self.treat_devices_added_or_updated
              self.plugin_rpc.get_devices_details_list ------- the port details show that the port is still bound to
                                                               "compute1_physicnet1", not to the physical network
                                                               provider "compute2_physicnet1" that exists on the
                                                               destination compute node.
              self.treat_vif_port
                  self.port_bound
                      self.provision_local_vlan ----------- there is no matching physical bridge at this point. As a
                                                            result, the tap port cannot be given any VLAN tag.
                                                            Eventually br-ens33, the physical bridge, drops the RARP
                                                            packets from the starting VM.
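
A simplified sketch of the check that fails here (this is not the actual neutron-openvswitch-agent code; names follow the trace above): the agent can only wire up a VLAN network whose physical_network appears in its own bridge_mappings, so a port still reported as bound to compute1_physicnet1 cannot be provisioned on compute2.

# Simplified sketch of the failing check, not the real agent code.
# On compute2 the agent only knows compute2_physicnet1, but the port details
# delivered by get_devices_details_list still say compute1_physicnet1.
def provision_local_vlan_sketch(phys_brs, network_type, physical_network, segmentation_id):
    if network_type == 'vlan' and physical_network not in phys_brs:
        # No matching physical bridge: no local VLAN tag and no flows are set up,
        # so br-ens33 ends up dropping the RARP announcements of the starting VM.
        print("Cannot provision VLAN network for physnet %s: bridge not found"
              % physical_network)
        return None
    # Otherwise a local VLAN would be allocated and flows added on the physical
    # bridge to translate it to segmentation_id.
    return phys_brs.get(physical_network)

phys_brs = {'compute2_physicnet1': 'br-ens33'}   # from compute2's bridge_mappings
provision_local_vlan_sketch(phys_brs, 'vlan', 'compute1_physicnet1', 150)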
    
   

2.2. destination compute node (nova-compute): compute2 node
    post_live_migration_at_destination   nova/compute/manager.py
        self.network_api.migrate_instance_finish
            self._update_port_binding_for_instance ------------ notify neutron to update the port's binding:host_id
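
For reference, the port update issued by _update_port_binding_for_instance amounts to a neutron port-update of binding:host_id. A minimal sketch with python-neutronclient (credentials, the port UUID and the host name are placeholders, not values from this environment):

# Minimal sketch of the binding:host_id update nova sends to neutron after the
# migration has finished. All values below are placeholders.
from neutronclient.v2_0 import client

neutron = client.Client(username='admin', password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')
port_id = '11111111-2222-3333-4444-555555555555'   # the migrated VM's port
neutron.update_port(port_id, {'port': {'binding:host_id': 'compute2'}})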


3. controller node (neutron-server)
   ml2_h3c: fills self._new_bound_segment and self._next_segments_to_bind with compute2_physicnet1
            for the openvswitch driver
   openvswitch: binds the port with the compute2_physicnet1 segment allocated by the level-0 driver ml2_h3c
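
For context, a level-0 hierarchical mechanism driver such as ml2_h3c typically allocates a dynamic VLAN segment for the host's physical network and hands it to the next-level driver via continue_binding. A rough sketch against the Kilo ML2 driver API (this is not the actual ml2_h3c code, and the host-to-physnet mapping below is an assumption):

# Rough sketch of a level-0 hierarchical mechanism driver; not the real ml2_h3c code.
# It allocates a dynamic VLAN segment on the binding host's physnet and passes it
# down so that the openvswitch driver can complete the binding at the next level.
from neutron.plugins.ml2 import driver_api as api


class Level0DriverSketch(api.MechanismDriver):

    def initialize(self):
        pass

    def bind_port(self, context):
        for segment in context.segments_to_bind:
            if segment[api.NETWORK_TYPE] != 'vxlan':
                continue
            # Assumption: the physnet name is derived from the binding host,
            # e.g. compute2 -> compute2_physicnet1.
            physnet = '%s_physicnet1' % context.host
            dynamic = context.allocate_dynamic_segment(
                {api.NETWORK_TYPE: 'vlan',
                 api.PHYSICAL_NETWORK: physnet})
            # Hand the dynamic VLAN segment to the next-level (openvswitch)
            # driver instead of completing the binding here.
            context.continue_binding(segment[api.ID], [dynamic])
            return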


In the current Kilo flow, the ML2 drivers only finish binding the port at the last step 3, i.e. after nova notifies neutron-server that the port's binding:host_id has changed. With hierarchical port binding this is too late for the neutron-openvswitch-agent to obtain suitable port details from neutron-server, so it cannot set the correct VLAN tag on the VM port or add the relevant flows on the OVS bridges.

Liberty and Mitaka appear to have the same problem.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1671379

Title:
  The first VM of one network in one compute node cannot send RARP
  packets during KVM's live-migration in a neutron ML2 hierarchical port
  binding environment whose second mechanism driver was configured as
  the existing OVS driver "openvswitch"

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1671379/+subscriptions

