yahoo-eng-team team mailing list archive
Message #62138
[Bug 1671379] [NEW] The first VM of one network in one compute node cannot send RARP packets during KVM's live migration in a neutron ML2 hierarchical port binding environment whose second mechanism driver was configured as the existing OVS driver "openvswitch"
Public bug reported:
Description
===========
Normally, in my OpenStack environment, a VM that migrates to the destination node sends several RARP packets during KVM live migration.
In a neutron ML2 hierarchical port binding environment, however,
I find that no RARP packets can be captured on the physical port attached to the OVS bridge of a VLAN physical network provider on the destination node when the VM migrates to that node.
Steps to reproduce
==================
1. Create a VXLAN network: netA
2. Create a subnet for netA: subA
3. Create a VM on compute1 node: vmA
4. Capture traffic on the physical port attached to the OVS bridge on compute2 node: tcpdump -i ens33 -w ens33.pcap
5. Live-migrate vmA to the other compute node: compute2 node
6. Open ens33.pcap in Wireshark
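Instead of inspecting the capture by eye in Wireshark, you could check step 6 with a short script. The following is a minimal sketch. It assumes a classic libpcap file, which is what `tcpdump -w` writes, captured on an Ethernet link. It does not handle pcapng. RARP frames use EtherType 0x8035. The script also looks past 802.1Q/802.1ad tags, because frames leaving br-ens33 are expected to carry the segment's VLAN tag. The script name `count_rarp.py` is hypothetical.

```python
"""Count RARP frames in a classic pcap capture such as ens33.pcap."""
import struct
import sys

ETH_P_RARP = 0x8035    # EtherType of RARP frames
ETH_P_8021Q = 0x8100   # 802.1Q VLAN tag
ETH_P_8021AD = 0x88A8  # 802.1ad (QinQ) outer tag
PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond and nanosecond variants
LINKTYPE_ETHERNET = 1


def count_rarp_frames(data: bytes) -> int:
    """Return how many Ethernet frames carry RARP, looking past VLAN tags."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number tells us the byte order the capture was written in.
    if struct.unpack_from("<I", data, 0)[0] in PCAP_MAGICS:
        endian = "<"
    elif struct.unpack_from(">I", data, 0)[0] in PCAP_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    linktype = struct.unpack_from(endian + "I", data, 20)[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    count = 0
    offset = 24
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len
        pos = 12  # EtherType follows the destination and source MAC addresses
        while pos + 2 <= len(frame):
            ethertype = struct.unpack_from("!H", frame, pos)[0]
            if ethertype in (ETH_P_8021Q, ETH_P_8021AD):
                pos += 4  # skip the 4-byte tag and read the next EtherType
                continue
            if ethertype == ETH_P_RARP:
                count += 1
            break
    return count


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as capture:
            print(f"{path}: {count_rarp_frames(capture.read())} RARP frame(s)")
```

Usage: `python3 count_rarp.py ens33.pcap`. For the capture described under "Actual result", this would be expected to report 0 RARP frames.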
Expected result
===============
Several RARP packets are found.
Actual result
=============
No RARP packets are found.
Environment
===========
OpenStack: Kilo 2015.1.2
OS: CentOS 7.1.1503
Libvirt: 1.2.17
Logs & Configs
==============
hierarchical port binding configuration:
controller node:
#neutron
/etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = vxlan,vlan
tenant_network_types = vxlan,vlan
mechanism_drivers=ml2_h3c,openvswitch
# ml2_h3c, a mechanism driver owned by New H3C Group (a provider of New IT solutions), allocates a dynamic
# VLAN segment for the existing mechanism driver "openvswitch"
[ml2_type_vlan]
network_vlan_ranges = compute1_physicnet1:100:1000, compute2_physicnet1:100:1000
[ml2_type_vxlan]
vni_ranges=1:500
compute1 node:
#neutron
/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
[ovs]
bridge_mappings=compute1_physicnet1:br-ens33
compute2 node:
#neutron
/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
[ovs]
bridge_mappings=compute2_physicnet1:br-ens33
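The configuration gives each compute node its own physical network name. As a result, a VLAN segment on compute1_physicnet1 can only be wired on compute1. The sketch below parses the two option values in the formats shown above and lists which physical networks each node can actually attach to a bridge. The helper names are hypothetical, and this is not neutron's own parsing code.

```python
from typing import Dict, Optional, Tuple


def parse_network_vlan_ranges(value: str) -> Dict[str, Optional[Tuple[int, int]]]:
    """Parse 'physnet:min:max, ...'; a bare 'physnet' entry has no range."""
    ranges: Dict[str, Optional[Tuple[int, int]]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) == 3:
            ranges[parts[0]] = (int(parts[1]), int(parts[2]))
        elif len(parts) == 1:
            ranges[parts[0]] = None
        else:
            raise ValueError(f"malformed VLAN range entry: {entry!r}")
    return ranges


def parse_bridge_mappings(value: str) -> Dict[str, str]:
    """Parse 'physnet:bridge, ...' into {physnet: bridge}."""
    mappings: Dict[str, str] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        physnet, sep, bridge = entry.partition(":")
        if not sep or not physnet or not bridge:
            raise ValueError(f"malformed bridge mapping: {entry!r}")
        mappings[physnet] = bridge
    return mappings


controller_ranges = parse_network_vlan_ranges(
    "compute1_physicnet1:100:1000, compute2_physicnet1:100:1000")
node_mappings = {
    "compute1": parse_bridge_mappings("compute1_physicnet1:br-ens33"),
    "compute2": parse_bridge_mappings("compute2_physicnet1:br-ens33"),
}
for host, mappings in node_mappings.items():
    # Physical networks that have both a VLAN range and a local bridge on this host.
    usable = sorted(set(mappings) & set(controller_ranges))
    print(host, usable)
```

This prints `compute1 ['compute1_physicnet1']` and `compute2 ['compute2_physicnet1']`. On compute2, a port whose segment is still on compute1_physicnet1 has no bridge.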
Analysis
==============
After reading the live-migration code in nova, neutron-server and neutron-openvswitch-agent, I think this may be a bug.
The brief relevant process:
1. Source compute node (nova-compute), compute1 node
self.driver(libvirt).live_migration
dom.migrateToURI2 ---------------- Execute migration to destination node
self._live_migration_monitor ------------------ Monitor until migration finishes
self._post_live_migration ---------------- Migration finished
self.compute_rpcapi.post_live_migration_at_destination --------- Notify destination node
2.1. Destination compute node (neutron-openvswitch-agent), compute2 node
rpc_loop ------ Monitor plugging of the VM's tapxxxx port
self.process_network_ports
self.treat_devices_added_or_updated
self.plugin_rpc.get_devices_details_list ------- The port details show that the port is still bound to
"compute1_physicnet1", not to the physical network
provider "compute2_physicnet1" that exists on the
destination compute node.
self.treat_vif_port
self.port_bound
self.provision_local_vlan ----------- There is no matching physical bridge at this point. As a
result, the tap port cannot be assigned any VLAN tag.
Eventually br-ens33, the physical bridge, drops the RARP
packets from the starting VM.
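The failing lookup in provision_local_vlan can be reduced to a few lines. The sketch below is a deliberately simplified, hypothetical stand-in (`provision_local_vlan_sketch` is not a neutron function). It models only the step described above. The physical network from the port details is looked up in the bridges built from bridge_mappings. If that lookup misses, no VLAN tag is set and no flows are added. Local VLAN allocation and flow programming in the real agent are omitted.

```python
import logging
from typing import Dict, Optional, Tuple

LOG = logging.getLogger("ovs-agent-sketch")


def provision_local_vlan_sketch(
    network_type: str,
    physical_network: str,
    segmentation_id: int,
    phys_brs: Dict[str, str],
) -> Optional[Tuple[str, int]]:
    """Simplified stand-in for the agent step that wires a VLAN segment.

    phys_brs maps physical network -> OVS bridge, as built from bridge_mappings.
    Returns (bridge, segmentation_id) if the segment can be wired, else None.
    """
    if network_type != "vlan":
        raise NotImplementedError("only the VLAN case from this report is modelled")
    bridge = phys_brs.get(physical_network)
    if bridge is None:
        # No local bridge for this physical network: no VLAN tag, no flows.
        LOG.error("no bridge for physical_network %s", physical_network)
        return None
    # The real agent would allocate a local VLAN and install flows here;
    # the sketch only reports which bridge and VLAN ID it would use.
    return bridge, segmentation_id
```

With compute2's mapping `{"compute2_physicnet1": "br-ens33"}`, a port still reported on compute1_physicnet1 yields `None`, which matches the dropped RARP packets.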
2.2. Destination compute node (nova-compute), compute2 node
post_live_migration_at_destination (nova/compute/manager.py)
self.network_api.migrate_instance_finish
self._update_port_binding_for_instance ------------ Notify neutron to migrate the port's binding:host_id
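The report says only that nova notifies neutron of the port's new binding:host_id. For illustration, the sketch below builds an equivalent Networking API v2.0 port update but does not send it. In practice nova goes through its neutron client wrapper rather than raw HTTP. The endpoint, port ID and token are placeholders, and `build_host_id_update` is a hypothetical helper.

```python
import json
import urllib.request


def build_host_id_update(endpoint: str, port_id: str, dest_host: str,
                         token: str) -> urllib.request.Request:
    """Build (but do not send) a port update that moves binding:host_id."""
    body = {"port": {"binding:host_id": dest_host}}
    return urllib.request.Request(
        url=f"{endpoint}/v2.0/ports/{port_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


# Placeholder values; nothing is sent over the network here.
req = build_host_id_update("http://controller:9696", "PORT_ID", "compute2", "TOKEN")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

This update is what triggers the rebinding in step 3. In the sequence above, it arrives after the agent has already processed the port.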
3. Controller node (neutron-server)
ml2_h3c: fills self._new_bound_segment and self._next_segments_to_bind with compute2_physicnet1
for the openvswitch driver
openvswitch: binds the port with compute2_physicnet1's segment, allocated by the level-0 driver ml2_h3c
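The two-level binding in step 3 can be modelled as a toy. The level-0 driver allocates a dynamic VLAN segment on the host's physical network and hands it on as the next segment to bind. The level-1 driver binds only if that host has a bridge for the segment. All classes here are hypothetical stand-ins, not the ML2 driver API. The "<host>_physicnet1" naming is taken from the configuration above, and per-physnet VLAN allocation starting at 100 mirrors network_vlan_ranges.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    network_type: str
    physical_network: Optional[str]
    segmentation_id: int


@dataclass
class ToyPortContext:
    host: str                      # the port's binding:host_id
    top_segment: Segment           # the network's own (VXLAN) segment
    bound_drivers: List[str] = field(default_factory=list)
    bound_segment: Optional[Segment] = None


class ToyH3CDriver:
    """Level 0: allocate a dynamic VLAN segment on the host's physical network."""

    def __init__(self, vlan_min: int = 100):
        self._vlan_min = vlan_min
        self._next_vlan: Dict[str, int] = {}

    def bind(self, ctx: ToyPortContext) -> Segment:
        if ctx.top_segment.network_type != "vxlan":
            raise ValueError("toy level-0 driver only handles VXLAN networks")
        physnet = f"{ctx.host}_physicnet1"
        vlan = self._next_vlan.setdefault(physnet, self._vlan_min)
        self._next_vlan[physnet] = vlan + 1
        ctx.bound_drivers.append("ml2_h3c")
        return Segment("vlan", physnet, vlan)  # plays the "next segment to bind"


class ToyOVSDriver:
    """Level 1: bind only if the host has a bridge for the segment's physnet."""

    def __init__(self, bridge_mappings: Dict[str, Dict[str, str]]):
        self._bridge_mappings = bridge_mappings

    def bind(self, ctx: ToyPortContext, segment: Segment) -> bool:
        mappings = self._bridge_mappings.get(ctx.host, {})
        if segment.network_type == "vlan" and segment.physical_network in mappings:
            ctx.bound_drivers.append("openvswitch")
            ctx.bound_segment = segment
            return True
        return False


def bind_port(ctx: ToyPortContext, level0: ToyH3CDriver, level1: ToyOVSDriver) -> Segment:
    next_segment = level0.bind(ctx)
    if not level1.bind(ctx, next_segment):
        raise RuntimeError(f"no bridge on {ctx.host} for {next_segment.physical_network}")
    return ctx.bound_segment
```

The bound segment depends on the host. Binding the same port for compute2 yields a segment on compute2_physicnet1, while compute2's OVS driver rejects compute1's segment.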
In the current Kilo process, the ML2 driver finishes port binding only at the last step, step 3, after nova notifies neutron-server that the port's binding:host_id has changed.
With ML2 hierarchical port binding this is too late. By then neutron-openvswitch-agent (step 2.1) has already fetched the port details from neutron-server,
so it cannot set the correct VLAN tag on the VM port or add the relevant flows to the OVS bridges.
Liberty and Mitaka seem to have the same problem.
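The ordering problem can be replayed in a toy model. Neutron-server's binding starts on compute1's physical network. The compute2 agent resolves a bridge only from whatever binding exists when it fetches the port details. The event names and the function are hypothetical labels for the steps above, not real RPC names.

```python
from typing import Dict, List, Optional

COMPUTE2_BRIDGE_MAPPINGS = {"compute2_physicnet1": "br-ens33"}


def agent_bridge_after(events: List[str]) -> Optional[str]:
    """Replay events; return the bridge the compute2 agent could use, or None."""
    # Before step 3, neutron-server still holds compute1's binding.
    binding: Dict[str, str] = {"host": "compute1", "physnet": "compute1_physicnet1"}
    bridge: Optional[str] = None
    for event in events:
        if event == "agent_fetches_port_details":      # step 2.1
            bridge = COMPUTE2_BRIDGE_MAPPINGS.get(binding["physnet"])
        elif event == "server_rebinds_to_compute2":    # steps 2.2 and 3
            binding = {"host": "compute2", "physnet": "compute2_physicnet1"}
        else:
            raise ValueError(f"unknown event {event!r}")
    return bridge
```

In the order the report observes (the agent fetches first, the rebinding happens later), the model returns `None`. Only when the rebinding precedes the agent's fetch does it find br-ens33.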
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1671379
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1671379/+subscriptions