yahoo-eng-team team mailing list archive

[Bug 1282956] Re: l2-population : hard reboot a VM after a compute crash

 

Aaron,

FDB entries are the forwarding entries on the host's bridge, and the IP neighbour entries correspond to the ARP responder entries.
I have one network node and two compute nodes.

I first create a VM with IP 10.0.0.104 and MAC 00:00:00:44:44:44 on
node1; then on the network node I have:

# ip neigh show
10.0.0.104 dev vxlan-1001 lladdr 00:00:00:44:44:44 PERMANENT
# bridge fdb show dev vxlan-1001
00:00:00:00:00:00 dst 192.168.254.74 self permanent
00:00:00:44:44:44 dst 192.168.254.74 self permanent
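
For reference, here is a minimal sketch (not the actual agent code; the helper name is made up, and the device/address values are simply taken from the output above) of the iproute2 commands the Linuxbridge agent roughly issues when l2-population tells it about a remote port: an fdb forwarding entry towards the remote VTEP, plus a permanent neighbour entry for the ARP responder.

import subprocess

def populate_vm_entries(device, vm_mac, vm_ip, vtep_ip):
    # Forwarding entry: frames for vm_mac are tunnelled to the remote VTEP.
    subprocess.check_call(
        ['bridge', 'fdb', 'append', vm_mac, 'dev', device, 'dst', vtep_ip])
    # ARP responder entry: answer ARP requests for vm_ip locally with vm_mac.
    subprocess.check_call(
        ['ip', 'neigh', 'replace', vm_ip, 'lladdr', vm_mac,
         'dev', device, 'nud', 'permanent'])

# Values from the output above; 192.168.254.74 is node1's VTEP.
populate_vm_entries('vxlan-1001', '00:00:00:44:44:44', '10.0.0.104',
                    '192.168.254.74')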

Then I change the binding of the VM in the Nova database:
mysql> update instances set host = 'node2' where host = 'node1' and deleted = 0;

Then I do the hard reboot:
# nova reboot --hard uuid
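
To check what Neutron records for the port after the reboot, something like the following python-neutronclient snippet can be used (a sketch only: the credentials, auth endpoint and the 'uuid' placeholder are illustrative, not from this setup).

from neutronclient.v2_0 import client

neutron = client.Client(username='admin', password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')

# The instance uuid passed to "nova reboot" is the device_id of its ports.
for port in neutron.list_ports(device_id='uuid')['ports']:
    print(port['id'], port.get('binding:host_id'), port['status'])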

Actually, this doesn't seem to be an l2-population-only bug, since Nova doesn't send update_port with the new host.
The result is that the agent sends get_device_details, which moves the port to the BUILD status. Then the agent sends update_device_up, which is not forwarded to the mechanism driver (MD), since the port is not bound to the host of the agent that sent update_device_up.
Here is the plugin log:
2014-02-25 18:01:32.424 28952 DEBUG neutron.plugins.ml2.rpc [req-3fd91aee-67b2-4fd7-b0e1-d3afb91ef7f6 None] Device tap147edc0d-44 up at agent lb00163ef452ac update_device_up /opt/stack/neutron/neutron/plugins/ml2/rpc.py:186
2014-02-25 18:01:32.427 28952 DEBUG neutron.plugins.ml2.rpc [req-3fd91aee-67b2-4fd7-b0e1-d3afb91ef7f6 None] Device tap147edc0d-44 not bound to the agent host devstack2 update_device_up /opt/stack/neutron/neutron/plugins/ml2/rpc.py:192

In the ML2 DB, the port is still bound to node1, and the port status is
still BUILD.
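
To make the failure mode concrete, here is a small self-contained illustration (simplified, not the upstream Neutron code; names and data are stand-ins) of the sequence described above: get_device_details moves the port to BUILD, and update_device_up from node2's agent is dropped because the ML2 binding still points at node1, so the l2pop mechanism driver never learns the new location.

# Stand-ins for the state in the ML2 DB after the manual host change.
PORT_BINDING_HOST = 'node1'   # binding:host_id was never updated
PORT_STATUS = 'ACTIVE'

def get_device_details(device, agent_host):
    # The agent on node2 asks for details; the plugin moves the port to BUILD.
    global PORT_STATUS
    PORT_STATUS = 'BUILD'
    return {'device': device, 'status': PORT_STATUS}

def update_device_up(device, agent_host, notify_mech_drivers):
    # Dropped when the port is not bound to the calling agent's host.
    if PORT_BINDING_HOST != agent_host:
        print("Device %s not bound to the agent host %s" % (device, agent_host))
        return                      # l2pop never hears about the new location
    notify_mech_drivers(device)     # would repopulate fdb/neigh entries

def l2pop_notify(device):
    print("l2pop notified for %s" % device)

get_device_details('tap147edc0d-44', 'devstack2')
update_device_up('tap147edc0d-44', 'devstack2', l2pop_notify)
print("port status: %s" % PORT_STATUS)   # stays BUILD, matching the plugin log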





** Also affects: nova
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1282956

Title:
  l2-population : hard reboot a VM after a compute crash

Status in OpenStack Neutron (virtual network service):
  Incomplete
Status in OpenStack Compute (Nova):
  New

Bug description:
  I run in multi node setup with ML2, L2-population and Linuxbridge MD,
  and vxlan TypeDriver.

  I start two compute nodes, launch a VM, and shut down the compute
  node which hosts the VM.

  I use this process to relaunch the VM on the other compute node:

  http://docs.openstack.org/trunk/openstack-ops/content/maintenance.html#totle_compute_node_failure

  Once the VM is launched on the other compute node, the fdb entries and
  neighbour entries are no longer populated on the network node or on
  the compute node.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1282956/+subscriptions

