yahoo-eng-team team mailing list archive
Message #44158
[Bug 1519926] Re: L3-agent restart causes VM connectivity loss
Reviewed: https://review.openstack.org/254579
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8b7e5997dae54a03ad2850c43b7070bc00c90273
Submitter: Jenkins
Branch: master
commit 8b7e5997dae54a03ad2850c43b7070bc00c90273
Author: Hong Hui Xiao <xiaohhui@xxxxxxxxxx>
Date: Tue Dec 8 01:17:54 2015 -0500
Separate the command for replace_port to delete and add
When a port has already been added to a router namespace, replacing it
by issuing del-port and add-port in a single command does not bring the
new port into the kernel. Even though the port is updated in the OVS
database and can be found on br-int, the system cannot see the new
port. This breaks the subsequent steps, which manipulate the new port
with ip commands. A mailing list thread discussing this issue has been
started at [1].
The problem breaks the scenario in which a namespace is deleted
unexpectedly and the l3-agent tries to rebuild it at restart.
Separating replace_port into two commands, del-port and add-port,
matches the original logic and has been verified to resolve the
problem.
[1] http://openvswitch.org/pipermail/discuss/2015-December/019667.html
Change-Id: If36bcf5a0cccb667f3087aea1e9ea9f20eb3a563
Closes-Bug: #1519926
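To illustrate the difference the commit message describes, here is a
minimal sketch in Python. It only builds the ovs-vsctl argument lists
and does not run them. The function names are hypothetical and this is
not the actual neutron code; interface options such as type=internal
are omitted for brevity.

```python
def combined_replace_port(bridge, port):
    """Build ONE ovs-vsctl invocation that deletes and re-adds the port.

    This is the form the commit message reports as problematic: both
    operations run in a single command.
    """
    return [
        ["ovs-vsctl",
         "--", "--if-exists", "del-port", bridge, port,
         "--", "add-port", bridge, port],
    ]


def separated_replace_port(bridge, port):
    """Build TWO ovs-vsctl invocations: del-port first, then add-port.

    This mirrors the approach described in the commit message, where the
    delete and the add are issued as separate commands.
    """
    return [
        ["ovs-vsctl", "--", "--if-exists", "del-port", bridge, port],
        ["ovs-vsctl", "--", "add-port", bridge, port],
    ]


if __name__ == "__main__":
    # Bridge and port names are taken from the bug report for illustration.
    for cmd in separated_replace_port("br-int", "qr-50b99abf-a4"):
        print(" ".join(cmd))
```

Each inner list could be passed to subprocess.run() on a host with
Open vSwitch installed; the sketch stops at building the commands so
that it can be read and checked without touching a live bridge.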
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1519926
Title:
L3-agent restart causes VM connectivity loss
Status in neutron:
Fix Released
Bug description:
L3-agent restart causes VM connectivity loss
To test whether the L3-agent on a network node can recover after it
was stopped and then restarted, I ran this test on a devstack setup
using the latest neutron code on the master branch. The L3-agent is
running in legacy mode.
1. Create a network, subnetwork.
2. Create a router, tie the router to the subnetwork and the external network.
3. Create a VM using the network and assign a floating IP to the VM. The VM can be pinged and ssh'ed using the floating IP.
4. On the controller node, kill the L3 agent.
5. Delete the qrouter namespace of the router created in (2) on the controller node.
6. Start up the L3-agent again.
7. Now the VM can no longer be ssh'ed using the FIP.
Connectivity to the VM is lost because the L3-agent failed to
reconstruct all the interfaces in the qrouter namespace. For example:
Before running steps 4-6, the qrouter namespace on the controller node looks like (router-id=e86b277a-5f49-4fcb-8d85-241594db418e, VM's FIP=10.127.10.5):
stack@Ubuntu-38:~/DEVSTACK/demo$ sudo ip netns exec qrouter-e86b277a-5f49-4fcb-8d85-241594db418e ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
33: qr-50b99abf-a4: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:17:3e:b0 brd ff:ff:ff:ff:ff:ff
    inet 10.1.2.1/24 brd 10.1.2.255 scope global qr-50b99abf-a4
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe17:3eb0/64 scope link
       valid_lft forever preferred_lft forever
34: qg-3d1a888a-33: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:60:9a:43 brd ff:ff:ff:ff:ff:ff
    inet 10.127.10.4/24 brd 10.127.10.255 scope global qg-3d1a888a-33
       valid_lft forever preferred_lft forever
    inet 10.127.10.5/32 brd 10.127.10.5 scope global qg-3d1a888a-33
       valid_lft forever preferred_lft forever
    inet6 2001:db8::3/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe60:9a43/64 scope link
       valid_lft forever preferred_lft forever
After deleting the qrouter-e86b277a-5f49-4fcb-8d85-241594db418e
namespace and then restarting the L3-agent on the controller node, the
L3-agent did recreate the namespace, but not all the interfaces and IP
addresses were created:
stack@Ubuntu-38:~/DEVSTACK/demo$ sudo ip netns exec qrouter-e86b277a-5f49-4fcb-8d85-241594db418e ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
So the VM can't be ssh'ed because the required plumbing was not
re-created.
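The comparison of the two "ip a" listings above can be automated. The
following is a minimal sketch in Python, not part of neutron: the
function names are hypothetical, and the assumption that a healthy
legacy router namespace contains at least one qr- and one qg-
interface is taken from the output shown in this report.

```python
import re
import subprocess

# Matches the header line of each interface in "ip a" output,
# e.g. "33: qr-50b99abf-a4: <BROADCAST,...>", and captures the name.
_IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)[:@]", re.MULTILINE)

# Prefixes expected in the namespace, based on the listing in this report.
EXPECTED_PREFIXES = ("qr-", "qg-")


def interface_names(ip_a_output):
    """Return the interface names found in the text of "ip a" output."""
    return _IFACE_RE.findall(ip_a_output)


def missing_router_ports(ip_a_output):
    """Return the expected prefixes that no interface name starts with."""
    names = interface_names(ip_a_output)
    return [prefix for prefix in EXPECTED_PREFIXES
            if not any(name.startswith(prefix) for name in names)]


def namespace_ip_a(router_id):
    """Run "ip a" inside the qrouter namespace (needs root privileges)."""
    namespace = "qrouter-" + router_id
    result = subprocess.run(
        ["ip", "netns", "exec", namespace, "ip", "a"],
        capture_output=True, text=True, check=True)
    return result.stdout
```

On the controller node, calling
missing_router_ports(namespace_ip_a("e86b277a-5f49-4fcb-8d85-241594db418e"))
as root after step 6 would be one way to check whether the qr- and qg-
interfaces came back.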
When the L3 agent is running in dvr-snat mode on the controller and
dvr mode on the compute node, if I do steps 4-6 on the compute node,
the VM can no longer be ssh'ed either. The qrouter namespace is again
missing the needed interfaces.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1519926/+subscriptions