group.of.nepali.translators team mailing list archive
Message #00980
[Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled
This bug was fixed in the package neutron - 1:2014.1.5-0ubuntu3
---------------
neutron (1:2014.1.5-0ubuntu3) trusty; urgency=medium
[ Corey Bryant ]
* d/p/make_del_fdb_flow_idempotent.patch: Cherry pick from Juno
to prevent KeyError on duplicate port removal in del_fdb_flow()
(LP: #1531963).
* d/tests/*-plugin: Fix race between service restart and pidof test.
[ James Page ]
* d/p/ovs-restart.patch: Ensure that tunnels are fully reset on ovs
restart (LP: #1460164).
-- Corey Bryant <corey.bryant@xxxxxxxxxxxxx> Wed, 10 Feb 2016 14:52:04 -0500
** Changed in: neutron (Ubuntu Trusty)
Status: Fix Committed => Fix Released
** Changed in: neutron (Ubuntu Wily)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1460164
Title:
restart of openvswitch-switch causes instance network down when
l2population enabled
Status in Ubuntu Cloud Archive:
In Progress
Status in Ubuntu Cloud Archive icehouse series:
Fix Committed
Status in Ubuntu Cloud Archive juno series:
New
Status in Ubuntu Cloud Archive kilo series:
In Progress
Status in neutron:
Fix Released
Status in neutron package in Ubuntu:
Fix Released
Status in neutron source package in Trusty:
Fix Released
Status in neutron source package in Wily:
Fix Released
Status in neutron source package in Xenial:
Fix Released
Bug description:
[Impact]
Restarts of openvswitch (typically on upgrade) result in loss of tunnel connectivity when the l2population driver is in use. This results in loss of access to all instances on the affected compute hosts.
[Test Case]
Deploy a cloud with ml2/ovs/l2population enabled.
Boot instances.
Restart ovs; instance connectivity will be lost until the neutron-openvswitch-agent is restarted on the compute hosts.
[Regression Potential]
Minimal: the fix is already in multiple stable branches upstream.
[Original Bug Report]
On 2015-05-28, our Landscape auto-upgraded packages on two of our
OpenStack clouds. On both clouds, but only on some compute nodes, the
upgrade of openvswitch-switch and corresponding downtime of
ovs-vswitchd appears to have triggered some sort of race condition
within neutron-plugin-openvswitch-agent leaving it in a broken state;
any new instances come up with non-functional network but pre-existing
instances appear unaffected. Restarting n-p-ovs-agent on the affected
compute nodes is sufficient to work around the problem.
The packages Landscape upgraded (from /var/log/apt/history.log):
Start-Date: 2015-05-28 14:23:07
Upgrade: nova-compute-libvirt:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), libsystemd-login0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), nova-compute-kvm:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), systemd-services:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), isc-dhcp-common:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2), nova-common:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), python-nova:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), libsystemd-daemon0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), grub-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), libpam-systemd:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), udev:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), grub2-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), openvswitch-switch:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2), libudev1:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), isc-dhcp-client:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2), python-eventlet:amd64 (0.13.0-1ubuntu2, 0.13.0-1ubuntu2.1), python-novaclient:amd64 (2.17.0-0ubuntu1.1, 2.17.0-0ubuntu1.2), grub-pc-bin:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), grub-pc:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), nova-compute:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), openvswitch-common:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2)
End-Date: 2015-05-28 14:24:47
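When checking which compute nodes received the openvswitch upgrade, the "Upgrade:" line above can be parsed mechanically. The following is only a minimal sketch, assuming the entry layout shown in this excerpt ("name:arch (old, new)", comma-separated); the function names parse_upgrade_line and ovs_upgrades are hypothetical and not part of apt or of any tool mentioned in this report.

```python
import re

# One "name:arch (old, new)" entry, as seen on the Upgrade: line above.
# Assumption: other apt history.log files use the same layout.
ENTRY = re.compile(
    r"(?P<name>[^\s:,]+):(?P<arch>\S+) \((?P<old>[^,]+), (?P<new>[^)]+)\)"
)


def parse_upgrade_line(line):
    """Return {package: (old_version, new_version)} for one Upgrade: line."""
    if not line.startswith("Upgrade:"):
        raise ValueError("not an Upgrade: line")
    return {m["name"]: (m["old"], m["new"]) for m in ENTRY.finditer(line)}


def ovs_upgrades(line):
    """Keep only the Open vSwitch packages (hypothetical helper)."""
    return {
        name: versions
        for name, versions in parse_upgrade_line(line).items()
        if name.startswith("openvswitch-")
    }


# Shortened copy of the Upgrade: line quoted in this report.
sample = (
    "Upgrade: udev:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), "
    "openvswitch-switch:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2), "
    "openvswitch-common:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2)"
)

for name, (old, new) in sorted(ovs_upgrades(sample).items()):
    print(f"{name}: {old} -> {new}")
```

Run against the full history.log of each compute node, a filter like this would show whether openvswitch-switch was among the upgraded packages there.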
From /var/log/neutron/openvswitch-agent.log:
2015-05-28 14:24:18.336 47866 ERROR neutron.agent.linux.ovsdb_monitor
[-] Error received from ovsdb monitor: ovsdb-client:
unix:/var/run/openvswitch/db.sock: receive failed (End of file)
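To find compute nodes whose agent saw the ovsdb connection drop, the agent log can be scanned for this error. This is a rough sketch, assuming every log line starts with the timestamp, pid, level and logger name in the order shown in the excerpt above; the function name ovsdb_monitor_errors is hypothetical.

```python
import re

# Timestamp, pid, level and logger name, in the order seen in the excerpt.
# Assumption: the agent log keeps this prefix on every entry.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+)"
)

OVSDB_LOGGER = "neutron.agent.linux.ovsdb_monitor"


def ovsdb_monitor_errors(lines):
    """Yield the timestamp of each ERROR entry logged by the ovsdb monitor."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["level"] == "ERROR" and m["logger"] == OVSDB_LOGGER:
            yield m["ts"]


# The log entry quoted in this report, joined onto one line.
sample = (
    "2015-05-28 14:24:18.336 47866 ERROR neutron.agent.linux.ovsdb_monitor "
    "[-] Error received from ovsdb monitor: ovsdb-client: "
    "unix:/var/run/openvswitch/db.sock: receive failed (End of file)"
)

print(list(ovsdb_monitor_errors([sample])))
```

In this report the error timestamp (14:24:18) falls between the Start-Date (14:23:07) and End-Date (14:24:47) of the apt run, which is consistent with the agent losing its ovsdb connection during the openvswitch-switch upgrade.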
Looking at a stuck instance, all the right tunnels and bridges and
whatnot appear to be there:
root@vector:~# ip l l | grep c-3b
460002: qbr7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
460003: qvo7ed8b59c-3b: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
460004: qvb7ed8b59c-3b: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UP mode DEFAULT group default qlen 1000
460005: tap7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UNKNOWN mode DEFAULT group default qlen 500
root@vector:~# ovs-vsctl list-ports br-int | grep c-3b
qvo7ed8b59c-3b
root@vector:~#
But I can't ping the unit from within the qrouter-${id} namespace on
the neutron gateway. If I tcpdump the {q,t}*c-3b interfaces, I don't
see any traffic.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1460164/+subscriptions