yahoo-eng-team team mailing list archive
Message #05667
[Bug 1218556] Re: veth pair connecting between physical and integration bridge down after ovs agent restart
Reviewed: https://review.openstack.org/50388
Committed: http://github.com/openstack/neutron/commit/99440a63af5a2c4c2e139036c42db5c64e9495b2
Submitter: Jenkins
Branch: milestone-proposed
commit 99440a63af5a2c4c2e139036c42db5c64e9495b2
Author: Ralf Haferkamp <rhafer@xxxxxxx>
Date: Thu Aug 29 20:50:55 2013 +0200
Avoid race with udev during ovs agent startup
After taking down the veth link between the physical bridge and the integration
bridge call udevadm settle to wait for any udev events to be completely
processed by the operating system before recreating the veth pair.
Some distributions (e.g. openSUSE) have udev rules installed by default that
call e.g. ifdown <interface> during the remove event. If that is processed
after the ovs agent already brought up the veth pair again the veth pair's
link will be down after the agent completed startup and networking will be
broken for all VM instances.
Change-Id: I95520ea96a9804c5261a0c994bbca137535cc37c
Closes-Bug: #1218556
(cherry picked from commit 8d88ee7411d43f148b45d0a145fe32a75765a3ac)
** Changed in: neutron
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1218556
Title:
veth pair connecting between physical and integration bridge down
after ovs agent restart
Status in OpenStack Neutron (virtual network service):
Fix Released
Bug description:
Sometimes after restarting the openvswitch-agent, the veth pair that
connects the physical bridge with the integration bridge doesn't come
up correctly (which of course disconnects any running VM instance from
the network).
# /etc/init.d/openstack-neutron-openvswitch-agent restart
# ip addr show
[..]
83: phy-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether 3a:6c:d6:a4:1c:89 brd ff:ff:ff:ff:ff:ff
84: int-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether a2:12:2a:e5:b8:e4 brd ff:ff:ff:ff:ff:ff
[..]
I was able to reproduce this problem on openSUSE 12.3 and SLES 11.
Ubuntu seems to be unaffected by this.
Running "ip link set up dev <device>" manually on both ends of the veth
pair fixes the problem (until another restart brings it back).
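The manual workaround can be sketched in Python as a small check that finds interfaces lacking the UP flag in "ip addr show" style output and builds the matching "ip link set" commands. This is only an illustration: the function names interfaces_not_up and bring_up_commands are hypothetical and not part of Neutron, and the sketch only builds the commands without running them.

```python
import re

# Matches the first line of each interface entry in "ip addr show" /
# "ip link show" output, e.g.
#   83: phy-br-eth1: <BROADCAST,MULTICAST> mtu 1500 ... state DOWN qlen 1000
# An optional "@peer" suffix after the name is skipped.
_LINK_RE = re.compile(r"^\s*\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:\s+<([^>]*)>")


def interfaces_not_up(ip_output):
    """Return the names of interfaces whose flag list lacks UP.

    Hypothetical helper for illustration; not part of Neutron.
    """
    names = []
    for line in ip_output.splitlines():
        match = _LINK_RE.match(line)
        if match is None:
            continue  # continuation line such as "link/ether ..."
        name = match.group(1)
        flags = match.group(2).split(",")
        if "UP" not in flags:
            names.append(name)
    return names


def bring_up_commands(names):
    """Build (but do not run) one 'ip link set' command per interface."""
    return [["ip", "link", "set", "dev", name, "up"] for name in names]


# Sample taken from the output shown in this bug report.
SAMPLE = (
    "83: phy-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast "
    "state DOWN qlen 1000\n"
    "    link/ether 3a:6c:d6:a4:1c:89 brd ff:ff:ff:ff:ff:ff\n"
    "84: int-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast "
    "state DOWN qlen 1000\n"
    "    link/ether a2:12:2a:e5:b8:e4 brd ff:ff:ff:ff:ff:ff\n"
)
```

Passing SAMPLE to interfaces_not_up reports both ends of the veth pair, and the resulting commands could be handed to subprocess.check_call as root.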
I think I was able to track this down to a race condition between udev
(and its network rules) and the ip commands that the openvswitch-agent
runs during startup. Among other things, the agent does this during startup:
ip link delete int-br-fixed
ip link add int-br-fixed type veth peer name phy-br-fixed
ip link set int-br-fixed up
ip link set phy-br-fixed up
The ip link delete and ip link add commands cause several udev events
to be fired. However, on my system the processing of the udev rules
takes so long that the "remove" events are not completely processed
before the ip link add command is started, which causes the interface
to be down after the above commands have completed.
A possible fix for this is to call "udevadm settle" after the ip link
delete call.
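A minimal Python sketch of the proposed ordering follows: delete the veth device, wait for udev with "udevadm settle", then recreate the pair and bring both ends up. This is not the actual Neutron patch. The function name recreate_veth and the injectable run parameter are assumptions made for illustration, and the real agent uses its own command-execution helpers.

```python
import subprocess


def recreate_veth(int_name, phys_name, run=subprocess.check_call):
    """Recreate a veth pair, waiting for udev before the re-add.

    Hypothetical helper, not the Neutron implementation. `run` is
    injectable so the command sequence can be inspected without root.
    """
    try:
        # Deleting one end of a veth pair removes its peer as well.
        run(["ip", "link", "delete", int_name])
    except subprocess.CalledProcessError:
        # Assumption: a failed delete means the device did not exist.
        pass

    # Block until udev has processed all queued events, so that a
    # delayed "remove" rule cannot take the new interface down again.
    run(["udevadm", "settle"])

    run(["ip", "link", "add", int_name, "type", "veth",
         "peer", "name", phys_name])
    run(["ip", "link", "set", int_name, "up"])
    run(["ip", "link", "set", phys_name, "up"])
```

Calling recreate_veth("int-br-fixed", "phy-br-fixed") as root would run the same commands listed above with the settle step added. Passing a list's append method as run records the commands instead of executing them.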
I will upload a draft patch for review shortly.