yahoo-eng-team team mailing list archive

[Bug 1218556] Re: veth pair connecting between physical and integration bridge down after ovs agent restart

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1218556@xxxxxxxxxxxxxxxxxx>
Date: Tue, 08 Oct 2013 20:08:28 -0000
Reply-to: Bug 1218556 <1218556@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Reviewed:  https://review.openstack.org/50388
Committed: http://github.com/openstack/neutron/commit/99440a63af5a2c4c2e139036c42db5c64e9495b2
Submitter: Jenkins
Branch:    milestone-proposed

commit 99440a63af5a2c4c2e139036c42db5c64e9495b2
Author: Ralf Haferkamp <rhafer@xxxxxxx>
Date:   Thu Aug 29 20:50:55 2013 +0200

    Avoid race with udev during ovs agent startup
    
    After taking down the veth link between the physical bridge and the
    integration bridge, call udevadm settle to wait for any udev events to be
    completely processed by the operating system before recreating the veth
    pair.
    
    Some distributions (e.g. openSUSE) have udev rules installed by default that
    call e.g. ifdown <interface> during the remove event. If that is processed
    after the ovs agent has already brought the veth pair up again, the veth
    pair's link will be down after the agent has completed startup, and
    networking will be broken for all VM instances.
    
    Change-Id: I95520ea96a9804c5261a0c994bbca137535cc37c
    Closes-Bug: #1218556
    (cherry picked from commit 8d88ee7411d43f148b45d0a145fe32a75765a3ac)


** Changed in: neutron
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1218556

Title:
  veth pair connecting between physical and integration bridge down
  after ovs agent restart

Status in OpenStack Neutron (virtual network service):
  Fix Released

Bug description:
  Sometimes after restarting the openvswitch-agent, the veth pair that
  connects the physical bridge with the integration bridge doesn't come
  up correctly, which of course disconnects any running VM instance
  from the network.

  # /etc/init.d/openstack-neutron-openvswitch-agent restart
  # ip addr show
  [..]
  83: phy-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
      link/ether 3a:6c:d6:a4:1c:89 brd ff:ff:ff:ff:ff:ff
  84: int-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
      link/ether a2:12:2a:e5:b8:e4 brd ff:ff:ff:ff:ff:ff
  [..]

  I was able to reproduce this problem on openSUSE 12.3 and SLES 11.
  Ubuntu seems to be unaffected by this.

  Doing a manual "ip link set up dev <device>" on both ends of the veth
  pair fixes the problem (until another restart brings it back).
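
As a rough illustration of how the symptom could be detected before applying the manual fix, the sketch below parses "ip addr show" style output and reports interfaces whose flag list lacks UP. The function name interfaces_missing_up and the SAMPLE text (copied from the output quoted above) are hypothetical helpers for this example and are not part of the agent.

```python
import re

# Matches the header line of each interface in `ip addr show` output, e.g.
# "83: phy-br-eth1: <BROADCAST,MULTICAST> mtu 1500 ...". An optional
# "@peer" suffix after the name (as some iproute2 versions print) is skipped.
_HEADER = re.compile(r"^\s*\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<([^>]*)>")

# Sample text taken from the bug description.
SAMPLE = """\
  83: phy-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
      link/ether 3a:6c:d6:a4:1c:89 brd ff:ff:ff:ff:ff:ff
  84: int-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
      link/ether a2:12:2a:e5:b8:e4 brd ff:ff:ff:ff:ff:ff
"""


def interfaces_missing_up(ip_output, names):
    """Return the entries of `names` whose flag list does not contain UP.

    Interfaces that do not appear in `ip_output` at all are reported too,
    since they are not up either.
    """
    flags_by_name = {}
    for line in ip_output.splitlines():
        match = _HEADER.match(line)
        if match:
            flags_by_name[match.group(1)] = set(match.group(2).split(","))
    return [n for n in names if "UP" not in flags_by_name.get(n, set())]


if __name__ == "__main__":
    for name in interfaces_missing_up(SAMPLE, ["phy-br-eth1", "int-br-eth1"]):
        # The manual workaround from the report, printed rather than executed.
        print("ip link set up dev %s" % name)
```

The script only prints the workaround commands; running them requires root and is left to the operator.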

  I think I was able to track this down to a race condition between udev
  (and its network rules) and the ip commands that the openvswitch-agent
  runs during startup. Among other things, the agent does this during
  startup:

  ip link delete int-br-fixed
  ip link add int-br-fixed type veth peer name phy-br-fixed
  ip link set int-br-fixed up
  ip link set phy-br-fixed up

  The ip link delete and ip link add commands cause several udev events
  to be fired. However, on my system the processing of the udev rules
  takes so long that the "remove" events are not completely processed
  before the ip link add command is started, which causes the interface
  to be down after the above commands have completed.

  A possible fix for this is to call "udevadm settle" after the ip link
  delete call.
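
A minimal sketch of the proposed ordering follows, assuming the commands are run one after another through an injectable runner. The function recreate_veth, its run parameter, and the 10-second settle timeout are assumptions made for illustration; this is not the agent's actual code.

```python
import subprocess


def recreate_veth(int_name, phys_name, run=subprocess.check_call,
                  settle_timeout=10):
    """Delete and recreate a veth pair, waiting for udev in between.

    `run` is called once per command with an argv list. The default,
    subprocess.check_call, raises if a command fails (for example when
    the interface to delete does not exist) and needs root privileges.
    """
    commands = [
        ["ip", "link", "delete", int_name],
        # Block until udev has handled the pending "remove" events, so a
        # late rule (such as an ifdown call) cannot hit the new pair.
        ["udevadm", "settle", "--timeout=%d" % settle_timeout],
        ["ip", "link", "add", int_name, "type", "veth",
         "peer", "name", phys_name],
        ["ip", "link", "set", int_name, "up"],
        ["ip", "link", "set", phys_name, "up"],
    ]
    for argv in commands:
        run(argv)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    recreate_veth("int-br-fixed", "phy-br-fixed",
                  run=lambda argv: print(" ".join(argv)))
```

Passing a recording or printing function as run allows the ordering to be checked without touching the host's network configuration.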

  I will upload a draft patch for review shortly.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1218556/+subscriptions