← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1471916] [NEW] race with neutron: can't create bridge with the same name

 

Public bug reported:

If a node running nova-compute is using the neutron with the linuxbridge
agent, and the node is also running other neutron agents like the
l3-agent or dhcp-agent (ie the node is acting as compute node and a
network node) then races can happen when both nova-compute and the
linuxbridge agent try to add the same linux bridge at the same time.
This can cause instances to fail to start.

http://logs.openstack.org/12/138512/10/experimental/check-tempest-dsvm-
neutron-linuxbridge/e737f60/

http://logs.openstack.org/12/138512/10/experimental/check-tempest-dsvm-
neutron-
linuxbridge/e737f60/logs/screen-n-cpu.txt.gz#_2015-07-02_03_09_09_131

The neutron l3-agent added a tap tap device:

2015-07-02 03:09:06.614 DEBUG neutron.agent.linux.utils [req-42397477-9e58-4c7a-aa0b-8e54269f9d34 None None] Running command (rootwrap daemon): ['ip', 'link', 'add', 'tapbb2831fd-86', 'type', 'veth', 'peer', 'name', 'qr-bb2831fd-86', 'netns', 'qrouter-048a7c31-08a1-4693-91a8-13e3221c0982'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:101
Command: ['ip', 'link', 'add', u'tapbb2831fd-86', 'type', 'veth', 'peer', 'name', u'qr-bb2831fd-86', 'netns', u'qrouter-048a7c31-08a1-4693-91a8-13e3221c0982']


The linuxbridge-agent detects this device and adds the bridge for it's network:

2015-07-02 03:09:09.119 DEBUG neutron.agent.linux.utils [req-d4f88282-233b-437e-8157-5d134538d206 None None] Running command (rootwrap daemon): ['brctl', 'addbr', 'brq94158651-73'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:101
2015-07-02 03:09:09.122 DEBUG neutron.agent.linux.utils [req-d4f88282-233b-437e-8157-5d134538d206 None None] 
Command: ['brctl', 'addbr', u'brq94158651-73']
Exit code: 0


About the same time, nova-compute also tried to add the same bridge for a VIF it was about to create on the same network, but lost the race and so the instance went into an error state:

2015-07-02 03:09:09.130 DEBUG oslo_concurrency.processutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 
tempest-ServerRescueNegativeTestJSON-1737504944] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73" returned: 1 in 0.094s exe
cute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:247
2015-07-02 03:09:09.131 DEBUG oslo_concurrency.processutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 
tempest-ServerRescueNegativeTestJSON-1737504944] u'sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73' failed. Not Retrying. execut
e /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:295
2015-07-02 03:09:09.131 DEBUG oslo_concurrency.lockutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 tem
pest-ServerRescueNegativeTestJSON-1737504944] Lock "lock_bridge" released by "ensure_bridge" :: held 0.096s inner /usr/local/lib/python2.7/dist-packag
es/oslo_concurrency/lockutils.py:262
2015-07-02 03:09:09.131 ERROR nova.compute.manager [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 tempest-S
erverRescueNegativeTestJSON-1737504944] [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Instance failed to spawn
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Traceback (most recent call last):
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/compute/mana
ger.py", line 2113, in _build_resources
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     yield resources
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/compute/mana
ger.py", line 1985, in _build_and_run_instance
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     block_device_info=block_device_info)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
/driver.py", line 2435, in spawn
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     block_device_info=block_device_info)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
/driver.py", line 4454, in _create_domain_and_network
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     self.plug_vifs(instance, network_info)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
/driver.py", line 619, in plug_vifs
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     self.vif_driver.plug(instance, vif)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
/vif.py", line 609, in plug
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     func(instance, vif)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
/vif.py", line 401, in plug_bridge
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     iface)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 252, in inner
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     return f(*args, **kwargs)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/network/linux_net.py", line 1602, in ensure_bridge
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     _execute('brctl', 'addbr', bridge, run_as_root=True)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/network/linux_net.py", line 1266, in _execute
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     return utils.execute(*cmd, **kwargs)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/utils.py", line 229, in execute
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     return processutils.execute(*cmd, **kwargs)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 262, in execute
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     cmd=sanitized_cmd)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] ProcessExecutionError: Unexpected error while running command.
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Command: sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Exit code: 1
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Stdout: u''
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Stderr: u"device brq94158651-73 already exists; can't create bridge with the same name\n"

** Affects: nova
     Importance: Undecided
     Assignee: Darragh O'Reilly (darragh-oreilly)
         Status: In Progress

** Changed in: nova
     Assignee: (unassigned) => Darragh O'Reilly (darragh-oreilly)

** Changed in: nova
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1471916

Title:
  race with neutron: can't create bridge with the same name

Status in OpenStack Compute (Nova):
  In Progress

Bug description:
  If a node running nova-compute is using the neutron with the
  linuxbridge agent, and the node is also running other neutron agents
  like the l3-agent or dhcp-agent (ie the node is acting as compute node
  and a network node) then races can happen when both nova-compute and
  the linuxbridge agent try to add the same linux bridge at the same
  time. This can cause instances to fail to start.

  http://logs.openstack.org/12/138512/10/experimental/check-tempest-
  dsvm-neutron-linuxbridge/e737f60/

  http://logs.openstack.org/12/138512/10/experimental/check-tempest-
  dsvm-neutron-
  linuxbridge/e737f60/logs/screen-n-cpu.txt.gz#_2015-07-02_03_09_09_131

  The neutron l3-agent added a tap tap device:

  2015-07-02 03:09:06.614 DEBUG neutron.agent.linux.utils [req-42397477-9e58-4c7a-aa0b-8e54269f9d34 None None] Running command (rootwrap daemon): ['ip', 'link', 'add', 'tapbb2831fd-86', 'type', 'veth', 'peer', 'name', 'qr-bb2831fd-86', 'netns', 'qrouter-048a7c31-08a1-4693-91a8-13e3221c0982'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:101
  Command: ['ip', 'link', 'add', u'tapbb2831fd-86', 'type', 'veth', 'peer', 'name', u'qr-bb2831fd-86', 'netns', u'qrouter-048a7c31-08a1-4693-91a8-13e3221c0982']

  
  The linuxbridge-agent detects this device and adds the bridge for it's network:

  2015-07-02 03:09:09.119 DEBUG neutron.agent.linux.utils [req-d4f88282-233b-437e-8157-5d134538d206 None None] Running command (rootwrap daemon): ['brctl', 'addbr', 'brq94158651-73'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:101
  2015-07-02 03:09:09.122 DEBUG neutron.agent.linux.utils [req-d4f88282-233b-437e-8157-5d134538d206 None None] 
  Command: ['brctl', 'addbr', u'brq94158651-73']
  Exit code: 0

  
  About the same time, nova-compute also tried to add the same bridge for a VIF it was about to create on the same network, but lost the race and so the instance went into an error state:

  2015-07-02 03:09:09.130 DEBUG oslo_concurrency.processutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 
  tempest-ServerRescueNegativeTestJSON-1737504944] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73" returned: 1 in 0.094s exe
  cute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:247
  2015-07-02 03:09:09.131 DEBUG oslo_concurrency.processutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 
  tempest-ServerRescueNegativeTestJSON-1737504944] u'sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73' failed. Not Retrying. execut
  e /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:295
  2015-07-02 03:09:09.131 DEBUG oslo_concurrency.lockutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 tem
  pest-ServerRescueNegativeTestJSON-1737504944] Lock "lock_bridge" released by "ensure_bridge" :: held 0.096s inner /usr/local/lib/python2.7/dist-packag
  es/oslo_concurrency/lockutils.py:262
  2015-07-02 03:09:09.131 ERROR nova.compute.manager [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 tempest-S
  erverRescueNegativeTestJSON-1737504944] [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Instance failed to spawn
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Traceback (most recent call last):
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/compute/mana
  ger.py", line 2113, in _build_resources
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     yield resources
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/compute/mana
  ger.py", line 1985, in _build_and_run_instance
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     block_device_info=block_device_info)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
  /driver.py", line 2435, in spawn
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     block_device_info=block_device_info)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
  /driver.py", line 4454, in _create_domain_and_network
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     self.plug_vifs(instance, network_info)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
  /driver.py", line 619, in plug_vifs
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     self.vif_driver.plug(instance, vif)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
  /vif.py", line 609, in plug
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     func(instance, vif)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/virt/libvirt
  /vif.py", line 401, in plug_bridge
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     iface)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 252, in inner
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     return f(*args, **kwargs)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/network/linux_net.py", line 1602, in ensure_bridge
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     _execute('brctl', 'addbr', bridge, run_as_root=True)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/network/linux_net.py", line 1266, in _execute
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     return utils.execute(*cmd, **kwargs)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/opt/stack/new/nova/nova/utils.py", line 229, in execute
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     return processutils.execute(*cmd, **kwargs)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 262, in execute
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c]     cmd=sanitized_cmd)
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] ProcessExecutionError: Unexpected error while running command.
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Command: sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Exit code: 1
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Stdout: u''
  2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Stderr: u"device brq94158651-73 already exists; can't create bridge with the same name\n"

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1471916/+subscriptions


Follow ups

References