yahoo-eng-team mailing list archive: Message #17495
[Bug 1346861] [NEW] l3 cannot re-create device in deleted namespace
Public bug reported:
If the namespace of an OVS-managed device (a device created by add-port
followed by set type=internal) is deleted while some process is still
using it, the L3 agent fails to re-create the device.
Steps to repro:
- Stop the l3-agent.
- Choose a router namespace that contains at least one OVS-managed device. For example, "qrouter-df5a3693-ec4d-4023-9e73-8dce9c4ac184" has a device "qg-df5a3693-ec".
- Make sure at least one process is using the namespace. For demonstration purposes, start another shell with "ip netns exec qrouter-df5a3693-ec4d-4023-9e73-8dce9c4ac184 bash". In practice, ns-metadata-proxy or keepalived may be running in the namespace.
- Delete the namespace with "ip netns del qrouter-df5a3693-ec4d-4023-9e73-8dce9c4ac184". The command does not fail, and the devices in the deleted namespace stay alive, as "ip link" in the previously opened shell shows. However, there is no easy way to enter the namespace from outside again.
- Start the l3-agent.
- Verify that "qg-df5a3693-ec" cannot be re-created and managed by the L3 agent. The backtrace looks like this (from our branch; it may differ from upstream):
ERROR neutron.agent.l3_agent Failed synchronizing routers
TRACE neutron.agent.l3_agent Traceback (most recent call last):
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 1429, in _sync_routers_task
TRACE neutron.agent.l3_agent self._process_routers(routers, all_routers=True)
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 1354, in _process_routers
TRACE neutron.agent.l3_agent self._router_added(r['id'], r)
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 672, in _router_added
TRACE neutron.agent.l3_agent self.process_ha_router_added(ri)
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 923, in process_ha_router_added
TRACE neutron.agent.l3_agent vip_cidrs=[gw_ip_cidr])
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 897, in ha_network_added
TRACE neutron.agent.l3_agent prefix=HA_DEV_PREFIX)
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 194, in plug
TRACE neutron.agent.l3_agent ns_dev.link.set_address(mac_address)
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 230, in set_address
TRACE neutron.agent.l3_agent self._as_root('set', self.name, 'address', mac_address)
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 217, in _as_root
TRACE neutron.agent.l3_agent kwargs.get('use_root_namespace', False))
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 70, in _as_root
TRACE neutron.agent.l3_agent namespace)
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 81, in _execute
TRACE neutron.agent.l3_agent root_helper=root_helper)
TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/linux/utils.py", line 90, in execute
TRACE neutron.agent.l3_agent raise RuntimeError(m)
TRACE neutron.agent.l3_agent RuntimeError:
TRACE neutron.agent.l3_agent Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'ha-5bd08318-aa', 'address', 'fa:16:3e:f3:2b:6b']
TRACE neutron.agent.l3_agent Exit code: 1
TRACE neutron.agent.l3_agent Stdout: ''
TRACE neutron.agent.l3_agent Stderr: 'Cannot find device "ha-5bd08318-aa"\n'
TRACE neutron.agent.l3_agent
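The deletion step above works because a network namespace stays alive as long as some process is still inside it, even after `ip netns del` has removed its name. A minimal sketch for finding such "lost" namespaces, assuming root access, that PID 1 runs in the root namespace, and that named namespaces are bind-mounted under /var/run/netns; the helpers netns_id, find_lost_netns, scan_host and nsenter_cmd are hypothetical and not part of neutron:

```python
import os
from collections import defaultdict


def netns_id(path):
    """Identify a network namespace by the (st_dev, st_ino) of a file referring to it.

    `path` may be /proc/<pid>/ns/net (os.stat follows that symlink to the
    namespace) or a bind mount such as /var/run/netns/<name>.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def find_lost_netns(pid_ns, named_ns, root_ns):
    """Group PIDs by network namespace, keeping only "lost" namespaces.

    pid_ns:   {pid: namespace id} for every inspected process
    named_ns: ids of namespaces still bound under /var/run/netns
    root_ns:  id of the root network namespace
    Returns {namespace id: sorted PIDs} for namespaces that are neither the
    root namespace nor reachable through a name known to `ip netns`.
    """
    lost = defaultdict(list)
    for pid, ns in pid_ns.items():
        if ns != root_ns and ns not in named_ns:
            lost[ns].append(pid)
    return {ns: sorted(pids) for ns, pids in lost.items()}


def scan_host(proc="/proc", netns_dir="/var/run/netns"):
    """Collect the inputs for find_lost_netns on a live host (run as root)."""
    pid_ns = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            pid_ns[int(entry)] = netns_id(os.path.join(proc, entry, "ns", "net"))
        except OSError:
            pass  # the process exited, or access was denied
    named_ns = set()
    if os.path.isdir(netns_dir):
        for name in os.listdir(netns_dir):
            try:
                named_ns.add(netns_id(os.path.join(netns_dir, name)))
            except OSError:
                pass
    # Assumption: PID 1 runs in the root network namespace.
    root_ns = netns_id(os.path.join(proc, "1", "ns", "net"))
    return find_lost_netns(pid_ns, named_ns, root_ns)


def nsenter_cmd(pid, *command):
    """Build an argv that runs `command` in the network namespace of `pid`."""
    return ["nsenter", "--net=/proc/%d/ns/net" % pid] + list(command)
```

For any PID returned by scan_host(), running nsenter_cmd(pid, "ip", "link") through subprocess should list the devices stranded in that lost namespace; nsenter's --net=FILE option enters the namespace referenced by FILE.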
The root cause is that ovs-vsctl "can perform any number of commands in
a single run, implemented as a single atomic transaction against the
database", and neutron currently uses the following single command to
create an OVS-managed device:
ovs-vsctl -- --if-exists del-port qr-2f4c613d-b7 -- add-port br-int qr-2f4c613d-b7 -- set Interface qr-2f4c613d-b7 type=internal -- set Interface qr-2f4c613d-b7 external-ids:iface-id=2f4c613d-b7f2-4d63-89c8-af2d48948d19 -- set Interface qr-2f4c613d-b7 external-ids:iface-status=active -- set Interface qr-2f4c613d-b7 external-ids:attached-mac=fa:16:3e:3c:4d:18
OVS can delete the devices it manages even when a device is in a deleted
(lost) namespace. However, when del-port, add-port and set type=internal
are combined in one ovs-vsctl command, and therefore in one transaction,
OVS does nothing to the device and leaves it as is, still inside the lost
namespace.
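The no-op outcome can be illustrated with a deliberately simplified model. As a modelling assumption that matches the behaviour reported above, and not a description of ovs-vswitchd internals, it treats the kernel device as touched only when the committed state for an interface name changes between transactions. The functions apply_transaction and device_actions are hypothetical:

```python
def apply_transaction(ports, ops):
    """Apply all ops to a copy of `ports` ({name: type}) and return the result.

    Only the final state is returned, mirroring an atomic transaction.
    """
    new = dict(ports)
    for op in ops:
        if op[0] == "del-port":      # ("del-port", name), with --if-exists semantics
            new.pop(op[1], None)
        elif op[0] == "add-port":    # ("add-port", name)
            new[op[1]] = ""          # empty type: an ordinary system device
        elif op[0] == "set-type":    # ("set-type", name, type)
            new[op[1]] = op[2]
    return new


def device_actions(before, after):
    """Diff two committed states into the device changes a reconciler would make."""
    actions = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue  # same name and type: the existing device is left alone
        if old is not None:
            actions.append(("delete", name))
        if new is not None:
            actions.append(("create", name, new))
    return actions
```

Under this model, a single transaction containing del-port, add-port and set type=internal yields no device actions at all, while committing the del-port on its own first produces a delete followed, in the next transaction, by a create.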
In OVSInterfaceDriver.plug(self, network_id, port_id, device_name, mac_address, bridge=None, namespace=None, prefix=None):
# (Re-)create the port on the bridge with a single ovs-vsctl run.
self._ovs_add_port(bridge, tap_name, port_id, mac_address,
                   internal=internal)
# The calls below shell out to `ip`; each fails if `ip` cannot see the device.
ns_dev.link.set_address(mac_address)
if self.conf.network_device_mtu:
    ns_dev.link.set_mtu(self.conf.network_device_mtu)
    if self.conf.ovs_use_veth:
        root_dev.link.set_mtu(self.conf.network_device_mtu)
# Add an interface created by ovs to the namespace.
if not self.conf.ovs_use_veth and namespace:
    namespace_obj = ip.ensure_namespace(namespace)
    namespace_obj.add_device_to_namespace(ns_dev)
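In the traceback, the failing call is `ip link set ha-5bd08318-aa address ...` run without `ip netns exec`, so `ip` looked for the device in the root namespace while it was still stranded in the lost one. A hypothetical pre-check (not a neutron API) that asks `ip -o link show` whether a device is visible, assuming the usual one-line-per-interface "index: name[@peer]: <flags> ..." output:

```python
import subprocess


def parse_link_names(output):
    """Return the interface names listed in `ip -o link show` output."""
    names = set()
    for line in output.splitlines():
        parts = line.split(": ", 2)  # "<index>: <name>[@<peer>]: <flags> ..."
        if len(parts) >= 2:
            names.add(parts[1].split("@", 1)[0])
    return names


def device_visible(device_name, runner=subprocess.check_output):
    """Return True if `ip` in the caller's namespace can see `device_name`.

    `runner` is injectable so the check can be exercised without a real host.
    """
    output = runner(["ip", "-o", "link", "show"])
    if isinstance(output, bytes):
        output = output.decode()
    return device_name in parse_link_names(output)
```

Such a check only detects the problem earlier; it does not bring the device back into the root namespace.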
As the excerpt shows, setting the MAC address, setting the MTU and moving
the device into the namespace all invoke the `ip` command directly, which
requires `ip` to have access to the device. The device created or
re-created by OVS (in self._ovs_add_port) must therefore still be in the
root namespace, not in any other namespace, when that call returns. This
can be guaranteed by splitting the large ovs-vsctl command above into two
parts:
ovs-vsctl --if-exists del-port qr-2f4c613d-b7
ovs-vsctl -- add-port br-int qr-2f4c613d-b7 -- set Interface qr-2f4c613d-b7 type=internal -- set Interface qr-2f4c613d-b7 external-ids:iface-id=2f4c613d-b7f2-4d63-89c8-af2d48948d19 -- set Interface qr-2f4c613d-b7 external-ids:iface-status=active -- set Interface qr-2f4c613d-b7 external-ids:attached-mac=fa:16:3e:3c:4d:18
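The proposed split can be sketched as a small argv builder that reproduces both the current single-transaction command and the two-step variant. It assumes each returned list is later executed in order (for example with subprocess.check_call through the root helper); the function name ovs_add_port_cmds is hypothetical and does not mirror neutron's _ovs_add_port:

```python
def ovs_add_port_cmds(bridge, device_name, iface_id, mac_address, split=True):
    """Build ovs-vsctl argv lists for (re-)creating an internal OVS port.

    split=False: delete and re-add share one ovs-vsctl run (one transaction).
    split=True:  the delete runs first as its own ovs-vsctl invocation.
    """
    delete = ["--if-exists", "del-port", device_name]
    add = ["add-port", bridge, device_name,
           "--", "set", "Interface", device_name, "type=internal",
           "--", "set", "Interface", device_name,
           "external-ids:iface-id=%s" % iface_id,
           "--", "set", "Interface", device_name,
           "external-ids:iface-status=active",
           "--", "set", "Interface", device_name,
           "external-ids:attached-mac=%s" % mac_address]
    if split:
        return [["ovs-vsctl"] + delete, ["ovs-vsctl", "--"] + add]
    return [["ovs-vsctl", "--"] + delete + ["--"] + add]
```

Because the first list is a complete ovs-vsctl run, its del-port is committed in a transaction of its own, so OVS removes the old device before the second run creates a fresh one.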
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1346861
Title:
l3 cannot re-create device in deleted namespace
Status in OpenStack Neutron (virtual network service):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1346861/+subscriptions