yahoo-eng-team team mailing list archive
Message #85285
[Bug 1735724] Re: Metadata iptables rules never inserted upon exception on router creation
I tried to reproduce this issue today and couldn't.
Looking at the code, it seems to me that after Brian's change [1] those rules are added to the iptables_manager when the router_info instance is created, which is well before ri.process() is called. If anything fails in that constructor, no namespace is created for the router at all.
[1] https://review.openstack.org/524406
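To illustrate why registering the rules at construction time avoids the problem, here is a minimal sketch. It is a toy model, not Neutron code: ToyIptablesManager, ToyRouterInfo, the fail_next flag and the placeholder rule string are all hypothetical names made up for this example. It only assumes what is described above, namely that the rules are queued when the router info object is built and applied whenever process() succeeds.

```python
class ToyIptablesManager:
    """Hypothetical stand-in: queues rules and applies them on demand."""

    def __init__(self):
        self.pending_rules = []
        self.applied_rules = []

    def add_rule(self, rule):
        self.pending_rules.append(rule)

    def apply(self):
        self.applied_rules = list(self.pending_rules)


class ToyRouterInfo:
    """Hypothetical stand-in for a per-router state object."""

    def __init__(self, router_id):
        self.router_id = router_id
        self.iptables_manager = ToyIptablesManager()
        # Assumption: the metadata rule is queued here, in the constructor,
        # so it does not depend on any later one-shot notification.
        self.iptables_manager.add_rule("metadata-redirect (placeholder)")
        self.fail_next = False  # test hook to simulate a failure

    def process(self):
        if self.fail_next:
            self.fail_next = False
            raise RuntimeError("simulated failure in process()")
        self.iptables_manager.apply()


ri = ToyRouterInfo("r1")
ri.fail_next = True
try:
    ri.process()  # first attempt fails
except RuntimeError:
    pass
ri.process()      # retry (resync) succeeds and applies the queued rule
```

In this model the retry applies the rule because it was already queued on the object, regardless of which code path triggers the retry.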
** Changed in: neutron
Status: New => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1735724
Title:
Metadata iptables rules never inserted upon exception on router
creation
Status in neutron:
Fix Released
Status in OpenStack Security Advisory:
Incomplete
Bug description:
We've been debugging some issues being seen lately [0] and found out
that there's a bug in l3 agent when creating routers (or during
initial sync). Jakub Libosvar and I spent some time recreating the
issue and this is what we got:
Especially since we bumped to ovsdbapp 0.8.0, we've seen some jobs
failing due to errors when authenticating to a VM with a public key. The
TCP connection to the SSH port was successfully established but the
authentication failed. After debugging further, we found out that the
metadata rules in the qrouter namespace which redirect traffic to haproxy
(which replaced the old neutron-ns-metadata-proxy) were missing, so VMs
weren't fetching metadata (and hence the public key).
These rules are installed by the metadata driver after a router is created [1], on the AFTER_CREATE notification. They are also created during the initial sync of the l3 agent (since the router is still unknown to the agent) [2]. Here, if we don't know the router yet, we call _process_added_router(), and if it's a known router we call _process_updated_router().
After our tests, we've seen that the iptables rules are never restored if we simulate an exception inside ri.process() at [3], even though the router is scheduled for resync [4]. This happens because we've already added the router to our router info [5]. So even though ri.process() fails at L481 and the router is scheduled for resync, next time _process_updated_router() gets called instead of _process_added_router(). The notification is therefore never pushed to the metadata driver, and the iptables rules never get installed.
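The sequence described above can be sketched with a small toy model. This is not the Neutron code: ToyL3Agent, RouterInfo, fail_times, resync_queue and _after_create_notification are hypothetical names used only for illustration, and the sketch assumes just what is stated above (the router is registered before process() runs, and only the "added" path triggers the notification that installs the metadata rules).

```python
class RouterInfo:
    """Hypothetical stand-in for the agent's per-router state."""

    def __init__(self, router_id, fail_times=0):
        self.router_id = router_id
        self.fail_times = fail_times           # how many process() calls fail
        self.metadata_rules_installed = False

    def process(self):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise RuntimeError("simulated failure in process()")


class ToyL3Agent:
    def __init__(self):
        self.router_info = {}
        self.resync_queue = []

    def _after_create_notification(self, ri):
        # Stands in for the metadata driver reacting to AFTER_CREATE.
        ri.metadata_rules_installed = True

    def _process_added_router(self, router_id, fail_times):
        ri = RouterInfo(router_id, fail_times)
        self.router_info[router_id] = ri       # registered BEFORE process()
        ri.process()                           # may raise
        self._after_create_notification(ri)    # skipped if process() raised

    def _process_updated_router(self, router_id):
        # The update path never sends the AFTER_CREATE notification.
        self.router_info[router_id].process()

    def process_router(self, router_id, fail_times=0):
        try:
            if router_id not in self.router_info:
                self._process_added_router(router_id, fail_times)
            else:
                self._process_updated_router(router_id)
        except RuntimeError:
            self.resync_queue.append(router_id)  # schedule a resync

    def resync(self):
        pending, self.resync_queue = self.resync_queue, []
        for router_id in pending:
            self.process_router(router_id)


agent = ToyL3Agent()
agent.process_router("r1", fail_times=1)  # first attempt fails
agent.resync()                            # retry takes the "updated" path
```

After the resync, the router in this model is processed successfully and nothing is left in the queue, yet metadata_rules_installed is still False, because the retry went through the update path.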
In conclusion, if an error occurs during _process_added_router() we might end up losing metadata forever until we restart the agent and this call succeeds. Worse, we will be forwarding metadata requests via br-ex, which could lead to security issues (i.e. wrong metadata could be injected from the outside, or the metadata server running in the underlying cloud may respond).
With ovsdbapp 0.9.0 we're minimizing this because, if a port fails to be added to br-int, ovsdbapp will enqueue the transaction instead of throwing an exception. However, other exceptions outside of ovsdbapp could still reproduce this scenario, so we need to fix it in Neutron.
Thanks
Daniel Alvarez
---
[0] https://bugs.launchpad.net/tripleo/+bug/1731063
[1] https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/metadata/driver.py#L288
[2] https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L472
[3] https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L481
[4] https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L565
[5] https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L478
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1735724/+subscriptions