yahoo-eng-team team mailing list archive

[Bug 1669482] Re: fwaas: firewall rules not applied on L3 agents reboot in case of neutron-fwaas outage

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Jeremy Stanley <1669482@xxxxxxxxxxxxxxxxxx>
Date: Thu, 03 Sep 2020 15:39:01 -0000
Reply-to: Bug 1669482 <1669482@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Changed in: ossa
Status: Incomplete => Won't Fix

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1669482

Title:
fwaas: firewall rules not applied on L3 agents reboot in case of
neutron-fwaas outage

Status in neutron:
Won't Fix
Status in OpenStack Security Advisory:
Won't Fix

Bug description:
On L3 agent reboot (fwaas v1) or L2/L3 agent reboot (fwaas v2),
the networking stack is flushed by the Linux system (network namespaces, iptables, ...),
hence Neutron needs to resynchronize the networking configuration.
Therefore, on restart, agents send a 'sync_routers' RPC call to retrieve all the networking stacks (one per tenant).
On response they configure: network namespace, interfaces, routing table, etc.

With the "fwaas" extension enabled, iptables rules need to be applied too.
Sadly, this is not always the case due to the following bug:
https://bugs.launchpad.net/neutron/+bug/1659760

To summarize the previous bug, the fwaas implementation has a general RPC usage issue:
=> "fanout" is always used instead of "call".
For every CRUD method on FWaaS v1 and v2 resources (Firewall, FirewallPolicy, FirewallRule, Firewallgroup, ...), an AMQP fanout cast is sent to all L3 (L2) agents, and every agent responds back to the neutron server.

Simple example using 40 L3 agent nodes:
Scenario: a user just updates the 'name' of a firewall rule
=> 40 RPC messages are sent to agents (with or without routers, with or without an associated firewall) and the neutron server receives 40 responses back.
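
The amplification in this example can be sketched with a toy model. This is not neutron-fwaas code: the Agent class, the helper names, and the assumption that only 2 of the 40 agents actually host a router with the firewall are all hypothetical, chosen only to contrast a fanout with a targeted notification.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    host: str
    hosts_firewall: bool  # assumption: True if a router with this firewall lives here


def fanout_targets(agents):
    """Fanout cast: every agent receives the message, relevant or not."""
    return [a.host for a in agents]


def targeted_targets(agents):
    """Hypothetical alternative: notify only agents hosting the firewall."""
    return [a.host for a in agents if a.hosts_firewall]


def server_load(targets):
    """Each notified agent reports back to the server once."""
    return {"messages_out": len(targets), "responses_in": len(targets)}


# 40 L3 agents, as in the example; only the first 2 host the firewall (assumed).
agents = [Agent(f"l3-{i:02d}", hosts_firewall=(i < 2)) for i in range(40)]

fanout = server_load(fanout_targets(agents))
targeted = server_load(targeted_targets(agents))
print("fanout:  ", fanout)
print("targeted:", targeted)
```

Under these assumptions a single rename costs the server 40 responses with fanout, against 2 if only the concerned agents were notified.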

This leads to a "flooded" neutron server process (or RPC workers).
=> RPC timeouts appear and the following second bug is triggered:
https://bugs.launchpad.net/neutron/+bug/1618244

The neutron-server fwaas worker is then out of order :(
If an L3 (L2) agent reboots during this neutron-server "fwaas" outage,
the agent gets an RPC Timeout on its get_tenants_with_firewalls, get_firewalls_for_tenant, get_projects_with_firewall_groups and get_firewall_groups_for_project calls.
=> all networking stacks are set up (namespace, interfaces, IPs, routing, NAT, ...), but no "fwaas" iptables rules are applied (the ACCEPT iptables policy is set by default).
=> all network traffic is authorized

Much worse, even once the neutron-server "fwaas" worker becomes ready again, the "fwaas" iptables rules are not applied, and they never will be.
There is no exception in the logs; all seems fine, but the iptables rules are not set.
The only ways to recover are to send update HTTP requests on fwaas resources or to restart the agents.
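
The "never recovers" behaviour can be illustrated with a toy model. Everything here is hypothetical (FakePlugin, OneShotAgent, ResyncAgent and the periodic_task hook are stand-ins, not the real agent classes); it only shows why a sync that runs once and swallows the timeout stays broken after the server recovers, while an agent that remembers the failure would catch up.

```python
class RpcTimeout(Exception):
    """Stand-in for the RPC timeout seen by the agent."""


class FakePlugin:
    """Hypothetical server-side fwaas endpoint that can be 'out of order'."""

    def __init__(self):
        self.available = False

    def get_firewalls_for_tenant(self):
        if not self.available:
            raise RpcTimeout()
        return ["fw-1"]


class OneShotAgent:
    """Fetches firewalls once at startup; a timeout is silently dropped."""

    def __init__(self, plugin):
        self.plugin = plugin
        self.applied = []

    def start(self):
        try:
            self.applied = self.plugin.get_firewalls_for_tenant()
        except RpcTimeout:
            pass  # routers are still set up, just without firewall rules

    def periodic_task(self):
        pass  # nothing remembers that the firewall sync failed


class ResyncAgent(OneShotAgent):
    """Assumed variant: keeps a flag until the firewall sync succeeds."""

    def __init__(self, plugin):
        super().__init__(plugin)
        self.needs_resync = True

    def start(self):
        self.periodic_task()

    def periodic_task(self):
        if not self.needs_resync:
            return
        try:
            self.applied = self.plugin.get_firewalls_for_tenant()
            self.needs_resync = False
        except RpcTimeout:
            pass  # keep the flag set and retry on the next tick


plugin = FakePlugin()
one_shot, resync = OneShotAgent(plugin), ResyncAgent(plugin)
for agent in (one_shot, resync):
    agent.start()            # server is down: both calls time out
plugin.available = True      # outage ends
for agent in (one_shot, resync):
    agent.periodic_task()
print(one_shot.applied, resync.applied)  # [] ['fw-1']
```

The one-shot agent matches the symptom described above: no error after the outage, and no rules either.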

The user is not protected by the firewall.

* Step-by-step reproduction:
1. simulate a neutron fwaas outage (on the neutron-server side)
- populate many messages in the q-firewall-plugin queue
- or unbind the q-firewall-plugin queue from neutron-server
2. reboot an L3 agent hosting at least one router with an associated firewall
3. => RPC Timeout appears in the L3 agent logs (get_tenants_with_firewalls, get_firewalls_for_tenant)
4. networking stacks are recreated (interfaces, IPs, iptables NAT, ...)
but without fwaas iptables rules
5. hence traffic to/from VMs is allowed :(

6. end the neutron fwaas outage
- purge messages from the q-firewall-plugin queue
- or restart neutron-server (if the q-firewall-plugin queue was unbound in step 1)
7. RPC Timeouts no longer appear in the L3 agent logs
8. but the fwaas iptables rules are still not set
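
To check steps 4 and 8, one option is to dump the rules of the router namespace with `iptables -S` and look for firewall chains. The helper below is only a sketch: it parses text already captured from that command, the function name is made up, and the marker substring and the sample dump are illustrative assumptions, so adjust the marker to the chain names your fwaas driver really creates.

```python
def chains_matching(iptables_dump, marker):
    """Return user-defined chains whose name contains `marker`.

    `iptables_dump` is text in `iptables -S` format, where a line of the
    form "-N <chain>" declares a user-defined chain.
    """
    chains = []
    for line in iptables_dump.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "-N" and marker in parts[1]:
            chains.append(parts[1])
    return chains


# Illustrative dumps only; the chain name is a placeholder, not real output.
with_firewall = """\
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N example-fwaas-chain
-A FORWARD -j example-fwaas-chain
"""
without_firewall = """\
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
"""

print(chains_matching(with_firewall, "fwaas"))
print(chains_matching(without_firewall, "fwaas"))
```

An empty result for a router that has a firewall associated would correspond to the state described in step 8.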

* Version: all neutron-fwaas versions are impacted (v1 and v2)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1669482/+subscriptions