yahoo-eng-team team mailing list archive
Message #85535
[Bug 1887405] Re: Race condition while processing security_groups_member_updated events (ipset)
Fix released and backported to stable/victoria and stable/ussuri -
https://review.opendev.org/q/I6e4abe13a7541c21399466c5eb0d61ff5780c887
** Changed in: neutron
Status: In Progress => Fix Released
** Also affects: ubuntu
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1887405
Title:
Race condition while processing security_groups_member_updated events
(ipset)
Status in Ubuntu Cloud Archive:
New
Status in Ubuntu Cloud Archive ussuri series:
New
Status in Ubuntu Cloud Archive victoria series:
New
Status in neutron:
Fix Released
Status in neutron package in Ubuntu:
New
Status in neutron source package in Focal:
New
Status in neutron source package in Groovy:
New
Status in neutron source package in Hirsute:
New
Bug description:
# Summary
Race condition while processing security_groups_member_updated events
(ipset)
# Overview
We have a customer that uses heat templates to deploy large
environments (e.g. 21 instances) with a significant number of security
groups (e.g. 60) that use bi-directional remote group references for
both ingress and egress filtering. These heat stacks are deployed
using a CI pipeline and intermittently suffer from application layer
failures due to broken network connectivity. We found that this was
caused by the ipsets used to implement remote_group memberships
missing IPs from their member lists. Troubleshooting suggests this is
caused by a race condition, which I've attempted to describe in detail
below.
Version: `54e1a6b1bc378c0745afc03987d0fea241b826ae` (HEAD of
stable/rocky as of Jan 26, 2020), though I suspect this issue persists
through master.
I'm working on getting some multi-node environments deployed (I don't
think it's possible to reproduce this with a single hypervisor) and
hope to provide reproduction steps on Rocky and master soon. I wanted
to get this report submitted as-is with the hopes that an experienced
Neutron dev might be able to spot possible solutions or provide
diagnostic insight that I am not yet able to produce.
I suspect this report may be easier to read with some markdown, so
please feel free to read it in a gist:
https://gist.github.com/cfarquhar/20fddf2000a83216021bd15b512f772b
Also, this diagram is probably critical to following along:
https://user-images.githubusercontent.com/1253665/87317744-0a75b180-c4ed-11ea-9bad-085019c0f954.png
# Race condition symptoms
Given the following security groups/rules:
```
| secgroup name | secgroup id | direction | remote group | dest port |
|---------------|--------------------------------------|-----------|--------------------------------------|-----------|
| server | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | ingress | b52c8c54-b97a-477d-8b68-f4075e7595d9 | 9092 |
| client | b52c8c54-b97a-477d-8b68-f4075e7595d9 | egress | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | 9092 |
```
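For reference, here is a rough sketch of how this two-group topology could be recreated with openstacksdk (our environments actually use heat templates; the cloud name below is just a placeholder):
```
# Hypothetical reproduction of the two security groups via openstacksdk.
# The original environment used heat templates; "mycloud" is a placeholder.
import openstack

conn = openstack.connect(cloud="mycloud")

server_sg = conn.network.create_security_group(name="server")
client_sg = conn.network.create_security_group(name="client")

# server: allow ingress tcp/9092 from members of the client group
conn.network.create_security_group_rule(
    security_group_id=server_sg.id,
    direction="ingress", ethertype="IPv4", protocol="tcp",
    port_range_min=9092, port_range_max=9092,
    remote_group_id=client_sg.id)

# client: allow egress tcp/9092 to members of the server group
conn.network.create_security_group_rule(
    security_group_id=client_sg.id,
    direction="egress", ethertype="IPv4", protocol="tcp",
    port_range_min=9092, port_range_max=9092,
    remote_group_id=server_sg.id)
```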
And the following instances:
```
| instance name | hypervisor | ip | secgroup assignment |
|---------------|------------|-------------|---------------------|
| server01 | compute01 | 192.168.0.1 | server |
| server02 | compute02 | 192.168.0.2 | server |
| server03 | compute03 | 192.168.0.3 | server |
| client01 | compute04 | 192.168.0.4 | client |
```
We would expect to find the following ipset representing the `server`
security group members on `compute04`:
```
# ipset list NIPv4fcd6cf12-2ac9-4704-9208-
Name: NIPv4fcd6cf12-2ac9-4704-9208-
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 536
References: 4
Number of entries: 3
Members:
192.168.0.1
192.168.0.2
192.168.0.3
```
What we actually get when the race condition is triggered is an
incomplete member list: the ipset can contain anywhere from zero to
two of the three expected IPs.
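A quick way to spot the symptom on a hypervisor is to diff the ipset against the IPs you expect. A minimal sketch (the set name is the truncated one from the output above and the expected IPs are from this example, so adjust both for a real environment):
```
# Compare an ipset's members against the IPs expected for the remote group.
# Requires root and the ipset tool; set name and IPs are from the example above.
import subprocess

IPSET_NAME = "NIPv4fcd6cf12-2ac9-4704-9208-"
EXPECTED = {"192.168.0.1", "192.168.0.2", "192.168.0.3"}

def ipset_members(name):
    """Return the member entries of an ipset as a set of strings."""
    out = subprocess.run(["ipset", "list", name],
                         capture_output=True, text=True, check=True).stdout
    lines = out.splitlines()
    # Everything after the "Members:" header is one entry per line.
    idx = lines.index("Members:")
    return {line.strip() for line in lines[idx + 1:] if line.strip()}

missing = EXPECTED - ipset_members(IPSET_NAME)
if missing:
    print("ipset is missing expected members:", sorted(missing))
else:
    print("ipset contains all expected members")
```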
# Triggering the race condition
The problem occurs when `security_group_member_updated` events arrive between `port_update` steps 12 and 22 (see diagram and process details below).
- `port_update` step 12 retrieves the remote security groups' member lists, which are not necessarily complete yet.
- `port_update` step 22 adds the port to `IptablesFirewallDriver.ports()`.
This results in `security_group_member_updated` step 3 looking for the
port to apply the updated member list to (in
`IptablesFirewallDriver.ports()`) BEFORE it has been added by
`port_update`'s step 22. This causes the membership update event to
effectively be discarded. We are then left with whatever the remote
security group's member list was when the `port_update` process
retrieved it at step 12. This state persists until something triggers
the port being re-added to the `updated_ports` list (e.g. agent
restart, another remote group membership change, local security group
addition/removal, etc).
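To make the lost update concrete, here is a toy model of the window (the class and attribute names are simplified stand-ins, not Neutron's actual objects):
```
# Toy model of the race window; simplified stand-ins for the Neutron objects
# described below, not the real code.
REMOTE_SG = "fcd6cf12-2ac9-4704-9208-7c6cb83d1a71"   # the `server` group

class FakeFirewallDriver:
    def __init__(self):
        self.filtered_ports = {}   # populated only at port_update step 22

    def ports(self):
        return self.filtered_ports

class FakeSgAgent:
    def __init__(self, firewall):
        self.firewall = firewall
        self.devices_to_refilter = set()

    def security_groups_member_updated(self, updated_sg_ids):
        # security_group_member_updated steps 3-4: only ports already known
        # to the firewall driver are queued for refiltering.
        for device, port in self.firewall.ports().items():
            if set(port["remote_group_ids"]) & set(updated_sg_ids):
                self.devices_to_refilter.add(device)

firewall = FakeFirewallDriver()
agent = FakeSgAgent(firewall)

# The membership update for the `server` group arrives after port_update
# step 12 (member list already fetched) but before step 22 (port registered):
agent.security_groups_member_updated([REMOTE_SG])
print(agent.devices_to_refilter)   # set() -- the update is dropped

# port_update step 22 runs afterwards; the stale member list from step 12 is
# what ends up in the ipset until something else triggers a refilter.
firewall.filtered_ports["tapclient01"] = {"remote_group_ids": [REMOTE_SG]}
```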
# Race condition details
The race condition occurs in the linuxbridge agent between the two following operations:
1) Processing a `port_update` event when an instance is first created
2) Processing `security_group_member_updated` events for the instance's remote security groups.
Either of these operations can result in creating or mutating an ipset
from `IpsetManager.set_members()`. The relevant control flow sequence
for each operation is listed below. I've left out any branches that
did not seem to be relevant to the race condition.
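For context, a `set_members()`-style update essentially reconciles the ipset with a desired member list. A rough host-level sketch (simplified; as I understand it the real `IpsetManager` has additional logic, e.g. swapping in a temporary set for larger changes):
```
# Rough sketch of what a set_members()-style update boils down to on the
# host; simplified and not the actual IpsetManager implementation.
import subprocess

def set_members(set_name, current_ips, desired_ips):
    """Reconcile the ipset `set_name` from `current_ips` to `desired_ips`."""
    subprocess.run(["ipset", "create", "-exist", set_name,
                    "hash:net", "family", "inet"], check=True)
    for ip in set(desired_ips) - set(current_ips):
        subprocess.run(["ipset", "add", "-exist", set_name, ip], check=True)
    for ip in set(current_ips) - set(desired_ips):
        subprocess.run(["ipset", "del", "-exist", set_name, ip], check=True)
```
The crux of the bug is not this reconciliation itself, but that once the membership-update notification is dropped, nothing calls it again with the complete member list.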
## Processing a `port_update` event:
1) We receive an RPC port_update event via `LinuxBridgeRpcCallbacks.port_update()`, which adds the tap device to the `LinuxBridgeRpcCallbacks.updated_devices` list
2) Sleep until the next `CommonAgentLoop.daemon_loop()` iteration
3) `CommonAgentLoop.daemon_loop()` calls `CommonAgentLoop.scan_devices()`
4) `CommonAgentLoop.scan_devices()` calls `LinuxBridgeRpcCallbacks.get_and_clear_updated_devices()` to retrieve and clear the `updated_devices` list from step 1.
5) `CommonAgentLoop.scan_devices()` performs some calculations and returns control to `CommonAgentLoop.daemon_loop()` along with a list of added or updated devices.
6) `CommonAgentLoop.daemon_loop()` calls `CommonAgentLoop.process_network_devices()` when there are added or updated devices from step 5.
7) `CommonAgentLoop.process_network_devices()` calls `SecurityGroupAgentRpc.setup_port_filters()` with the added or updated devices from step 5.
8) `SecurityGroupAgentRpc.setup_port_filters()` checks for devices that were added to the `SecurityGroupAgentRpc.devices_to_refilter` list (which happens in step 4 of the `security_group_member_updated` process) and appends them to the list of updated devices. This is where devices from the `security_group_member_updated` process would have been processed if they weren't lost due to the race condition.
9) `SecurityGroupAgentRpc.setup_port_filters()` calls `SecurityGroupAgentRpc.prepare_devices_filter()` with the updated port IDs
10) `SecurityGroupAgentRpc.prepare_devices_filter()` calls `SecurityGroupAgentRpc._apply_port_filter()` with the updated port IDs
11) `SecurityGroupAgentRpc._apply_port_filter()` calls `SecurityGroupServerRpcApi.security_group_info_for_devices()` with the tap device id
12) `SecurityGroupServerRpcApi.security_group_info_for_devices()` retrieves detailed information about the port via RPC. The important detail here is a `sg_member_ids` dict, which contains a list of member IPs for each remote security group applicable to the port.
13) `SecurityGroupServerRpcApi.security_group_info_for_devices()` returns control to `SecurityGroupAgentRpc._apply_port_filter()` along with the port and security group details.
14) `SecurityGroupAgentRpc._apply_port_filter()` calls `SecurityGroupAgentRpc._update_security_group_info()` with the remote security groups and their member IP list.
15) `SecurityGroupAgentRpc._update_security_group_info()` iterates over the remote security groups and calls `IptablesFirewallDriver.update_security_group_members()` for each one
16) `IptablesFirewallDriver.update_security_group_members()` calls `IptablesFirewallDriver._update_ipset_members()`
17) `IptablesFirewallDriver._update_ipset_members()` calls `IpsetManager.set_members()`
18) `IpsetManager.set_members()` calls a number of methods to create or mutate the ipset using linux's `ipset` commands.
19) The stack unwinds back up to `SecurityGroupAgentRpc._apply_port_filter()` (last seen in step 11)
20) `SecurityGroupAgentRpc._apply_port_filter()` calls `IptablesFirewallDriver.prepare_port_filter()` with the port
21) `IptablesFirewallDriver.prepare_port_filter()` calls `IptablesFirewallDriver._set_ports()` with the port details retrieved in step 12
22) `IptablesFirewallDriver._set_ports()` adds the port to `IptablesFirewallDriver.filtered_ports`, which indicates the port is known to exist on the hypervisor. It will now be returned by `IptablesFirewallDriver.ports()` (see the condensed paraphrase after this list).
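A condensed paraphrase of steps 11-22 (simplified pseudocode of the flow described above, not the actual Neutron source):
```
# Condensed paraphrase of port_update steps 11-22; simplified pseudocode of
# the flow described above, not the actual Neutron source.
def _apply_port_filter(agent, firewall, device_ids):
    # Steps 11-13: fetch port details and remote-group member lists over RPC.
    info = agent.plugin_rpc.security_group_info_for_devices(device_ids)

    # Steps 14-18: write the member lists (as of step 12) into the ipsets.
    for sg_id, member_ips in info["sg_member_ids"].items():
        firewall.update_security_group_members(sg_id, member_ips)

    # Steps 20-22: only now is the port registered with the firewall driver,
    # so a security_group_member_updated event handled before this point
    # never sees it in firewall.ports().
    for port in info["devices"].values():
        firewall.prepare_port_filter(port)   # ends in _set_ports()
```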
## Processing a `security_group_member_updated` event:
1) We receive an RPC security_group_member_updated event via `SecurityGroupAgentRpc.security_groups_member_updated()`
2) `SecurityGroupAgentRpc.security_groups_member_updated()` calls `SecurityGroupAgentRpc._security_group_updated()` with the list of security groups whose membership was updated.
3) `SecurityGroupAgentRpc._security_group_updated()` consults `IptablesFirewallDriver.ports()` to find ports affected by the membership updates. In the race condition, `port_update` has not yet reached step 22 to add the port we need to find, so we stop here and the following step does not occur.
4) `SecurityGroupAgentRpc._security_group_updated()` adds each affected port from the previous step to the `SecurityGroupAgentRpc.devices_to_refilter` list. This list will be processed the next time we reach `port_update` step 8, but in the race condition we never get here (see the paraphrase below).
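A simplified paraphrase of steps 3-4 (the attribute name reflects my understanding of the port dicts and may differ slightly):
```
# Paraphrase of security_group_member_updated steps 3-4; simplified, not the
# actual Neutron source.
def _security_group_updated(agent, firewall, updated_sg_ids):
    for device, port in firewall.ports().items():
        # A port is affected if any of its remote ("source") groups changed.
        if set(port.get("security_group_source_groups", [])) & set(updated_sg_ids):
            agent.devices_to_refilter.add(device)
    # If port_update has not yet reached its step 22, ports() does not include
    # the new port: the loop matches nothing and the update is silently lost.
```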
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1887405/+subscriptions