yahoo-eng-team team mailing list archive
Message #82863
[Bug 1881995] [NEW] Centralized SNAT failover does not recover until "systemctl restart neutron-l3-agent" on transferred node
Public bug reported:
**Environment**
Queens
OVSGTW DVR Mode: dvr_snat
CMP DVR Mode: dvr
No L3 HA
Use Case: Centralized FIPs (aka Floating IPs against unbound ports)
https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/presentation-media/Neutron-Port-Binding-and-Impact-of-unbound-ports-on-DVR-Routers-with-FloatingIP.pdf
**How to reproduce**
1. Create a VM as usual
2. Create an extra port and add its IP address as an allowed address pair on the VM port
openstack port list --server <server_name> # get the VM port ID
openstack port create --security-group <sec_group> --fixed-ip subnet=<subnet>,ip-address=<ip_address> --network <network_name> <port_name>
openstack port set --allowed-address ip-address=<ip_address> <server_port>
3. Assign a floating IP to the new port
openstack floating ip set --port <port_name> <floating_ip>
4. Inside the deployed VM, create an IP alias for the new IP address
ip addr add <ip_address>/24 dev ens3
5. Determine which gateway (gtw) node hosts the centralized FIP
neutron l3-agent-list-hosting-router <router>
6. Perform a manual failover
neutron l3-agent-router-remove <hosting-l3-agent> <router>
neutron l3-agent-router-add <new-l3-agent> <router>
(Or) Perform an automatic failover
shutdown -h now (on the hosting gtw node)
7. Verify that the router has failed over to the new node
neutron l3-agent-list-hosting-router <router>
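For repeated runs, the API-side steps above can be wrapped in a small shell sketch. It assumes authenticated `openstack` and legacy `neutron` clients on the PATH; the function names, the environment variable names and the `RUN=echo` dry-run switch are hypothetical additions of ours, not part of Neutron or the clients. Steps 1 and 4 stay manual, and the automatic-failover variant of step 6 (shutting down the gateway) is not covered.

```shell
#!/bin/sh
# Hypothetical helpers for the reproduction steps; all names are illustrative.
# Inputs come from environment variables; set RUN=echo to print the
# commands instead of executing them.

# Steps 2 and 3: create the extra port, allow its address on the VM port,
# and attach the floating IP to the new (unbound) port.
repro_setup() (
    set -eu
    : "${SEC_GROUP:?}" "${SUBNET:?}" "${IP_ADDRESS:?}" "${NETWORK:?}"
    : "${PORT_NAME:?}" "${SERVER_PORT:?}" "${FLOATING_IP:?}"
    run=${RUN:-}
    $run openstack port create --security-group "$SEC_GROUP" \
        --fixed-ip subnet="$SUBNET",ip-address="$IP_ADDRESS" \
        --network "$NETWORK" "$PORT_NAME"
    $run openstack port set --allowed-address ip-address="$IP_ADDRESS" \
        "$SERVER_PORT"
    $run openstack floating ip set --port "$PORT_NAME" "$FLOATING_IP"
)

# Steps 5 to 7: show the hosting agent, move the router from OLD_AGENT
# to NEW_AGENT (manual failover), then show the hosting agent again.
repro_failover() (
    set -eu
    : "${ROUTER:?}" "${OLD_AGENT:?}" "${NEW_AGENT:?}"
    run=${RUN:-}
    $run neutron l3-agent-list-hosting-router "$ROUTER"
    $run neutron l3-agent-router-remove "$OLD_AGENT" "$ROUTER"
    $run neutron l3-agent-router-add "$NEW_AGENT" "$ROUTER"
    $run neutron l3-agent-list-hosting-router "$ROUTER"
)
```

Run `repro_setup`, add the alias inside the VM (step 4) and confirm the floating IP answers, then run `repro_failover` with `OLD_AGENT` set to the agent reported in step 5. Prefixing `RUN=echo` only prints the commands.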
**Expected Result**
Connectivity to the floating IP address recovers automatically.
**Actual Result**
Connectivity does not recover. The issue reproduces 100% of the time.
**How to recover**
Restart neutron-l3-agent on the new hosting node (after failover).
Connectivity recovers within a few seconds.
systemctl restart neutron-l3-agent
**Additional information**
After failover, the SNAT namespace is missing the sysctl settings that
should be applied when the namespace is created. We have also confirmed
that setting them manually fixes the issue.
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/namespaces.py#L91-L107
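A minimal sketch of that manual fix, for use as root on the node that now hosts the SNAT namespace. The values are an assumption based on our reading of the `Namespace.create()` code linked above (`ip_forward=1`, `arp_ignore=1`, `arp_announce=2`, IPv6 forwarding on), so verify them against your release; the function name and the `RUN=echo` dry-run switch are hypothetical. It only works around the symptom until the agent is restarted or fixed.

```shell
#!/bin/sh
# Hypothetical workaround helper: re-apply the sysctl values assumed to be
# set by Namespace.create() in the linked stable/queens namespaces.py.
# Verify this list against your Neutron release before using it.
SNAT_SYSCTLS="net.ipv4.ip_forward=1
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2
net.ipv6.conf.all.forwarding=1"

# Usage: fix_snat_sysctls <router_id>   (run as root; RUN=echo for a dry run)
fix_snat_sysctls() (
    if [ -z "${1:-}" ]; then
        echo "usage: fix_snat_sysctls <router_id>" >&2
        exit 2
    fi
    # DVR SNAT namespaces are named snat-<router_id>
    # (e.g. snat-8737216a-b561-434f-a023-1d9cae2ce04e in this report).
    ns="snat-$1"
    # Word splitting on newlines yields one key=value setting per iteration;
    # stop at the first sysctl that fails.
    for setting in $SNAT_SYSCTLS; do
        ${RUN:-} ip netns exec "$ns" sysctl -w "$setting" || exit 1
    done
)
```

For the router in this report the call would be `fix_snat_sysctls 8737216a-b561-434f-a023-1d9cae2ce04e`; with `RUN=echo` it prints the four `ip netns exec ... sysctl -w ...` commands instead of running them.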
The following are the sysctl values in the SNAT namespace after failover:
---
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.conf.all.arp_ignore
net.ipv4.conf.all.arp_ignore = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.conf.all.arp_announce
net.ipv4.conf.all.arp_announce = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv6.conf.all.forwarding
net.ipv6.conf.all.forwarding = 1
root@gtw03:~#
---
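The comparison with the expected values can be scripted. This sketch reads sysctl output on stdin and reports mismatches; the function name is hypothetical, and the expected values are an assumption based on our reading of the `Namespace.create()` code linked above.

```shell
#!/bin/sh
# Hypothetical checker: reads "key = value" lines (sysctl's default output
# format) on stdin, prints each watched key whose value differs from the
# assumed expected value, and exits 1 if any differ (0 otherwise).
check_snat_sysctls() (
    awk '
        BEGIN {
            # Assumed expected values; verify against your Neutron release.
            want["net.ipv4.ip_forward"] = "1"
            want["net.ipv4.conf.all.arp_ignore"] = "1"
            want["net.ipv4.conf.all.arp_announce"] = "2"
            want["net.ipv6.conf.all.forwarding"] = "1"
            bad = 0
        }
        # Only lines shaped like "key = value" for keys we watch.
        NF == 3 && $2 == "=" && ($1 in want) && $3 != want[$1] {
            printf "%s: got %s, expected %s\n", $1, $3, want[$1]
            bad = 1
        }
        END { exit bad }
    '
)
```

For example: `ip netns exec snat-<router_id> sysctl net.ipv4.ip_forward net.ipv4.conf.all.arp_ignore net.ipv4.conf.all.arp_announce net.ipv6.conf.all.forwarding | check_snat_sysctls`. Given the four values reported above, it flags `ip_forward`, `arp_ignore` and `arp_announce` and exits with status 1.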
We believe this is caused by the following commit, which performs this initialization only when neutron-l3-agent starts:
https://github.com/openstack/neutron/commit/9d5e80e935049d08e0fcefc0c823fb67c793a51b
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1881995