yahoo-eng-team team mailing list archive
Message #88233
[Bug 1960405] [NEW] snat is used instead of dnat_and_snat when L3GW resides on chassis
Public bug reported:
I run RHOSP 16.2 (Train) with OVN and enable_distributed_floating_ip.
A VM with an associated FIP is running on the same chassis where the L3GW port is scheduled.
I would expect the "dnat_and_snat" NAT to be used and traffic to egress with the FIP. However, as the L3GW is scheduled there too, I see the "snat" NAT is used instead.
I think this is a bug (unless I'm wrong):
- A VM that has a FIP should use that FIP for egress traffic. External firewalls expect it.
- The L3GW port is expected to move. If the port moves to a chassis where traffic is already flowing through the FIP, the presence of the L3GW port should not disrupt that traffic.
* Reproduction steps
We assume 2 chassis, cpu34d and cpu35d.
# Create a router, we make sure its port is scheduled on cpu35d:
openstack router create router1 --availability-zone-hint a35
openstack router set --external-gateway external1 --fixed-ip subnet=tenant_35 router1
openstack port show 34ef841f-545f-4fab-9447-11bf18ae0e1a \
-c binding_host_id -c device_owner -c fixed_ips
+-----------------+------------------------------------------------------------------------------+
| Field | Value |
+-----------------+------------------------------------------------------------------------------+
| binding_host_id | cpu35d |
| device_owner | network:router_gateway |
| fixed_ips | ip_address='10.64.245.126', subnet_id='e955b866-324d-491f-888c-2760b713d3b0' |
+-----------------+------------------------------------------------------------------------------+
openstack router add subnet router1 mysub
# We run a VM on cpu35d with floating IP 10.64.254.128 associated
openstack server create myserver --key-name stack --security-group prodlike --network private \
--image cirros --flavor m1.small --availability-zone nova:cpu35d
openstack server add floating ip myserver 10.64.254.128
ssh cirros@10.64.254.128
ping external
# We run another VM, on the other chassis:
openstack server create myserver2 --key-name stack --security-group prodlike --network private \
--image cirros --flavor m1.small --availability-zone nova:cpu34d
openstack server add floating ip myserver2 10.64.254.135
ssh cirros@10.64.254.135
ping external
We observe the egress traffic:
* Expected output: what did you hope to see?
from myserver: traffic coming from the fip:
IP 10.64.254.128 > external
from myserver2:
IP 10.64.254.135 > external
* Actual output:
from myserver: traffic coming from the L3GW port:
IP 10.64.245.126 > external
from myserver2:
IP 10.64.254.135 > external
* Version:
** OpenStack version RHOSP 16.2 (Train)
** RHEL 8.4
** deployed with tripleo (ovn-2021-central-21.09.1-20.el8fdp.x86_64, python3-neutron-15.3.5-2.20210608154816.el8ost.4.noarch)
From OVN northdb:
router 6a1c6c1d-e365-4684-96d6-9b06e4ad5862 (neutron-abf4070d-6134-4bf8-b398-a9e201b66b08) (aka router1)
port lrp-34ef841f-545f-4fab-9447-11bf18ae0e1a
mac: "fa:16:3e:22:3f:c0"
networks: ["10.64.245.126/27"]
gateway chassis: [1126ea9a-2860-4e5c-9ab5-ca1e8959edee]
port lrp-e56e108f-d731-45e1-ba45-444219572859
mac: "fa:16:3e:90:a7:3b"
networks: ["192.168.200.1/27"]
nat 8e72663f-2c9d-49fa-9749-223df298c646
external ip: "10.64.254.128"
logical ip: "192.168.200.21"
type: "dnat_and_snat"
nat b0d9f69a-c8d4-4413-8f00-1c6f0b9e643f
external ip: "10.64.245.126"
logical ip: "192.168.200.0/27"
type: "snat"
nat f5808a3a-f40c-4ccf-a9fe-b83209011555
external ip: "10.64.254.135"
logical ip: "192.168.200.27"
type: "dnat_and_snat"
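The NAT entries above overlap for myserver's logical address: 192.168.200.21 is covered both by its own dnat_and_snat entry and by the router-wide snat entry for 192.168.200.0/27. A minimal Python sketch of that overlap, using only the values listed above. The helper name `matching_nat_entries` is hypothetical, and ordering the matches by prefix length reflects the behaviour expected in this report, not a claim about how OVN ranks its flows:

```python
import ipaddress

# NAT entries copied from the northbound listing above:
# (type, logical ip, external ip).
NAT_ENTRIES = [
    ("dnat_and_snat", "192.168.200.21", "10.64.254.128"),
    ("snat", "192.168.200.0/27", "10.64.245.126"),
    ("dnat_and_snat", "192.168.200.27", "10.64.254.135"),
]


def matching_nat_entries(source_ip):
    """Return every NAT entry whose logical IP covers source_ip, most specific first."""
    address = ipaddress.ip_address(source_ip)
    matches = []
    for nat_type, logical_ip, external_ip in NAT_ENTRIES:
        network = ipaddress.ip_network(logical_ip)  # a bare address becomes a /32
        if address in network:
            matches.append((network.prefixlen, nat_type, external_ip))
    matches.sort(reverse=True)  # longest prefix first
    return [(nat_type, external_ip) for _, nat_type, external_ip in matches]


if __name__ == "__main__":
    for vm_ip in ("192.168.200.21", "192.168.200.27"):
        print(vm_ip, matching_nat_entries(vm_ip))
```

Both VMs match two entries each, so the external address actually used depends on which entry the router pipeline applies on a given chassis.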
From ovn-trace we read:
ingress(dp="router1", inport="lrp-e56e10")
------------------------------------------
0. lr_in_admission (northd.c:10285): eth.dst == fa:16:3e:90:a7:3b && inport == "lrp-e56e10", priority 50, uuid 3711664a
xreg0[0..47] = fa:16:3e:90:a7:3b;
next;
1. lr_in_lookup_neighbor (northd.c:10365): 1, priority 0, uuid a7f32214
reg9[2] = 1;
next;
2. lr_in_learn_neighbor (northd.c:10374): reg9[2] == 1, priority 100, uuid bb51e95e
next;
10. lr_in_ip_routing (northd.c:9179): ip4.dst == 0.0.0.0/0, priority 1, uuid 57ef0971
ip.ttl--;
reg8[0..15] = 0;
reg0 = 10.64.245.97;
reg1 = 10.64.245.126;
eth.src = fa:16:3e:22:3f:c0;
outport = "lrp-34ef84";
flags.loopback = 1;
next;
11. lr_in_ip_routing_ecmp (northd.c:10670): reg8[0..15] == 0, priority 150, uuid 9b75d8f6
next;
12. lr_in_policy (northd.c:10795): 1, priority 0, uuid a68ffb22
reg8[0..15] = 0;
next;
13. lr_in_policy_ecmp (northd.c:10797): reg8[0..15] == 0, priority 150, uuid c2a799d9
next;
14. lr_in_arp_resolve (northd.c:10831): ip4, priority 0, uuid 3dff7cbd
get_arp(outport, reg0);
/* MAC binding to 00:1c:73:00:00:11. */
next;
17. lr_in_gw_redirect (northd.c:12774): ip4.src == 192.168.200.21 && outport == "lrp-34ef84" && is_chassis_resident("464051"), priority 100, uuid f142bc57
eth.src = fa:16:3e:c3:29:10;
reg1 = 10.64.254.128;
next;
18. lr_in_arp_request (northd.c:11488): 1, priority 0, uuid 39a08290
output;
egress(dp="router1", inport="lrp-e56e10", outport="lrp-34ef84")
---------------------------------------------------------------
0. lr_out_undnat (northd.c:12318): ip && ip4.src == 192.168.200.21 && outport == "lrp-34ef84", priority 100, uuid 8d276994
eth.src = fa:16:3e:c3:29:10;
ct_dnat;
ct_dnat /* assuming no un-dnat entry, so no change */
-----------------------------------------------------
2. lr_out_snat (northd.c:12410): ip && ip4.src == 192.168.200.0/27 && outport == "lrp-34ef84" && is_chassis_resident("cr-lrp-34ef84"), priority 156, uuid 7a29e2cf
ct_snat(10.64.245.126);
ct_snat(ip4.src=10.64.245.126)
------------------------------
4. lr_out_delivery (northd.c:11536): outport == "lrp-34ef84", priority 100, uuid d00dd976
output;
/* output to "lrp-34ef84", type "patch" */
I think what is happening is that the lr_out_snat flow matching
is_chassis_resident("cr-lrp-34ef84") runs at the end and overrides the
source IP with the gateway port's own address.
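To compare traces taken on different chassis without reading them by eye, the stage that performs the final source rewrite can be pulled out of saved ovn-trace output. A rough Python sketch, assuming the trace text has the same shape as the one above. The function name `snat_decisions` and the regular expressions are illustrative and are not part of any OVN tooling:

```python
import re

# Excerpt of the egress pipeline from the trace above.
TRACE = '''\
0. lr_out_undnat (northd.c:12318): ip && ip4.src == 192.168.200.21 && outport == "lrp-34ef84", priority 100, uuid 8d276994
eth.src = fa:16:3e:c3:29:10;
ct_dnat;
2. lr_out_snat (northd.c:12410): ip && ip4.src == 192.168.200.0/27 && outport == "lrp-34ef84" && is_chassis_resident("cr-lrp-34ef84"), priority 156, uuid 7a29e2cf
ct_snat(10.64.245.126);
4. lr_out_delivery (northd.c:11536): outport == "lrp-34ef84", priority 100, uuid d00dd976
output;
'''

# A stage header looks like: "<n>. <stage> (<file:line>): <match>, priority <p>, uuid <id>"
STAGE_RE = re.compile(r"^\d+\. (\S+) \([^)]*\): (.*), priority (\d+), uuid \S+$")
SNAT_RE = re.compile(r"^ct_snat\(([\d.]+)\);$")


def snat_decisions(trace_text):
    """Yield (stage, match, priority, snat_ip) for each ct_snat action in the trace."""
    current = None
    for line in trace_text.splitlines():
        line = line.strip()
        stage = STAGE_RE.match(line)
        if stage:
            current = (stage.group(1), stage.group(2), int(stage.group(3)))
            continue
        snat = SNAT_RE.match(line)
        if snat and current:
            yield (*current, snat.group(1))


if __name__ == "__main__":
    for stage, match, priority, snat_ip in snat_decisions(TRACE):
        print(stage, priority, snat_ip)
        print("  match:", match)
```

For myserver the expected source address is the FIP 10.64.254.128, while the extracted action rewrites it to 10.64.245.126 through the flow that matches the whole 192.168.200.0/27 subnet.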
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1960405