yahoo-eng-team team mailing list archive
Message #92553
[Bug 2025341] [NEW] flows lost with noop firewall driver at ovs-agent restart while the db is down
Public bug reported:
If we restart ovs-agent while neutron-server is up but the neutron DB is
down, and the noop firewall driver is in use, then the agent deletes the
per-port flows and cannot recover them. Because the affected flows include
the mod_vlan_vid flows, this means traffic loss until another agent
restart (with the DB up) or a full successful resync happens.
For example:
[securitygroup]
firewall_driver = noop
openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait
sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.1
# execute these by hand and make sure that each command took effect before moving on to the next
sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt
sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.2
# diff the flows (for the sake of simplicity this devstack environment has a single vm with a single port, started above)
a=1 ; b=2 ; base=noop-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )
--- /dev/fd/63 2023-06-29 08:10:00.142623814 +0000
+++ /dev/fd/62 2023-06-29 08:10:00.142623814 +0000
@@ -1,19 +1,10 @@
table=0 priority=0 actions=resubmit(,58)
-table=0 priority=10,arp,in_port=12 actions=resubmit(,24)
-table=0 priority=10,icmp6,in_port=12,icmp_type=136 actions=resubmit(,24)
table=0 priority=200,reg3=0 actions=set_queue:0,load:0x1->NXM_NX_REG3[0],resubmit(,0)
table=0 priority=2,in_port=1 actions=drop
table=0 priority=2,in_port=2 actions=drop
-table=0 priority=3,in_port=1,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,58)
-table=0 priority=3,in_port=2,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,58)
table=0 priority=65535,dl_vlan=4095 actions=drop
-table=0 priority=9,in_port=12 actions=resubmit(,25)
table=23 priority=0 actions=drop
table=24 priority=0 actions=drop
-table=24 priority=2,arp,in_port=12,arp_spa=10.0.0.19 actions=resubmit(,25)
-table=24 priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fd17:d094:5207:0:f816:3eff:fe8e:b23f actions=resubmit(,58)
-table=24 priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fe80::f816:3eff:fe8e:b23f actions=resubmit(,58)
-table=25 priority=2,in_port=12,dl_src=fa:16:3e:8e:b2:3f actions=resubmit(,30)
table=30 priority=0 actions=resubmit(,58)
table=31 priority=0 actions=resubmit(,58)
table=58 priority=0 actions=resubmit(,60)
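The diff above shows both mod_vlan_vid flows in table 0 removed, which is what causes the traffic loss. A quick check for this symptom is to count those flows in a dump. This is only a minimal sketch: the function name count_vlan_flows is hypothetical and not part of neutron or Open vSwitch, and it reads the dump on stdin so that it works on saved files as well as live output.

```shell
# Hypothetical helper: count flows whose actions include mod_vlan_vid.
# Reads an `ovs-ofctl dump-flows` dump on stdin and prints the count.
count_vlan_flows() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the exit status clean for scripts using "set -e".
    grep -c 'mod_vlan_vid' || true
}
```

For example, `sudo ovs-ofctl dump-flows br-int | count_vlan_flows` or `count_vlan_flows < ~/noop-db-stop.2`. A count of 0 on a host that has tenant ports on VLAN-mapped networks suggests the flows are gone.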
The same loss of flows does not happen with the openvswitch firewall
driver:
[securitygroup]
firewall_driver = openvswitch
openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait
sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.1
sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt
sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.2
a=1 ; b=2 ; base=openvswitch-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )
[no diff]
The same loss of flows also does not happen if neutron-server is down
while ovs-agent restarts:
[securitygroup]
firewall_driver = noop
openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait
sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.1
sudo systemctl stop devstack@q-svc
sudo systemctl restart devstack@q-agt
sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.2
a=1 ; b=2 ; base=noop-server-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )
[no diff]
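The one-line comparison used in the three scenarios above can be wrapped in a pair of small shell functions for reuse. This is only a sketch under a few assumptions: the names normalize_flows and diff_flows are hypothetical, plain diff stands in for colordiff, `grep -E` and `sed -E` stand in for egrep and `sed -r`, and temporary files replace process substitution so that it also runs in a plain POSIX shell. The filters themselves are the same as in the commands above.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print a dump from `ovs-ofctl dump-flows` with the reply header and the
# volatile per-flow fields removed, sorted so two dumps compare line by line.
normalize_flows() {
    grep -E -v '^NXST_FLOW' "$1" \
        | sed -E \
            -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' \
            -e 's/^ *//' \
            -e 's/, +/ /g' \
        | sort
}

# Unified diff of two normalized dumps.
# Returns 0 when they match, non-zero when they differ.
diff_flows() {
    before=$(mktemp)
    after=$(mktemp)
    normalize_flows "$1" > "$before"
    normalize_flows "$2" > "$after"
    rc=0
    diff -u "$before" "$after" || rc=$?
    rm -f "$before" "$after"
    return "$rc"
}
```

With these defined, the first scenario's comparison becomes `diff_flows ~/noop-db-stop.1 ~/noop-db-stop.2`, and an empty output corresponds to the [no diff] results above.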
devstack b10c0602
neutron 0c5d4b8728
I'll push a proposed fix soon.
** Affects: neutron
Importance: Undecided
Assignee: Bence Romsics (bence-romsics)
Status: New
** Tags: ovs
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2025341
Title:
flows lost with noop firewall driver at ovs-agent restart while the db
is down
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2025341/+subscriptions