
yahoo-eng-team team mailing list archive

[Bug 2025341] [NEW] flows lost with noop firewall driver at ovs-agent restart while the db is down

 

Public bug reported:

If we restart ovs-agent while neutron-server is up but the neutron DB is
down, and we also use the noop firewall driver, then the agent deletes
the per-port flows and cannot recover them. Because the affected flows
include the mod_vlan_vid flows, this means traffic loss until another
agent restart (with the DB up) or a full successful resync happens.
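The traffic impact comes from the table=0 flows whose actions contain mod_vlan_vid. In the diff in this report, these are the two priority=3 flows on in_port=1 and in_port=2. The snippet below is a minimal sketch for checking a saved dump for them. `count_vlan_tag_flows` is a hypothetical helper name. It only looks at table 0 and at the literal `mod_vlan_vid` action text, so flows that tag traffic some other way would not be counted.

```shell
#!/usr/bin/env bash
# count_vlan_tag_flows FILE  (hypothetical helper name)
# Print how many table=0 flows in a saved br-int dump have mod_vlan_vid in
# their actions. Matches both raw `ovs-ofctl dump-flows` lines ("table=0,")
# and the normalized form used in the diffs ("table=0 ").
count_vlan_tag_flows() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
  # callers running under `set -e` from aborting on a zero count.
  grep -E 'table=0[, ]' "$1" | grep -c 'mod_vlan_vid' || true
}

# Usage:
#   count_vlan_tag_flows ~/noop-db-stop.1
#   count_vlan_tag_flows ~/noop-db-stop.2
```

Going by the diff in this report, one would expect 2 for the first dump and 0 for the second.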

For example:

[securitygroup]
firewall_driver = noop

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.1

# execute these by hand and make sure that each command took effect before moving on to the next
sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.2

# diff the flows (for the sake of simplicity this devstack environment has a single vm with a single port, started above)
a=1 ; b=2 ; base=noop-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )
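The one-liner above repeats the same filter for both dumps. The snippet below factors it into a function. This is a sketch with hypothetical names (`normalize_flows`, `diff_flows`), and it keeps the same sed expressions. It makes two substitutions. It uses `grep -E` instead of `egrep`, because newer GNU grep releases warn that `egrep` is obsolescent. It uses plain `diff -u` instead of `colordiff`, which only adds color.

```shell
#!/usr/bin/env bash
# normalize_flows FILE  (hypothetical helper name)
# Same filtering as the one-liner: drop the NXST_FLOW header, strip volatile
# fields (cookie, duration, counters, ages), trim leading spaces, turn ", "
# separators into single spaces, then sort so flow order does not matter.
normalize_flows() {
  grep -Ev '^NXST_FLOW' "$1" |
    sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' \
           -e 's/^ *//' -e 's/, +/ /g' |
    sort
}

# diff_flows BASE  (hypothetical helper name)
# Compare ~/BASE1 with ~/BASE2, mirroring a=1, b=2 in the one-liner.
diff_flows() {
  diff -u <(normalize_flows ~/"$1"1) <(normalize_flows ~/"$1"2)
}

# Usage:
#   diff_flows noop-db-stop.
```

`diff` exits with status 0 when the dumps match, so `diff_flows openvswitch-db-stop. && echo "[no diff]"` may serve as a scripted check.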

--- /dev/fd/63  2023-06-29 08:10:00.142623814 +0000
+++ /dev/fd/62  2023-06-29 08:10:00.142623814 +0000
@@ -1,19 +1,10 @@
 table=0 priority=0 actions=resubmit(,58)
-table=0 priority=10,arp,in_port=12 actions=resubmit(,24)
-table=0 priority=10,icmp6,in_port=12,icmp_type=136 actions=resubmit(,24)
 table=0 priority=200,reg3=0 actions=set_queue:0,load:0x1->NXM_NX_REG3[0],resubmit(,0)
 table=0 priority=2,in_port=1 actions=drop
 table=0 priority=2,in_port=2 actions=drop
-table=0 priority=3,in_port=1,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,58)
-table=0 priority=3,in_port=2,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,58)
 table=0 priority=65535,dl_vlan=4095 actions=drop
-table=0 priority=9,in_port=12 actions=resubmit(,25)
 table=23 priority=0 actions=drop
 table=24 priority=0 actions=drop
-table=24 priority=2,arp,in_port=12,arp_spa=10.0.0.19 actions=resubmit(,25)
-table=24 priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fd17:d094:5207:0:f816:3eff:fe8e:b23f actions=resubmit(,58)
-table=24 priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fe80::f816:3eff:fe8e:b23f actions=resubmit(,58)
-table=25 priority=2,in_port=12,dl_src=fa:16:3e:8e:b2:3f actions=resubmit(,30)
 table=30 priority=0 actions=resubmit(,58)
 table=31 priority=0 actions=resubmit(,58)
 table=58 priority=0 actions=resubmit(,60)
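To summarize which OpenFlow tables lost flows, the removed lines of such a diff can be tallied per table. The snippet below is a minimal sketch, and `removed_flows_by_table` is a hypothetical helper name. It expects plain `diff -u` output, so replace `colordiff` with `diff` in the one-liner and pipe the result into the function. `colordiff` may put color escape codes in front of the `-` markers.

```shell
#!/usr/bin/env bash
# removed_flows_by_table  (hypothetical helper name)
# Read a unified diff of two normalized flow dumps on stdin and print one
# "table=N count" line per table that lost flows.
removed_flows_by_table() {
  # Removed flows start with "-table="; the "--- /dev/fd/63" header does not.
  # sed keeps only the table id, sort | uniq -c counts removals per table,
  # and awk swaps the columns into "table=N count".
  grep -E '^-table=[0-9]+ ' |
    sed -r 's/^-(table=[0-9]+) .*/\1/' |
    sort | uniq -c |
    awk '{ print $2, $1 }'
}
```

Applied to the diff above, it would report 5 removed flows in table=0, 3 in table=24 and 1 in table=25. These are the 9 lines that separate the 19-line dump from the 10-line one.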

The same loss of flows does not happen with the openvswitch firewall
driver:

[securitygroup]
firewall_driver = openvswitch

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.1

sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.2

a=1 ; b=2 ; base=openvswitch-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )

[no diff]
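The noop and openvswitch runs differ only in `[securitygroup] firewall_driver`. Before comparing dumps, it may help to confirm which value the agent's config file actually contains. The snippet below is a minimal awk sketch, and `ini_get` is a hypothetical helper name. It handles simple INI files only: no continuation lines, no inline comments and no oslo.config interpolation. The config path in the usage comment is an assumption about a default devstack layout.

```shell
#!/usr/bin/env bash
# ini_get FILE SECTION KEY  (hypothetical helper name)
# Print the value of KEY inside [SECTION] of a simple INI file; print
# nothing if the key is absent from that section.
ini_get() {
  awk -v section="$2" -v key="$3" '
    /^[ \t]*\[/ {
      # Section header: strip blanks, then the surrounding brackets.
      name = $0
      gsub(/[ \t]/, "", name)
      name = substr(name, 2, length(name) - 2)
      in_section = (name == section)
      next
    }
    in_section && $0 ~ ("^[ \t]*" key "[ \t]*=") {
      value = $0
      sub(/^[^=]*=[ \t]*/, "", value)   # drop the "key =" prefix
      sub(/[ \t\r]+$/, "", value)        # drop trailing blanks / CR
      print value
      exit
    }
  ' "$1"
}

# Usage (path assumed, check which --config-file options the agent uses):
#   ini_get /etc/neutron/plugins/ml2/ml2_conf.ini securitygroup firewall_driver
```

The agent may load several config files, and a later file can override an earlier one. Checking a single file is therefore only a first indication.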

The same loss of flows does not happen either if neutron-server (instead
of the DB) is down while ovs-agent restarts:

[securitygroup]
firewall_driver = noop

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.1

sudo systemctl stop devstack@q-svc
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.2

a=1 ; b=2 ; base=noop-server-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )

[no diff]
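All three reproducers ask you to run the systemctl commands by hand and make sure each one took effect. The snippet below is a minimal polling sketch that can automate that wait; `wait_until` is a hypothetical helper name. `systemctl is-active` only reports the unit state. For devstack@q-agt it does not show that the agent has finished its startup sync, so some extra margin may still be needed before dumping the flows.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT CMD [ARGS...]  (hypothetical helper name)
# Re-run CMD once per second until it succeeds or TIMEOUT seconds pass.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout=$1
  shift
  local waited=0
  until "$@"; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}

# Usage with the unit names from the reproducers:
#   sudo systemctl stop mysql
#   wait_until 30 bash -c '! systemctl is-active --quiet mysql'
#   sudo systemctl restart devstack@q-agt
#   wait_until 30 systemctl is-active --quiet devstack@q-agt
```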

devstack b10c0602
neutron 0c5d4b8728

I'll push a proposed fix soon.

** Affects: neutron
     Importance: Undecided
     Assignee: Bence Romsics (bence-romsics)
         Status: New


** Tags: ovs

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2025341
