
yahoo-eng-team team mailing list archive

[Bug 1708731] Re: ovs-fw does not reinstate GRE conntrack entry

 

Reviewed:  https://review.openstack.org/540943
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6f7ba76075dd0d645ad6cee6854f87cc41cba1fa
Submitter: Zuul
Branch:    master

commit 6f7ba76075dd0d645ad6cee6854f87cc41cba1fa
Author: Jakub Libosvar <libosvar@xxxxxxxxxx>
Date:   Mon Feb 5 17:20:09 2018 +0000

    ovs-fw: Fix firewall blink
    
    Previously, when security group was updated for given port, the firewall
    removed all flows related to the port and added new rules. That
    introduced a time window where there were no rules for the port.
    
    This patch adds a new mechanism using cookie that can be described in
    three states:
    
    1) Create new openflow rules with a non-default cookie, considered an
    update cookie. All newly generated flows are added with the update
    cookie, while the existing flows keep the default cookie.
    2) Delete all rules for given port with the old default cookie. This
    will leave the newly added rules in place.
    3) Update the newly added flows, replacing the update cookie with the
    default cookie, so that these flows are not treated as stale and
    cleaned up on the next restart of the ovs agent.
    
    Change-Id: I85d9e49c24ee7c91229b43cd329c42149637f254
    Closes-bug: #1708731
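The three-state update above can be sketched in miniature. This is an illustrative model only, not Neutron's actual code: flows live in a plain dict keyed by (port, rule), and the cookie values are arbitrary:

```python
# Illustrative sketch of the three-state cookie mechanism from the
# commit message above. Not Neutron's real data structures.

DEFAULT_COOKIE = 0x0
UPDATE_COOKIE = 0x1234  # any non-default value


class FlowTable:
    def __init__(self):
        self.flows = {}  # (port, rule) -> cookie

    def add_flow(self, port, rule, cookie):
        self.flows[(port, rule)] = cookie

    def delete_flows(self, port, cookie):
        # Delete only this port's flows that carry the given cookie.
        self.flows = {k: v for k, v in self.flows.items()
                      if not (k[0] == port and v == cookie)}

    def set_cookie(self, port, old, new):
        for k, v in self.flows.items():
            if k[0] == port and v == old:
                self.flows[k] = new

    def port_flows(self, port):
        return [rule for (p, rule) in self.flows if p == port]


def update_port_rules(table, port, new_rules):
    # 1) Add new flows with the update cookie; old flows stay in place,
    #    so there is no window with zero rules for the port.
    for rule in new_rules:
        table.add_flow(port, rule, UPDATE_COOKIE)
    # 2) Delete only flows still carrying the old default cookie.
    table.delete_flows(port, DEFAULT_COOKIE)
    # 3) Rewrite the update cookie back to the default so the flows are
    #    not swept as stale on the next agent restart.
    table.set_cookie(port, UPDATE_COOKIE, DEFAULT_COOKIE)
```

The key property is that step 2 deletes only flows still carrying the old default cookie, so at no point during the update is the port left without rules.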


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1708731

Title:
   ovs-fw does not reinstate GRE conntrack entry.

Status in neutron:
  Fix Released

Bug description:
   *High level description:*

  We have VMs running GRE tunnels between them, with OVSFW security
  groups applied and the GRE conntrack helper loaded on the hypervisor.
  GRE works as expected, but the tunnel breaks whenever a neutron ovs
  agent event causes an exception such as the AMQP timeouts or
  OVSFWPortNotFound errors below:

  AMQP Timeout :

  2017-04-07 19:07:03.001 5275 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID 4035644808d24ce9aae65a6ee567021c
  2017-04-07 19:07:03.001 5275 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  2017-04-07 19:07:03.003 5275 WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent._report_state' run outlasted interval by 120.01 sec
  2017-04-07 19:07:03.041 5275 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Agent has just been revived. Doing a full sync.
  2017-04-07 19:07:06.747 5275 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] rpc_loop doing a full sync.
  2017-04-07 19:07:06.841 5275 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Agent out of sync with plugin!

  OVSFWPortNotFound:

  2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.firewall.prepare_port_filter(device)
  2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-14.0.5/lib/python2.7/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py", line 272, in prepare_port_filter
  2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     of_port = self.get_or_create_ofport(port)
  2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-14.0.5/lib/python2.7/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py", line 246, in get_or_create_ofport
  2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     raise OVSFWPortNotFound(port_id=port_id)
  2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent OVSFWPortNotFound: Port 01f7c714-1828-4768-9810-a0ec25dd2b92 is not managed by this agent.
  2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  2017-03-30 18:31:05.072 5160 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-db74f32b-5370-4a5f-86bf-935eba1490d0 - - - - -] Agent out of sync with plugin!

  
  The agent logs "out of sync" messages and re-initializes the neutron ports along with fresh SG rules.

  2017-04-07 19:07:07.110 5275 INFO neutron.agent.securitygroups_rpc [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Preparing filters for devices set([u'4b14619f-3b9e-4103-b9d7-9c7e52c797d8'])
  2017-04-07 19:07:07.215 5275 ERROR neutron.agent.linux.openvswitch_firewall.firewall [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Initializing port 4b14619f-3b9e-4103-b9d7-9c7e52c797d8 that was already initialized.

  During this process, while it prepares new filters for all ports, it
  marks the conntrack entries for certain GRE connections (those with
  high traffic) as invalid.

  root@server:/var/log# conntrack -L -o extended -p gre -f ipv4
  ipv4     2 gre      47 178 src=1.1.1.203 dst=2.2.2.66 srckey=0x0 dstkey=0x0 src=2.2.2.66 dst=1.1.1.203 srckey=0x0 dstkey=0x0 [ASSURED] mark=1 zone=5 use=1
  ipv4     2 gre      47 179 src=5.5.5.104 dst=4.4.4.187 srckey=0x0 dstkey=0x0 src=4.4.4.187 dst=5.5.5.104 srckey=0x0 dstkey=0x0 [ASSURED] mark=0 zone=5 use=1

  And that connection state remains invalid unless someone reboots the
  VM or flushes the connection, either directly with conntrack or
  through OVS.
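The marked entry in the conntrack output above can be picked out programmatically. The following is a minimal, hypothetical helper (the function name and parsing are ours) that scans `conntrack -L -o extended` output for GRE entries with a non-zero mark, which in this scenario appears to be how ovs-fw flags the connection as invalid:

```python
import re

# Hypothetical helper: find GRE conntrack entries carrying a non-zero
# mark. Field layout follows `conntrack -L -o extended` output as shown
# above; the first src= on each line is the original direction.
MARK_RE = re.compile(r"mark=(\d+)")
SRC_RE = re.compile(r"src=(\S+)")


def marked_gre_entries(conntrack_output):
    """Return (src, mark) for every GRE entry with a non-zero mark."""
    hits = []
    for line in conntrack_output.splitlines():
        if " gre " not in line:
            continue
        mark = MARK_RE.search(line)
        src = SRC_RE.search(line)
        if mark and src and int(mark.group(1)) != 0:
            hits.append((src.group(1), int(mark.group(1))))
    return hits
```

Once identified, such an entry can be flushed (e.g. with `conntrack -D`), which, as noted above, restores the tunnel without rebooting the VM.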

  We have a blanket any-protocol/any-port/any-IP SG rule in place during
  this scenario; we even tried adding specific rules to allow IP
  protocol 47 for GRE, but nothing fixed the problem.

  While checking for ovs conntrack-helper-specific bugs, we came across
  https://patchwork.ozlabs.org/patch/755615/ - is that bug being
  triggered in the above scenario? Is this a bug in the ovs-fw code, or
  is it something in the ovs conntrack implementation?

  OpenStack Version : Newton.
  Hypervisor OS : Ubuntu 16.04.2
  Kernel version : 4.4.0-70-generic
  OVS version : 2.6.1

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1708731/+subscriptions

