yahoo-eng-team team mailing list archive

[Bug 1709087] [NEW] ovs-fw does not reinstate GRE conntrack entry

 

Private bug reported:

We have VMs running GRE tunnels between them, with security groups
  enforced by the OVS firewall driver (OVSFW) and the GRE conntrack
  helper loaded on the hypervisor. GRE works as expected, but the tunnel
  breaks whenever a neutron OVS agent event raises an exception such as
  the AMQP timeout or OVSFWPortNotFound shown below:

  AMQP timeout:

  2017-04-07 19:07:03.001 5275 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  MessagingTimeout: Timed out waiting for a reply to message ID
  4035644808d24ce9aae65a6ee567021c
  2017-04-07 19:07:03.001 5275 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  2017-04-07 19:07:03.003 5275 WARNING oslo.service.loopingcall [-]
  Function
  'neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent._report_state'
  run outlasted interval by 120.01 sec
  2017-04-07 19:07:03.041 5275 INFO
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-]
  Agent has just been revived. Doing a full sync.
  2017-04-07 19:07:06.747 5275 INFO
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] rpc_loop doing a
  full sync.
  2017-04-07 19:07:06.841 5275 INFO
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Agent out of sync
  with plugin!

  OVSFWPortNotFound:

  2017-03-30 18:31:05.048 5160 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  self.firewall.prepare_port_filter(device)
  2017-03-30 18:31:05.048 5160 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
  "/openstack/venvs/neutron-14.0.5/lib/python2.7/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py",
  line 272, in prepare_port_filter
  2017-03-30 18:31:05.048 5160 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  of_port = self.get_or_create_ofport(port)
  2017-03-30 18:31:05.048 5160 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
  "/openstack/venvs/neutron-14.0.5/lib/python2.7/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py",
  line 246, in get_or_create_ofport
  2017-03-30 18:31:05.048 5160 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent raise
  OVSFWPortNotFound(port_id=port_id)
  2017-03-30 18:31:05.048 5160 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  OVSFWPortNotFound: Port 01f7c714-1828-4768-9810-a0ec25dd2b92 is not
  managed by this agent.
  2017-03-30 18:31:05.048 5160 ERROR
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  2017-03-30 18:31:05.072 5160 INFO
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  [req-db74f32b-5370-4a5f-86bf-935eba1490d0 - - - - -] Agent out of sync
  with plugin!

  The agent then logs "out of sync" messages and starts to re-initialize
  the neutron ports, installing fresh SG rules:

  2017-04-07 19:07:07.110 5275 INFO neutron.agent.securitygroups_rpc
  [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Preparing filters
  for devices set([u'4b14619f-3b9e-4103-b9d7-9c7e52c797d8'])
  2017-04-07 19:07:07.215 5275 ERROR
  neutron.agent.linux.openvswitch_firewall.firewall
  [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Initializing port
  4b14619f-3b9e-4103-b9d7-9c7e52c797d8 that was already initialized.

  During this process, while preparing new filters for all ports, it
  marks the conntrack entries for certain GRE connections (the
  high-traffic ones) as invalid:

  root@server:/var/log# conntrack -L -o extended -p gre -f ipv4
  ipv4 2 gre 47 178 src=1.1.1.203 dst=2.2.2.66 srckey=0x0 dstkey=0x0
  src=2.2.2.66 dst=1.1.1.203 srckey=0x0 dstkey=0x0 [ASSURED] mark=1
  zone=5 use=1
  ipv4 2 gre 47 179 src=5.5.5.104 dst=4.4.4.187 srckey=0x0 dstkey=0x0
  src=4.4.4.187 dst=5.5.5.104 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
  zone=5 use=1
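
  For context on the marks: as far as we can tell, the neutron OVS
  firewall driver tags connections it considers invalid with ct mark 1
  (CT_MARK_INVALID in the openvswitch_firewall constants), while mark 0
  is normal traffic - so the first entry above is the broken tunnel. One
  way to watch the mark flip in real time during an agent resync (a
  diagnostic sketch using standard conntrack-tools options):

  root@server:~# conntrack -E -e UPDATES -p gre    # stream GRE flow update events live
  root@server:~# conntrack -L -p gre --mark 1 -o extended    # list only entries already marked invalid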

  That connection state then remains invalid unless someone reboots the
  VM or flushes the connection, either directly in conntrack or through
  OVS.
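
  For anyone hitting the same state, the flush we mean looks roughly
  like this (a sketch; the IPs are the example tunnel endpoints from the
  listing above, substitute your own):

  root@server:~# conntrack -D -p gre -s 1.1.1.203 -d 2.2.2.66    # delete the stuck entry so the flow is re-learned
  root@server:~# conntrack -U -p gre -s 1.1.1.203 -d 2.2.2.66 --mark 0    # or just clear the invalid mark
  root@server:~# ovs-dpctl flush-conntrack    # or, heavier: flush the whole datapath conntrack table via OVS (if your ovs-dpctl supports it)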

  We have a blanket any-protocol/any-port/any-IP SG rule in place in
  this scenario, and we even tried adding specific rules to allow IP
  protocol 47 (GRE), but nothing fixed the problem.
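
  For completeness, the protocol-47 rules we tried were along these
  lines (a sketch with the Newton-era neutron client; SECGROUP stands in
  for our group name):

  root@server:~# neutron security-group-rule-create --direction ingress --protocol 47 --remote-ip-prefix 0.0.0.0/0 SECGROUP
  root@server:~# neutron security-group-rule-create --direction egress --protocol 47 --remote-ip-prefix 0.0.0.0/0 SECGROUP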

  While checking for ovs-conntrack helper-specific bugs we came across
  patchwork.ozlabs.org/patch/755615/ - is that bug being triggered in
  the above scenario? Is this a bug in the ovs-fw code, or is it
  something in the ovs-conntrack implementation?

  OpenStack version: Newton
  Hypervisor OS: Ubuntu 16.04.2
  Kernel version: 4.4.0-70-generic
  OVS version: 2.6.1
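
  How we confirmed the helper and versions on the hypervisor (standard
  commands, included for reproducibility):

  root@server:~# lsmod | grep nf_conntrack_proto_gre    # GRE conntrack helper module is loaded
  root@server:~# ovs-vsctl --version
  root@server:~# uname -r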

** Affects: null-and-void
     Importance: Undecided
         Status: Invalid


** Tags: ovs-fw sg-fw

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1709087

Title:
  ovs-fw does not reinstate GRE conntrack entry

Status in NULL Project:
  Invalid

To manage notifications about this bug go to:
https://bugs.launchpad.net/null-and-void/+bug/1709087/+subscriptions

