yahoo-eng-team team mailing list archive

[Bug 1744412] [NEW] Losing access to instances via floating IPs

 

Public bug reported:

Description of problem:
Neutron floating IPs stop working and instances become unreachable.


Version-Release number of selected component (if applicable):
Ocata.

Neutron-related RPMs:
puppet-neutron-10.3.2-0.20180103174737.2e7d298.el7.centos.noarch
openstack-neutron-common-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openvswitch-ovn-common-2.6.1-10.1.git20161206.el7.x86_64
openstack-neutron-sriov-nic-agent-10.0.5-0.20180105192920.295c700.el7.centos.noarch
python-neutron-lib-1.1.0-1.el7.noarch
openvswitch-2.6.1-10.1.git20161206.el7.x86_64
python2-neutronclient-6.1.1-1.el7.noarch
python-openvswitch-2.6.1-10.1.git20161206.el7.noarch
openstack-neutron-lbaas-10.0.2-0.20180104200311.10771af.el7.centos.noarch
openvswitch-ovn-host-2.6.1-10.1.git20161206.el7.x86_64
python-neutron-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openvswitch-ovn-central-2.6.1-10.1.git20161206.el7.x86_64
openstack-neutron-metering-agent-10.0.5-0.20180105192920.295c700.el7.centos.noarch
python-neutron-lbaas-10.0.2-0.20180104200311.10771af.el7.centos.noarch
openstack-neutron-openvswitch-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openstack-neutron-ml2-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openstack-neutron-10.0.5-0.20180105192920.295c700.el7.centos.noarch

How reproducible:
Not sure. Over the past day or so, several users have reported unreachable instances. Not all VMs are affected, and it is not clear how connectivity was lost in the first place.

Actual results:
In some cases the router is active on more than one controller. In others the router appears to be configured correctly, but the qg-xxxx interface is not NAT-ing traffic to the qr-xxx interface. The iptables rules look correct.
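
A minimal sketch of how the "active on more than one controller" case could be flagged, assuming the per-router HA state of each hosting agent has already been collected by some other means. The function name, the tuple layout, and the "active" state string are assumptions made for illustration, not Neutron APIs:

```python
from collections import defaultdict


def find_multi_active_routers(rows):
    """Return {router_id: [hosts]} for routers reported active on several hosts.

    rows is an iterable of (router_id, host, ha_state) tuples. This layout is
    hypothetical; adapt it to however the agent states were gathered.
    """
    active_hosts = defaultdict(set)
    for router_id, host, ha_state in rows:
        # Only hosts that claim the active role are of interest here.
        if ha_state == "active":
            active_hosts[router_id].add(host)
    # A healthy HA router should have exactly one active host.
    return {
        router_id: sorted(hosts)
        for router_id, hosts in active_hosts.items()
        if len(hosts) > 1
    }
```

Any router returned by this helper has more than one host claiming the active role and is a candidate for the symptom described above.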

Expected results:
VMs reachable via FIP. 

Additional info:
Some ports appear to be stuck in 'BUILD' status, but it is not clear what is causing this.

The following error appears in the logs:

2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 256, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     return func(*args, **kwargs)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1116, in process
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     self.process_external()
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 910, in process_external
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     self.update_fip_statuses(fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 926, in update_fip_statuses
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     self.agent.context, self.router_id, fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 125, in update_floatingip_statuses
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     router_id=router_id, fip_statuses=fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 151, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     return self._original_context.call(ctxt, method, **kwargs)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     retry=self.retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     timeout=timeout, retry=retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 566, in send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     retry=retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 557, in _send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info     raise result
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info RemoteError: Remote error: TimeoutError QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10
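
The final RemoteError line reports a database connection pool timeout raised on the remote side of the RPC call. For triage, the pool figures can be pulled out of such a line. A minimal sketch, assuming the message keeps the wording shown above; parse_queuepool_error is a hypothetical helper name, not part of Neutron or SQLAlchemy:

```python
import re

# Matches the QueuePool timeout wording seen in the traceback above.
_POOL_RE = re.compile(
    r"QueuePool limit of size (?P<size>\d+) overflow (?P<overflow>\d+) reached, "
    r"connection timed out, timeout (?P<timeout>\d+(?:\.\d+)?)"
)


def parse_queuepool_error(line):
    """Return the pool settings found in a log line, or None if absent."""
    match = _POOL_RE.search(line)
    if match is None:
        return None
    size = int(match.group("size"))
    overflow = int(match.group("overflow"))
    return {
        "pool_size": size,
        "max_overflow": overflow,
        "timeout_seconds": float(match.group("timeout")),
        # Upper bound on simultaneous connections from this one pool.
        "max_connections": size + overflow,
    }
```

For the line above this gives a pool size of 10 with an overflow of 20, so at most 30 connections from that pool were in use when the caller gave up after the 10 second timeout.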

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1744412

