[Bug 1744412] [NEW] Losing access to instances via floating IPs
Public bug reported:
Description of problem:
Neutron Floating IPs stop working, instances become unreachable.
Version-Release number of selected component (if applicable):
Ocata.
Neutron-related RPMs:
puppet-neutron-10.3.2-0.20180103174737.2e7d298.el7.centos.noarch
openstack-neutron-common-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openvswitch-ovn-common-2.6.1-10.1.git20161206.el7.x86_64
openstack-neutron-sriov-nic-agent-10.0.5-0.20180105192920.295c700.el7.centos.noarch
python-neutron-lib-1.1.0-1.el7.noarch
openvswitch-2.6.1-10.1.git20161206.el7.x86_64
python2-neutronclient-6.1.1-1.el7.noarch
python-openvswitch-2.6.1-10.1.git20161206.el7.noarch
openstack-neutron-lbaas-10.0.2-0.20180104200311.10771af.el7.centos.noarch
openvswitch-ovn-host-2.6.1-10.1.git20161206.el7.x86_64
python-neutron-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openvswitch-ovn-central-2.6.1-10.1.git20161206.el7.x86_64
openstack-neutron-metering-agent-10.0.5-0.20180105192920.295c700.el7.centos.noarch
python-neutron-lbaas-10.0.2-0.20180104200311.10771af.el7.centos.noarch
openstack-neutron-openvswitch-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openstack-neutron-ml2-10.0.5-0.20180105192920.295c700.el7.centos.noarch
openstack-neutron-10.0.5-0.20180105192920.295c700.el7.centos.noarch
How reproducible:
Not sure. Over the past day or so, several users have reported unreachable instances. Not all VMs are affected, and it is not clear how connectivity was lost in the first place.
Actual results:
In some cases the router is active on more than one controller. In others the router appears to be configured correctly, but traffic arriving on the qg-xxxx interface is not NAT-ed to the qr-xxx interface. The iptables rules look correct.
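For anyone trying to confirm the first symptom, here is a minimal sketch of the check, not a tool from this deployment. It assumes you have already collected (router, host, HA state) triples, for example by hand from the output of `neutron l3-agent-list-hosting-router <router>`. The function name, the tuple layout, and the sample router and host names are all made up for illustration.

```python
from collections import defaultdict


def routers_with_multiple_active(bindings):
    """Return {router_id: [hosts]} for routers reported active on more than one host.

    `bindings` is an iterable of (router_id, host, ha_state) tuples.
    This layout is an assumption for the sketch, not a Neutron data structure.
    """
    active_hosts = defaultdict(list)
    for router_id, host, ha_state in bindings:
        if ha_state == "active":
            active_hosts[router_id].append(host)
    # Keep only routers that have two or more active instances.
    return {r: sorted(h) for r, h in active_hosts.items() if len(h) > 1}


# Hypothetical sample data; real values would come from the L3 agent listings.
sample = [
    ("router-a", "controller-0", "active"),
    ("router-a", "controller-1", "active"),
    ("router-a", "controller-2", "standby"),
    ("router-b", "controller-0", "standby"),
    ("router-b", "controller-1", "active"),
]

print(routers_with_multiple_active(sample))
```

A router that shows up in the result has more than one active instance and is worth inspecting first.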
Expected results:
VMs reachable via FIP.
Additional info:
Some ports appear to be stuck in 'BUILD' status; the cause is not yet known.
The L3 agent log shows this error:
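To list the affected ports, something like the following could be used. It is only a sketch: it assumes the ports have already been fetched as a list of dicts carrying the `id` and `status` fields that the Neutron API returns for a port, and the helper name and sample IDs are hypothetical.

```python
def ports_in_build(ports):
    """Return the IDs of ports whose status is 'BUILD'.

    `ports` is a list of dicts with at least 'id' and 'status' keys
    (assumed here to mirror the port objects returned by the Neutron API).
    """
    return [p["id"] for p in ports if p.get("status") == "BUILD"]


# Hypothetical sample; real data would come from a port listing.
sample_ports = [
    {"id": "port-1", "status": "ACTIVE"},
    {"id": "port-2", "status": "BUILD"},
    {"id": "port-3", "status": "DOWN"},
    {"id": "port-4", "status": "BUILD"},
]

print(ports_in_build(sample_ports))
```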
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 256, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info return func(*args, **kwargs)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1116, in process
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info self.process_external()
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 910, in process_external
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info self.update_fip_statuses(fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 926, in update_fip_statuses
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info self.agent.context, self.router_id, fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 125, in update_floatingip_statuses
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info router_id=router_id, fip_statuses=fip_statuses)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 151, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info return self._original_context.call(ctxt, method, **kwargs)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info retry=self.retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info timeout=timeout, retry=retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 566, in send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info retry=retry)
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 557, in _send
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info raise result
2018-01-19 18:36:03.930 74015 ERROR neutron.agent.l3.router_info RemoteError: Remote error: TimeoutError QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 10
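The last line has the wording SQLAlchemy's QueuePool uses when every pooled and overflow connection is checked out and a caller waits past the pool timeout. Because it arrives as a RemoteError, the pool was most likely exhausted on the RPC server side rather than in the L3 agent itself. The sketch below only pulls the three numbers out of such a line so they can be compared against the database pool settings; these probably correspond to the oslo.db options `max_pool_size`, `max_overflow` and `pool_timeout`, but that mapping is an assumption. The function name and the derived `max_connections` key are illustrative.

```python
import re

# Pattern based on the message text seen in the traceback above.
QUEUEPOOL_RE = re.compile(
    r"QueuePool limit of size (?P<size>\d+) overflow (?P<overflow>\d+) reached, "
    r"connection timed out, timeout (?P<timeout>\d+)"
)


def parse_queuepool_error(line):
    """Extract pool size, overflow and timeout from a QueuePool timeout message.

    Returns None if the line does not contain such a message.
    """
    match = QUEUEPOOL_RE.search(line)
    if match is None:
        return None
    limits = {key: int(value) for key, value in match.groupdict().items()}
    # Upper bound on simultaneous connections for one pool: size plus overflow.
    limits["max_connections"] = limits["size"] + limits["overflow"]
    return limits


line = (
    "RemoteError: Remote error: TimeoutError QueuePool limit of size 10 "
    "overflow 20 reached, connection timed out, timeout 10"
)
print(parse_queuepool_error(line))
```

With the values in this report, a single pool allows at most 30 concurrent connections, and a request that cannot get one within 10 seconds fails with the error shown.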
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1744412
Title:
Losing access to instances via floating IPs
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1744412/+subscriptions