yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #25111
[Bug 1398988] [NEW] dhcp-agent / network bindings out of sync, stopping dhcp-agent
Public bug reported:
dhcp-agent on a neutron host started dying with
---
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent Traceback (most recent call last):
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/amqp.py", line 462, in _process_data
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent **args)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 45, in dispatch
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent neutron_ctxt, version, method, namespace, **kwargs)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/dhcp_rpc_base.py", line 92, in get_active_networks_info
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent networks = self._get_active_networks(context, **kwargs)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/dhcp_rpc_base.py", line 42, in _get_active_networks
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent plugin.auto_schedule_networks(context, host)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 222, in auto_schedule_networks
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent self.network_scheduler.auto_schedule_networks(self, context, host)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/scheduler/dhcp_agent_scheduler.py", line 122, in auto_schedule_networks
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent context, [net_id], active=True)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 126, in get_dhcp_agents_hosting_networks
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent binding.dhcp_agent)]
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 83, in is_eligible_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent agent['heartbeat_timestamp'])
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent TypeError: 'NoneType' object has no attribute '__getitem__'
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
---
further investigation on the neutron server found that
networkdhcpbindings seems to have got out of sync
we can see just one dhcp-agent
---
MariaDB [neutron]> select id from agents where agent_type="DHCP agent";
+--------------------------------------+
| id |
+--------------------------------------+
| 6923675d-5616-4ffe-b2c4-4d130f67973f |
+--------------------------------------+
1 row in set (0.00 sec)
---
but in the network bindings, at least 3 are listed
----
MariaDB [neutron]> select DISTINCT(dhcp_agent_id) from networkdhcpagentbindings;
+--------------------------------------+
| dhcp_agent_id |
+--------------------------------------+
| 6923675d-5616-4ffe-b2c4-4d130f67973f |
| b23f9f97-da04-4f61-bcfb-f8514e43cefd |
| d3e3ac5b-9962-428a-a9f8-6b2a1aba48d8 |
+--------------------------------------+
3 rows in set (0.00 sec)
---
neutron/db/agentschedulers_db.py:get_dhcp_agents_hosting_networks() [1]
doesn't expect this case of a binding with no associated agent, so it
goes off an tries to find the latest heartbeat for the agent, which
leads to the original traceback.
I unfortunately have no idea how this got out of sync; the host has been
upgraded from juno->kilo and neutron and dhcp-agents have been restarted
many, many times for various reasons.
I found two other references [2],[3]
[1] http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/agentschedulers_db.py#n107
[2] http://www.gossamer-threads.com/lists/openstack/dev/42060
[3] http://lists.openstack.org/pipermail/openstack/2013-October/002195.html
** Affects: neutron
Importance: Undecided
Assignee: Ian Wienand (iwienand)
Status: In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1398988
Title:
dhcp-agent / network bindings out of sync, stopping dhcp-agent
Status in OpenStack Neutron (virtual network service):
In Progress
Bug description:
dhcp-agent on a neutron host started dying with
---
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent Traceback (most recent call last):
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/amqp.py", line 462, in _process_data
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent **args)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 45, in dispatch
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent neutron_ctxt, version, method, namespace, **kwargs)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/dhcp_rpc_base.py", line 92, in get_active_networks_info
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent networks = self._get_active_networks(context, **kwargs)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/dhcp_rpc_base.py", line 42, in _get_active_networks
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent plugin.auto_schedule_networks(context, host)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 222, in auto_schedule_networks
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent self.network_scheduler.auto_schedule_networks(self, context, host)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/scheduler/dhcp_agent_scheduler.py", line 122, in auto_schedule_networks
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent context, [net_id], active=True)
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 126, in get_dhcp_agents_hosting_networks
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent binding.dhcp_agent)]
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 83, in is_eligible_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent agent['heartbeat_timestamp'])
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent TypeError: 'NoneType' object has no attribute '__getitem__'
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent
---
further investigation on the neutron server found that
networkdhcpbindings seems to have got out of sync
we can see just one dhcp-agent
---
MariaDB [neutron]> select id from agents where agent_type="DHCP agent";
+--------------------------------------+
| id |
+--------------------------------------+
| 6923675d-5616-4ffe-b2c4-4d130f67973f |
+--------------------------------------+
1 row in set (0.00 sec)
---
but in the network bindings, at least 3 are listed
----
MariaDB [neutron]> select DISTINCT(dhcp_agent_id) from networkdhcpagentbindings;
+--------------------------------------+
| dhcp_agent_id |
+--------------------------------------+
| 6923675d-5616-4ffe-b2c4-4d130f67973f |
| b23f9f97-da04-4f61-bcfb-f8514e43cefd |
| d3e3ac5b-9962-428a-a9f8-6b2a1aba48d8 |
+--------------------------------------+
3 rows in set (0.00 sec)
---
neutron/db/agentschedulers_db.py:get_dhcp_agents_hosting_networks()
[1] doesn't expect this case of a binding with no associated agent, so
it goes off an tries to find the latest heartbeat for the agent, which
leads to the original traceback.
I unfortunately have no idea how this got out of sync; the host has
been upgraded from juno->kilo and neutron and dhcp-agents have been
restarted many, many times for various reasons.
I found two other references [2],[3]
[1] http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/agentschedulers_db.py#n107
[2] http://www.gossamer-threads.com/lists/openstack/dev/42060
[3] http://lists.openstack.org/pipermail/openstack/2013-October/002195.html
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1398988/+subscriptions
Follow ups
References