yahoo-eng-team team mailing list archive

[Bug 1831031] [NEW] Agent healthcheck: found 12 dead agents out of 24:

 

Public bug reported:

OpenStack Queens (Q release)
CentOS 7.5

Three controller nodes, 15 compute nodes, plus Ceph


agent, after_update _notify_loop /usr/lib/python2.7/site-packages/neutron_lib/callbacks/manager.py:167
2019-05-30 16:26:35.450 1929487 WARNING neutron.db.agents_db [req-b42eeb01-474b-459a-ad4e-7664ab35c15d - - - - -] Agent healthcheck: found 12 dead agents out of 24:
                Type       Last heartbeat host
  Open vSwitch agent  2019-05-30 08:25:12 compute08
      Metadata agent  2019-05-30 08:25:14 controller02
  Open vSwitch agent  2019-05-30 08:25:20 compute14
  Open vSwitch agent  2019-05-30 08:25:00 compute13
          DHCP agent  2019-05-30 08:25:08 controller03
  Open vSwitch agent  2019-05-30 08:25:09 compute05
  Open vSwitch agent  2019-05-30 08:24:56 compute11
  Open vSwitch agent  2019-05-30 08:24:56 compute07
  Open vSwitch agent  2019-05-30 08:25:19 compute02
  Open vSwitch agent  2019-05-30 08:25:15 compute09
  Open vSwitch agent  2019-05-30 08:25:03 compute10
  Open vSwitch agent  2019-05-30 08:25:18 compute15
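
As a rough cross-check of the table above, the sketch below estimates how old each heartbeat was when the warning was logged. It rests on two assumptions that are not stated in the report: that neutron's default agent_down_time of 75 seconds is in effect, and that server.log is written in local time at UTC+8 while heartbeat timestamps are stored in UTC. The names heartbeat_age, HEARTBEATS and LOCAL_UTC_OFFSET are illustrative only and are not part of neutron.

```python
from datetime import datetime, timedelta

# Assumption: the default agent_down_time (75 seconds) has not been changed.
AGENT_DOWN_TIME = timedelta(seconds=75)
# Assumption: server.log uses local time at UTC+8; heartbeats are stored in UTC.
LOCAL_UTC_OFFSET = timedelta(hours=8)

# Timestamp of the healthcheck warning as printed in server.log (local time).
CHECK_TIME_LOCAL = datetime(2019, 5, 30, 16, 26, 35, 450000)

# A few rows copied from the healthcheck table: host -> last heartbeat.
HEARTBEATS = {
    "compute14": "2019-05-30 08:25:20",
    "compute15": "2019-05-30 08:25:18",
    "compute11": "2019-05-30 08:24:56",
}


def heartbeat_age(last_heartbeat_utc, check_time_local,
                  utc_offset=LOCAL_UTC_OFFSET):
    """Return the age of a UTC heartbeat at a given local log time.

    The table omits sub-second precision, so the result is approximate.
    """
    heartbeat = datetime.strptime(last_heartbeat_utc, "%Y-%m-%d %H:%M:%S")
    return (check_time_local - utc_offset) - heartbeat


for host, stamp in sorted(HEARTBEATS.items()):
    age = heartbeat_age(stamp, CHECK_TIME_LOCAL)
    verdict = "past threshold" if age > AGENT_DOWN_TIME else "within threshold"
    print("%s: %.2f s old, %s" % (host, age.total_seconds(), verdict))
```

Under those assumptions the sampled heartbeats come out roughly 75 to 99 seconds old, that is, only slightly past the threshold. The same arithmetic applied to the "Refusing to bind port" record further down (logged at 15:42:26.527 with heartbeat_timestamp 07:41:10) gives about 76.5 seconds. If the timezone assumption does not hold, these figures do not apply.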
  
  
[root@controller02 ~]# grep -nr "00bd797c" /var/log/neutron/
/var/log/neutron/server.log:44087:2019-05-30 15:42:26.521 1929477 DEBUG neutron.plugins.ml2.drivers.mech_agent [req-6f169ecb-103e-4ce7-82e2-32cfc9a0f040 3eaf2f2630d844448025e46d31ebc8f2 c4738e28a5164b919e7fd276e41fb765 - default default] Attempting to bind port 00bd797c-8c8d-43e7-9716-13e1cad60d62 on network e08820af-2de7-4389-811c-d589d311842e bind_port /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_agent.py:87
/var/log/neutron/server.log:44089:2019-05-30 15:42:26.527 1929477 WARNING neutron.plugins.ml2.drivers.mech_agent [req-6f169ecb-103e-4ce7-82e2-32cfc9a0f040 3eaf2f2630d844448025e46d31ebc8f2 c4738e28a5164b919e7fd276e41fb765 - default default] Refusing to bind port 00bd797c-8c8d-43e7-9716-13e1cad60d62 to dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'availability_zone': None, 'heartbeat_timestamp': datetime.datetime(2019, 5, 30, 7, 41, 10), 'admin_state_up': True, 'alive': False, 'topic': u'N/A', 'host': u'compute14', 'agent_type': u'Open vSwitch agent', 'resource_versions': {u'Subnet': u'1.0', u'Log': u'1.0', u'SubPort': u'1.0', u'SecurityGroup': u'1.0', u'SecurityGroupRule': u'1.0', u'Trunk': u'1.1', u'QosPolicy': u'1.7', u'Port': u'1.1', u'Network': u'1.0'}, 'created_at': datetime.datetime(2019, 5, 15, 7, 9, 1), 'started_at': datetime.datetime(2019, 5, 15, 7, 14, 9), 'id': '46170c00-d3f4-458d-8c61-89ca3f185f59', 'configurations': {u'ovs_hybrid_plug': True, u'in_distributed_mode': False, u'datapath_type': u'system', u'arp_responder_enabled': False, u'tunneling_ip': u'51.0.1.114', u'vhostuser_socket_dir': u'/var/run/openvswitch', u'devices': 10, u'ovs_capabilities': {u'datapath_types': [u'netdev', u'system'], u'iface_types': [u'geneve', u'gre', u'internal', u'lisp', u'patch', u'stt', u'system', u'tap', u'vxlan']}, u'extensions': [u'qos'], u'l2_population': True, u'tunnel_types': [u'vxlan'], u'log_agent_heartbeats': False, u'enable_distributed_routing': False, u'bridge_mappings': {u'provider': u'br-provider'}}}
/var/log/neutron/server.log:44090:2019-05-30 15:42:26.527 1929477 ERROR neutron.plugins.ml2.managers [req-6f169ecb-103e-4ce7-82e2-32cfc9a0f040 3eaf2f2630d844448025e46d31ebc8f2 c4738e28a5164b919e7fd276e41fb765 - default default] Failed to bind port 00bd797c-8c8d-43e7-9716-13e1cad60d62 on host compute14 for vnic_type normal using segments [{'network_id': 'e08820af-2de7-4389-811c-d589d311842e', 'segmentation_id': 139, 'physical_network': u'provider', 'id': 'e931dc8e-8719-49ae-ad44-e85df75b4af0', 'network_type': u'vlan'}]
/var/log/neutron/server.log:44091:2019-05-30 15:42:26.528 1929477 INFO neutron.plugins.ml2.plugin [req-6f169ecb-103e-4ce7-82e2-32cfc9a0f040 3eaf2f2630d844448025e46d31ebc8f2 c4738e28a5164b919e7fd276e41fb765 - default default] Attempt 9 to bind port 00bd797c-8c8d-43e7-9716-13e1cad60d62
/var/log/neutron/server.log:44092:2019-05-30 15:42:26.543 1929477 DEBUG neutron.plugins.ml2.managers [req-6f169ecb-103e-4ce7-82e2-32cfc9a0f040 3eaf2f2630d844448025e46d31ebc8f2 c4738e28a5164b919e7fd276e41fb765 - default default] Attempting to bind port 00bd797c-8c8d-43e7-9716-13e1cad60d62 on host compute14 for vnic_type normal with profile  bind_port /usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py:745
/var/log/neutron/server.log:44093:2019-05-30 15:42:26.543 1929477 DEBUG neutron.plugins.ml2.managers [req-6f169ecb-103e-4ce7-82e2-32cfc9a0f040 3eaf2f2630d844448025e46d31ebc8f2 c4738e28a5164b919e7fd276e41fb765 - default default] Attempting to bind port 00bd797c-8c8d-43e7-9716-13e1cad60d62 on host compute14 at level 0 using segments [{'network_id': 'e08820af-2de7-4389-811c-d589d311842e', 'segmentation_id': 139, 'physical_network': u'provider', 'id': 'e931dc8e-8719-49ae-ad44-e85df75b4af0', 'network_type': u'vlan'}] _bind_port_level /usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py:766

1. Did port creation on this compute node fail because the agent
heartbeat check in neutron-server considered the agent dead?


2. I repeatedly run the command below and every agent is always shown as UP. Is this mechanism different from the agent heartbeat check in neutron-server?
(openstack) network agent list
+--------------------------------------+--------------------+--------------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type         | Host         | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+--------------------+--------------+-------------------+-------+-------+---------------------------+
| 0aebdc75-8d1c-48c0-9033-5cb05f28cee3 | Open vSwitch agent | compute08    | None              | :-)   | UP    | neutron-openvswitch-agent |
| 196290f9-3812-4856-b248-b08a25830a5c | Metadata agent     | controller03 | None              | :-)   | UP    | neutron-metadata-agent    |
| 25545371-fceb-4c66-9436-58dc9b0f164c | Metadata agent     | controller02 | None              | :-)   | UP    | neutron-metadata-agent    |
| 377e08ba-4c50-4726-b767-b1f0950b51ad | Open vSwitch agent | controller02 | None              | :-)   | UP    | neutron-openvswitch-agent |
| 46170c00-d3f4-458d-8c61-89ca3f185f59 | Open vSwitch agent | compute14    | None              | :-)   | UP    | neutron-openvswitch-agent |
| 4cd107a2-f684-4933-8beb-fb46c3f811d4 | DHCP agent         | controller02 | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 63c578b5-d317-4bc2-86d2-db5bfc0c2284 | Open vSwitch agent | compute12    | None              | :-)   | UP    | neutron-openvswitch-agent |
| 7883cf7d-c4aa-49d1-9bc5-f52df88c1eee | Metadata agent     | controller01 | None              | :-)   | UP    | neutron-metadata-agent    |
| 7c9bf133-c0df-42d3-bea7-bc1bfaac8072 | Open vSwitch agent | compute13    | None              | :-)   | UP    | neutron-openvswitch-agent |
| 81474816-cf83-452b-9881-d7408d39c0ec | Open vSwitch agent | compute04    | None              | :-)   | UP    | neutron-openvswitch-agent |
| 8498bbf0-62d4-48ca-bb8d-83288b1eb234 | DHCP agent         | controller03 | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 9bfc8800-a521-4aed-961a-4f3923ddb058 | Open vSwitch agent | compute05    | None              | :-)   | UP    | neutron-openvswitch-agent |
| a75a7811-2a85-4281-9455-6ad9779eeb2f | Open vSwitch agent | compute11    | None              | :-)   | UP    | neutron-openvswitch-agent |
| abd84306-60b8-419f-aff0-475e9371931e | Open vSwitch agent | compute03    | None              | :-)   | UP    | neutron-openvswitch-agent |
| ad0e2207-86b4-4326-a4ef-13c0ae4f56fe | Open vSwitch agent | compute01    | None              | :-)   | UP    | neutron-openvswitch-agent |
| bac6584b-db92-41e4-b474-01065ea7449a | Open vSwitch agent | compute06    | None              | :-)   | UP    | neutron-openvswitch-agent |
| bdd6e96b-42dc-4b39-b9c0-48ac72471a11 | Open vSwitch agent | compute07    | None              | :-)   | UP    | neutron-openvswitch-agent |
| cb3a1ae1-1f0f-4964-bd6c-93361c6f9f0f | Open vSwitch agent | compute02    | None              | :-)   | UP    | neutron-openvswitch-agent |
| d0eff533-3a90-427a-b382-063801631459 | DHCP agent         | controller01 | nova              | :-)   | UP    | neutron-dhcp-agent        |
| d135cd88-0807-4040-be3b-53706ac6ec2e | Open vSwitch agent | compute09    | None              | :-)   | UP    | neutron-openvswitch-agent |
| e9a4767f-b220-4d01-95fe-d08f662eaca0 | Open vSwitch agent | controller01 | None              | :-)   | UP    | neutron-openvswitch-agent |
| f44f1619-0404-48e5-9469-2b3d2784dc55 | Open vSwitch agent | compute10    | None              | :-)   | UP    | neutron-openvswitch-agent |
| f7108761-8894-4ceb-918f-91b281d6ccb9 | Open vSwitch agent | compute15    | None              | :-)   | UP    | neutron-openvswitch-agent |
| f788a241-e9fe-4289-a237-95d3348b0e30 | Open vSwitch agent | controller03 | None              | :-)   | UP    | neutron-openvswitch-agent |
+--------------------------------------+--------------------+--------------+-------------------+-------+-------+---------------------------+

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1831031

Title:
  Agent healthcheck: found 12 dead agents out of 24:

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1831031/+subscriptions

