
yahoo-eng-team team mailing list archive

[Bug 1493945] [NEW] Router scheduling at network node fails under scale

 

Public bug reported:

After around 100 routers have been scheduled to a neutron network node,
subsequent scheduling attempts fail with the following extracted log
signature:

38343:2015-09-09 06:53:15.305 DEBUG neutron.agent.l3.agent [req-d7ce10e2-b689-4c5b-b4c7-30aa4f1fdbbb admin cdd316b857a947488ca9120aef5f6891] Got routers updated notification :[u'54ffc2c4-123b-460b-bd2f-01ae5277e3e1'] from (pid=19102) routers_updated /opt/stack/neutron/neutron/agent/l3/agent.py:385
38448:2015-09-09 06:53:16.328 DEBUG neutron.agent.l3.agent [req-63d36e16-5d5d-4575-825b-28722ec28a1e admin cdd316b857a947488ca9120aef5f6891] Got routers updated notification :[u'54ffc2c4-123b-460b-bd2f-01ae5277e3e1'] from (pid=19102) routers_updated /opt/stack/neutron/neutron/agent/l3/agent.py:385
41013:2015-09-09 06:54:23.815 DEBUG neutron.agent.l3.agent [-] Starting router update for 54ffc2c4-123b-460b-bd2f-01ae5277e3e1, action None, priority 0 from (pid=19102) _process_router_update /opt/stack/neutron/neutron/agent/l3/agent.py:456
42690:2015-09-09 06:55:23.818 ERROR neutron.agent.l3.agent [-] Failed to fetch router information for '54ffc2c4-123b-460b-bd2f-01ae5277e3e1'
42710:2015-09-09 06:55:23.821 DEBUG neutron.agent.l3.agent [-] Starting router update for 54ffc2c4-123b-460b-bd2f-01ae5277e3e1, action None, priority 0 from (pid=19102) _process_router_update /opt/stack/neutron/neutron/agent/l3/agent.py:456
42738:2015-09-09 06:55:30.615 DEBUG oslo_messaging._drivers.amqpdriver [-]  queues: 8, message: {u'_unique_id': u'c3f0a880f9544bf8b938bb6ced4fee6f', u'failure': None, u'result': [{u'status': u'ACTIVE', u'_interfaces': [{u'allowed_address_pairs': [], u'extra_dhcp_opts': [], u'device_owner': u'network:router_interface', u'port_security_enabled': False, u'binding:profile': {}, u'fixed_ips': [{u'subnet_id': u'3d01720b-324d-4f69-8767-43705217aeb0', u'prefixlen': 24, u'ip_address': u'192.168.18.1'}], u'id': u'7ef5df56-e82b-4fb8-8b1c-836ec93338d3', u'security_groups': [], u'binding:vif_details': {}, u'binding:vif_type': u'unbound', u'mac_address': u'fa:16:3e:ee:9f:33', u'status': u'DOWN', u'subnets': [{u'ipv6_ra_mode': None, u'cidr': u'192.168.18.0/24', u'gateway_ip': u'192.168.18.1', u'id': u'3d01720b-324d-4f69-8767-43705217aeb0', u'subnetpool_id': None}], u'binding:host_id': u'legacy-network-1', u'device_id': u'54ffc2c4-123b-460b-bd2f-01ae5277e3e1', u'name': u'', u'admin_state_up': True, u'network_id': u'7a77e6c2-6e25-4223-9981-987f33e75d18', u'dns_name': u'', u'binding:vnic_type': u'normal', u'tenant_id': u'4cd3d0ecfa6f48bb946932481ef04b4e', u'extra_subnets': []}], u'enable_snat': True, u'ha_vr_id': 0, u'gw_port_host': None, u'gw_port_id': u'2a3dabbc-db24-40c5-880a-3ef738537520', u'admin_state_up': True, u'tenant_id': u'4cd3d0ecfa6f48bb946932481ef04b4e', u'gw_port': {u'allowed_address_pairs': [], u'extra_dhcp_opts': [], u'device_owner': u'network:router_gateway', u'port_security_enabled': False, u'binding:profile': {}, u'fixed_ips': [{u'subnet_id': u'd43deb2a-6bcd-40b2-b559-36a798e932ba', u'prefixlen': 20, u'ip_address': u'172.18.128.101'}], u'id': u'2a3dabbc-db24-40c5-880a-3ef738537520', u'security_groups': [], u'binding:vif_details': {}, u'binding:vif_type': u'unbound', u'mac_address': u'fa:16:3e:b5:fa:de', u'status': u'DOWN', u'subnets': [{u'ipv6_ra_mode': None, u'cidr': u'172.18.128.0/20', u'gateway_ip': u'172.18.128.1', u'id': u'd43deb2a-6bcd-40b2-b559-36a798e932ba', u'subnetpool_id': None}], u'binding:host_id': u'legacy-network-1', u'device_id': u'54ffc2c4-123b-460b-bd2f-01ae5277e3e1', u'name': u'', u'admin_state_up': True, u'network_id': u'c546009b-207c-44cd-8a4b-3e1e426eb56b', u'dns_name': u'', u'binding:vnic_type': u'normal', u'tenant_id': u'', u'extra_subnets': []}, u'distributed': False, u'_snat_router_interfaces': [], u'routes': [], u'external_gateway_info': {u'network_id': u'c546009b-207c-44cd-8a4b-3e1e426eb56b', u'enable_snat': True, u'external_fixed_ips': [{u'subnet_id': u'd43deb2a-6bcd-40b2-b559-36a798e932ba', u'ip_address': u'172.18.128.101'}]}, u'ha': False, u'id': u'54ffc2c4-123b-460b-bd2f-01ae5277e3e1', u'name': u'router-100'}]} from (pid=19102) put /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:230
42921:2015-09-09 06:56:23.824 ERROR neutron.agent.l3.agent [-] Failed to fetch router information for '54ffc2c4-123b-460b-bd2f-01ae5277e3e1'

The failure above comes from oslo_messaging timing out while fetching
router information at line 465 in _process_router_update.  However, the
neutron server still shows the status of the now-unscheduled router as
ACTIVE, so the failure goes unnoticed by users and operators.
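
The timing in the log can be checked with a small script. The sketch below is illustrative only: it takes shortened copies of the log lines quoted above (the RPC payload is replaced by `{...}`, and the trailing "from (pid=...)" parts are dropped), pairs each "Starting router update" with the next "Failed to fetch router information", and measures the gap. The helper names (`parse`, `fetch_attempts`, `late_reply_lag`) are made up for this example and are not part of neutron. Both gaps come out at roughly 60 seconds, which is consistent with oslo.messaging's default rpc_response_timeout of 60 seconds, assuming that option was left at its default on this node. A reply carrying the router data is logged about 6.8 seconds after the first failure, which suggests, but does not prove, that the server answered after the agent had already given up.

```python
import re
from datetime import datetime

# Shortened copies of the log lines quoted in this report.
# Assumption: the RPC payload is abbreviated to "{...}" for readability.
LOG = """\
41013:2015-09-09 06:54:23.815 DEBUG neutron.agent.l3.agent [-] Starting router update for 54ffc2c4-123b-460b-bd2f-01ae5277e3e1, action None, priority 0
42690:2015-09-09 06:55:23.818 ERROR neutron.agent.l3.agent [-] Failed to fetch router information for '54ffc2c4-123b-460b-bd2f-01ae5277e3e1'
42710:2015-09-09 06:55:23.821 DEBUG neutron.agent.l3.agent [-] Starting router update for 54ffc2c4-123b-460b-bd2f-01ae5277e3e1, action None, priority 0
42738:2015-09-09 06:55:30.615 DEBUG oslo_messaging._drivers.amqpdriver [-] queues: 8, message: {...}
42921:2015-09-09 06:56:23.824 ERROR neutron.agent.l3.agent [-] Failed to fetch router information for '54ffc2c4-123b-460b-bd2f-01ae5277e3e1'
"""

# grep line number, timestamp, level, logger name, [request context], message
LINE_RE = re.compile(
    r"^\d+:(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+ (\S+) \[[^\]]*\] (.*)$"
)


def parse(log):
    """Return a list of (timestamp, logger, message) tuples."""
    events = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, match.group(2), match.group(3)))
    return events


def fetch_attempts(events):
    """Seconds between each 'Starting router update' and the next failure."""
    durations = []
    start = None
    for ts, _logger, msg in events:
        if msg.startswith("Starting router update"):
            start = ts
        elif msg.startswith("Failed to fetch router information") and start:
            durations.append((ts - start).total_seconds())
            start = None
    return durations


def late_reply_lag(events):
    """Seconds between the first fetch failure and the logged AMQP reply."""
    first_failure = next(
        ts for ts, _logger, msg in events
        if msg.startswith("Failed to fetch router information")
    )
    reply = next(
        ts for ts, logger, _msg in events
        if logger == "oslo_messaging._drivers.amqpdriver"
    )
    return (reply - first_failure).total_seconds()


if __name__ == "__main__":
    events = parse(LOG)
    for seconds in fetch_attempts(events):
        print("fetch attempt failed after %.3f s" % seconds)
    print("reply logged %.3f s after first failure" % late_reply_lag(events))
```

Running the same helpers over a full l3-agent log would show whether every failed router hits the same 60-second boundary, which would help separate a slow neutron server from a lost message.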

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1493945

