
yahoo-eng-team team mailing list archive

[Bug 1776621] [NEW] Scale: when periodic pool size is small and there is a lot of load the compute service goes down


Public bug reported:

When the nova power sync pool is exhausted, the compute service goes
down. As a result, scale and performance tests fail.
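
A minimal standalone sketch (not nova's actual code) of the suspected
mechanism: an eventlet GreenPool of size N blocks the caller in spawn()
once all N greenthreads are busy, so a small power sync pool under heavy
load can stall the periodic task while it holds the "compute_resources"
lock, which matches the 6004.943s lock hold in the log below:

import time
import eventlet
eventlet.monkey_patch()

# A deliberately tiny pool standing in for the power sync pool.
pool = eventlet.GreenPool(2)

def slow_power_sync(i):
    # Stand-in for a slow per-instance hypervisor power state query.
    eventlet.sleep(10)

start = time.time()
for i in range(5):
    # spawn() blocks here once both slots are taken, just as the
    # periodic task blocks when the real pool is exhausted.
    pool.spawn(slow_power_sync, i)
    print("spawned %d after %.1fs" % (i, time.time() - start))
pool.waitall()

With a pool of 2 and five 10-second tasks, the third spawn() does not
return for about 10 seconds; scale that to thousands of instances and
the stall grows accordingly.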

2018-06-12 19:58:48.871 30126 WARNING oslo.messaging._drivers.impl_rabbit [req-196321bb-a11a-4e6e-a80a-544ecd093986 c3de6d9ec02c494d978330d8f1a64da1 d37803befc35418981f1f0b6dceec696 - default default] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 104] Connection reset by peer
2018-06-12 19:58:48.872 30126 WARNING oslo.messaging._drivers.impl_rabbit [req-196321bb-a11a-4e6e-a80a-544ecd093986 c3de6d9ec02c494d978330d8f1a64da1 d37803befc35418981f1f0b6dceec696 - default default] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 104] Connection reset by peer
2018-06-12 19:58:54.793 30126 WARNING oslo.messaging._drivers.impl_rabbit [req-196321bb-a11a-4e6e-a80a-544ecd093986 c3de6d9ec02c494d978330d8f1a64da1 d37803befc35418981f1f0b6dceec696 - default default] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 104] Connection reset by peer
2018-06-12 21:37:23.805 30126 DEBUG oslo_concurrency.lockutils [req-196321bb-a11a-4e6e-a80a-544ecd093986 c3de6d9ec02c494d978330d8f1a64da1 d37803befc35418981f1f0b6dceec696 - default default] Lock "compute_resources" released by "nova.compute.resource_tracker._update_available_resource" :: held 6004.943s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:288
2018-06-12 21:37:23.807 30126 ERROR nova.compute.manager [req-196321bb-a11a-4e6e-a80a-544ecd093986 c3de6d9ec02c494d978330d8f1a64da1 d37803befc35418981f1f0b6dceec696 - default default] Error updating resources for node domain-c7.fd3d2358-cc8d-4773-9fef-7a2713ac05ba.: MessagingTimeout: Timed out waiting for a reply to message ID 1eb4b1b40f0f4c66b0266608073717e8
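
A possible mitigation, assuming the exhausted pool is the one sized by
nova's sync_power_state_pool_size option: enlarge it in nova.conf so the
periodic task is less likely to block. The value below is purely
illustrative, not a tested recommendation:

[DEFAULT]
# Illustrative value only; size to the number of instances whose power
# state may need to be synced concurrently on this host.
sync_power_state_pool_size = 4096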

root@controller01:/var/log/nova# vi nova-conductor.log.1
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager [req-77b5e1d7-a4b7-468e-98af-dfdfbf2fad7f 1b5d8da24b39464cb6736d122ccc0665 eb361d7bc9bd40059a2ce2848c985772 - default default] Failed to schedule instances: NoValidHost_Remote: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 226, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 153, in select_destinations
    allocation_request_version, return_alternates)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 93, in select_destinations
    allocation_request_version, return_alternates)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 245, in _schedule
    claimed_instance_uuids)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 282, in _ensure_sufficient_hosts
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager Traceback (most recent call last):
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 1118, in schedule_and_build_instances
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager instance_uuids, return_alternates=True)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 718, in _schedule_instances
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager return_alternates=return_alternates)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/nova/scheduler/utils.py", line 727, in wrapped
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager return func(*args, **kwargs)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 53, in select_destinations
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager instance_uuids, return_objects, return_alternates)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager return getattr(self.instance, __name)(*args, **kwargs)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager instance_uuids, return_objects, return_alternates)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 158, in select_destinations
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager return cctxt.call(ctxt, 'select_destinations', **msg_args)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager retry=self.retry)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 131, in _send
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager timeout=timeout, retry=retry)
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
2018-06-12 20:48:10.161 6328 ERROR nova.conductor.manager retry=retry)
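
For context on why the conductor then sees NoValidHost: once the compute
service misses enough reports it is treated as down and the scheduler
filters its host out. A hypothetical, simplified sketch of that liveness
rule (modeled on nova's DB servicegroup driver; service_down_time
defaults to 60s):

from datetime import datetime, timedelta

SERVICE_DOWN_TIME = timedelta(seconds=60)  # nova's default service_down_time

def is_up(last_heartbeat):
    # A service counts as up only if it reported within service_down_time.
    return datetime.utcnow() - last_heartbeat <= SERVICE_DOWN_TIME

# The periodic task above was stuck for roughly 6005s, so the compute
# node missed many reports and its host no longer passed the filters.
print(is_up(datetime.utcnow() - timedelta(seconds=6005)))  # False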

** Affects: nova
     Importance: Undecided
     Assignee: Gary Kotton (garyk)
         Status: In Progress

-- 
https://bugs.launchpad.net/bugs/1776621
