yahoo-eng-team team mailing list archive

[Bug 1388077] [NEW] Parallel periodic instance power state reporting from compute nodes has high impact on conductors and message broker

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: James Page <james.page@xxxxxxxxxx>
Date: Fri, 31 Oct 2014 12:58:29 -0000
Reply-to: Bug 1388077 <1388077@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

Environment: OpenStack Juno release, Ubuntu 14.04, 480 compute nodes,
8 cloud controllers, 40,000+ instances

The change made in:

https://github.com/openstack/nova/commit/baabab45e0ae0e9e35872cae77eb04bdb5ee0545

switches power state reporting from being a serial process for each
instance on a hypervisor to being a parallel thread for every instance;
for clouds running high instance counts, this has quite an impact on the
conductor processes as they try to deal with N instance refresh calls in
parallel where N is the number of instances running on the cloud.
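
To put rough numbers on this, the sketch below estimates the worst-case number of refresh calls in flight at once, using the figures from the Environment line. It assumes every compute node runs its periodic sync at the same moment, which is a pessimistic simplification; the function name and the per-node limit of 10 are hypothetical and used only for illustration.

```python
# Figures taken from the Environment line of this report.
COMPUTE_NODES = 480
INSTANCES = 40_000


def peak_in_flight(nodes, instances, per_node_limit=None):
    """Worst-case concurrent refresh calls if all nodes sync at once.

    per_node_limit=None models one thread per instance (no bound).
    per_node_limit=1 models the earlier serial loop on each node.
    """
    if per_node_limit is None:
        return instances
    # Each node has at most per_node_limit calls outstanding, and the
    # total can never exceed the number of instances.
    return min(instances, nodes * per_node_limit)


serial = peak_in_flight(COMPUTE_NODES, INSTANCES, per_node_limit=1)
unbounded = peak_in_flight(COMPUTE_NODES, INSTANCES)
throttled = peak_in_flight(COMPUTE_NODES, INSTANCES, per_node_limit=10)
print(serial, unbounded, throttled)
```

Under these assumptions the serial loop peaks at one call per hypervisor, while the per-instance threads peak at one call per instance.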

It might be better to throttle this to a configurable level of
parallelism so that periodic RPC load can be managed effectively in a
larger cloud, or to continue to do this process in series but outside of
the main thread.
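
A minimal sketch of the throttling idea follows. It is not nova's code: nova uses eventlet green threads, whereas this uses the standard library's ThreadPoolExecutor as a stand-in, and the names sync_power_states, refresh and max_parallel are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sync_power_states(instance_ids, refresh, max_parallel=4):
    """Call refresh(instance_id) for every instance, bounded in parallel.

    At most max_parallel calls are outstanding at any time, so the
    number of simultaneous RPC requests from one node is capped.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # list() waits for all calls and re-raises the first failure.
        return list(pool.map(refresh, instance_ids))


if __name__ == "__main__":
    # Stand-in for the real per-instance refresh call.
    states = sync_power_states(["a", "b", "c"], lambda i: (i, "running"),
                               max_parallel=2)
    print(states)
```

With max_parallel=1 the calls run one at a time on a worker thread, which approximates the second option above; in a real service the limit would come from a configuration option.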

The net result of this activity is that it places increased demands on
the message broker, which has to deal with more parallel connections,
and on the conductors as they try to consume all of the RPC requests. If
the message broker hits its memory high water mark, it will stop
publishers from publishing any more messages until memory usage drops
below the high water mark again. That might not be achievable if all
conductor processes are tied up with existing RPC calls, trying to send
replies, resulting in a message broker lockup and the collapse of all
RPC in the cloud.

** Affects: nova
     Importance: Undecided
         Status: New

** Affects: nova (Ubuntu)
     Importance: Undecided
         Status: New

** Description changed:

  The change made in:
  
  https://github.com/openstack/nova/commit/baabab45e0ae0e9e35872cae77eb04bdb5ee0545
  
  Switches power state reporting from being a serial process on each
  instance on a hypervisor to being a parallel thread for every instance;
  for clouds running high instance densities, this has quite an impact on
  the conductor processes as they try to deal with N instance refresh
  calls in parallel where N is the number of instances running on the
  cloud.
  
  It might be better to throttle this to a configurable parallel level so
  that period RPC load can be managed effectively in a larger cloud, or to
  continue todo this process in series but outside of the main thread.
+ 
+ The net result of this activity is that it places increase demands on
+ the message broker, which has to deal with more parallel connections,
+ and the conductors as they try to consume all of the RPC requests; if
+ the message broker hits its memory high water mark it will stop
+ publishers publishing any more messages until the memory usage drops
+ below the high water mark again - this might not be achievable if all
+ conductor processes are tied up with existing RPC calls try to send
+ replies, resulting in a message broker lockup and collapse of all RPC in
+ the cloud.

** Description changed:

  The change made in:
  
  https://github.com/openstack/nova/commit/baabab45e0ae0e9e35872cae77eb04bdb5ee0545
  
- Switches power state reporting from being a serial process on each
+ Switches power state reporting from being a serial process for each
  instance on a hypervisor to being a parallel thread for every instance;
  for clouds running high instance densities, this has quite an impact on
  the conductor processes as they try to deal with N instance refresh
  calls in parallel where N is the number of instances running on the
  cloud.
  
  It might be better to throttle this to a configurable parallel level so
  that period RPC load can be managed effectively in a larger cloud, or to
  continue todo this process in series but outside of the main thread.
  
  The net result of this activity is that it places increase demands on
  the message broker, which has to deal with more parallel connections,
  and the conductors as they try to consume all of the RPC requests; if
  the message broker hits its memory high water mark it will stop
  publishers publishing any more messages until the memory usage drops
  below the high water mark again - this might not be achievable if all
  conductor processes are tied up with existing RPC calls try to send
  replies, resulting in a message broker lockup and collapse of all RPC in
  the cloud.

** Also affects: nova
   Importance: Undecided
       Status: New

** Summary changed:

- Parallel periodic power state reporting from compute nodes has high impact on conductors
+ Parallel periodic power state reporting from compute nodes has high impact on conductors and message broker

** Description changed:

+ Environment: OpenStack Juno release/Ubuntu 14.04/480 compute nodes/8
+ cloud controllers/40,000 instances +
+ 
  The change made in:
  
  https://github.com/openstack/nova/commit/baabab45e0ae0e9e35872cae77eb04bdb5ee0545
  
- Switches power state reporting from being a serial process for each
+ switches power state reporting from being a serial process for each
  instance on a hypervisor to being a parallel thread for every instance;
  for clouds running high instance densities, this has quite an impact on
  the conductor processes as they try to deal with N instance refresh
  calls in parallel where N is the number of instances running on the
  cloud.
  
  It might be better to throttle this to a configurable parallel level so
  that period RPC load can be managed effectively in a larger cloud, or to
  continue todo this process in series but outside of the main thread.
  
  The net result of this activity is that it places increase demands on
  the message broker, which has to deal with more parallel connections,
  and the conductors as they try to consume all of the RPC requests; if
  the message broker hits its memory high water mark it will stop
  publishers publishing any more messages until the memory usage drops
  below the high water mark again - this might not be achievable if all
  conductor processes are tied up with existing RPC calls try to send
  replies, resulting in a message broker lockup and collapse of all RPC in
  the cloud.

** Summary changed:

- Parallel periodic power state reporting from compute nodes has high impact on conductors and message broker
+ Parallel periodic instance power state reporting from compute nodes has high impact on conductors and message broker

** Description changed:

  Environment: OpenStack Juno release/Ubuntu 14.04/480 compute nodes/8
  cloud controllers/40,000 instances +
  
  The change made in:
  
  https://github.com/openstack/nova/commit/baabab45e0ae0e9e35872cae77eb04bdb5ee0545
  
  switches power state reporting from being a serial process for each
  instance on a hypervisor to being a parallel thread for every instance;
- for clouds running high instance densities, this has quite an impact on
- the conductor processes as they try to deal with N instance refresh
- calls in parallel where N is the number of instances running on the
- cloud.
+ for clouds running high instance counts, this has quite an impact on the
+ conductor processes as they try to deal with N instance refresh calls in
+ parallel where N is the number of instances running on the cloud.
  
  It might be better to throttle this to a configurable parallel level so
  that period RPC load can be managed effectively in a larger cloud, or to
  continue todo this process in series but outside of the main thread.
  
  The net result of this activity is that it places increase demands on
  the message broker, which has to deal with more parallel connections,
  and the conductors as they try to consume all of the RPC requests; if
  the message broker hits its memory high water mark it will stop
  publishers publishing any more messages until the memory usage drops
  below the high water mark again - this might not be achievable if all
  conductor processes are tied up with existing RPC calls try to send
  replies, resulting in a message broker lockup and collapse of all RPC in
  the cloud.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1388077

Title:
  Parallel periodic instance power state reporting from compute nodes
  has high impact on conductors and message broker

Status in OpenStack Compute (Nova):
  New
Status in “nova” package in Ubuntu:
  New

Bug description:
  Environment: OpenStack Juno release, Ubuntu 14.04, 480 compute nodes,
  8 cloud controllers, 40,000+ instances

  The change made in:

  https://github.com/openstack/nova/commit/baabab45e0ae0e9e35872cae77eb04bdb5ee0545

  switches power state reporting from being a serial process for each
  instance on a hypervisor to being a parallel thread for every
  instance; for clouds running high instance counts, this has quite an
  impact on the conductor processes as they try to deal with N instance
  refresh calls in parallel where N is the number of instances running
  on the cloud.

  It might be better to throttle this to a configurable level of
  parallelism so that periodic RPC load can be managed effectively in a
  larger cloud, or to continue to do this process in series but outside
  of the main thread.

  The net result of this activity is that it places increased demands on
  the message broker, which has to deal with more parallel connections,
  and on the conductors as they try to consume all of the RPC requests.
  If the message broker hits its memory high water mark, it will stop
  publishers from publishing any more messages until memory usage drops
  below the high water mark again. That might not be achievable if all
  conductor processes are tied up with existing RPC calls, trying to
  send replies, resulting in a message broker lockup and the collapse of
  all RPC in the cloud.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1388077/+subscriptions

Follow ups

[Bug 1388077] Re: Parallel periodic instance power state reporting from compute nodes has high impact on conductors and message broker
From: Markus Zoeller (markus_z), 2016-07-05
[Bug 1388077] Re: Parallel periodic instance power state reporting from compute nodes has high impact on conductors and message broker
From: James Page, 2015-01-13
[Bug 1388077] Re: Parallel periodic instance power state reporting from compute nodes has high impact on conductors and message broker
From: James Page, 2014-12-03
[Bug 1388077] [NEW] Parallel periodic instance power state reporting from compute nodes has high impact on conductors and message broker
From: James Page, 2014-10-31