yahoo-eng-team team mailing list archive

[Bug 1554332] [NEW] neutron agents are too aggressive under server load

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Kevin Benton <1554332@xxxxxxxxxxxxxxxxxx>
Date: Tue, 08 Mar 2016 03:58:17 -0000
Reply-to: Bug 1554332 <1554332@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

If a server operation takes long enough to trigger a timeout on an agent
call to the server, the agent will just give up and issue a new call
immediately. This pattern is pervasive throughout the agents and it
leads to two issues:

First, if the server is busy and requests take longer than the timeout
window to fulfill, the agent will continually hammer the server with
calls that are bound to fail until the server load drops enough to
fulfill the query. If the load is itself the result of calls from
agents, this leads to a stampede in which the server is unable to
fulfill requests until an operator intervenes.

Second, the server builds a backlog of call requests, and the time left
to process each message before it expires shrinks as the backlog grows.
With enough clients making calls, the timeout threshold can be crossed
before a call even starts to process. For example, assume the server
handles calls one at a time, takes 6 seconds to process a given call,
and the clients are configured with a 60 second timeout. Then 30 agents
making the call simultaneously will leave 20 of them without a
response. The first 10 will get their calls answered (10 x 6 = 60
seconds) and the last 20 will end up in a loop where the server just
spends its time replying to calls that have already expired by the time
it processes them.
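
The arithmetic in this example can be checked with a small simulation.
This is only a sketch under assumptions that are not stated in the
report: the server handles one call at a time in FIFO order, a reply
that arrives exactly at the timeout still counts, and every unanswered
agent reissues its call the moment it times out. The function name and
parameters are made up for illustration.

```python
from collections import deque


def simulate(num_agents=30, service_time=6, timeout=60, horizon=600):
    """Simulate agents calling a single-worker FIFO server.

    Returns (answered, wasted): the set of agent ids that received a
    reply in time, and the seconds spent processing expired calls.
    """
    # Each queue entry is (agent_id, time_issued); all agents call at t=0.
    queue = deque((agent, 0) for agent in range(num_agents))
    answered = set()
    wasted = 0
    clock = 0
    next_retry = timeout  # unanswered agents retry at multiples of timeout

    while clock < horizon:
        # Agents whose call has timed out give up and reissue immediately.
        # The stale request stays in the server's backlog.
        while next_retry <= clock:
            for agent in range(num_agents):
                if agent not in answered:
                    queue.append((agent, next_retry))
            next_retry += timeout
        if not queue:
            break
        agent, issued = queue.popleft()
        clock += service_time
        if clock - issued <= timeout:
            answered.add(agent)
        else:
            # The caller stopped waiting before the reply was ready.
            wasted += service_time
    return answered, wasted


if __name__ == "__main__":
    answered, wasted = simulate()
    print("agents answered:", sorted(answered))
    print("seconds wasted on expired calls:", wasted)
```

With the defaults, only agents 0 through 9 are ever answered. After the
first 60 seconds the server works exclusively on expired calls, and the
backlog keeps growing because the 20 remaining agents add calls twice
as fast as the server can drain them.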

See the push notification spec for a proposal to eliminate heavy agent
calls: https://review.openstack.org/#/c/225995/

However, even with that spec, we need more intelligent handling of the
cases where calls are required (e.g. initial sync) or where a call is
too invasive to convert to a push notification.
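
One common form of "more intelligent handling" is to back off between
retries instead of reissuing the call immediately. The sketch below
shows exponential backoff with jitter. It is an illustration only, not
the change proposed or merged for this bug: CallTimeout and
call_with_backoff are hypothetical names, and a real agent would catch
whatever timeout exception its RPC layer raises.

```python
import random
import time


class CallTimeout(Exception):
    """Hypothetical stand-in for the RPC layer's timeout exception."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep,
                      rng=random.random):
    """Invoke call(), waiting longer after each timeout before retrying.

    The wait is a random fraction of the current delay so that many
    agents that timed out together do not all retry at the same moment.
    The delay doubles after every timeout, capped at max_delay.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except CallTimeout:
            if attempt == max_attempts:
                # Out of attempts: let the caller decide what to do.
                raise
            sleep(rng() * delay)
            delay = min(delay * 2, max_delay)
```

An agent would wrap a server call as call_with_backoff(lambda:
some_rpc_call(args)), where some_rpc_call is a placeholder. The sleep
and rng parameters exist only so the waiting can be replaced in tests.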

** Affects: neutron
     Importance: Undecided
     Assignee: Kevin Benton (kevinbenton)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => Kevin Benton (kevinbenton)

** Changed in: neutron
    Milestone: None => mitaka-rc1

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1554332

Title:
  neutron agents are too aggressive under server load

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1554332/+subscriptions

Follow ups

[Bug 1554332] Re: neutron agents are too aggressive under server load
From: OpenStack Infra, 2016-05-06