yahoo-eng-team team mailing list archive

[Bug 1368989] Re: service_update() should not set an RPC timeout longer than service.report_interval

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Adam Gandelman <1368989@xxxxxxxxxxxxxxxxxx>
Date: Fri, 10 Apr 2015 06:45:16 -0000
Reply-to: Bug 1368989 <1368989@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Changed in: nova/juno
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1368989

Title:
  service_update() should not set an RPC timeout longer than
  service.report_interval

Status in OpenStack Compute (Nova):
  Fix Released
Status in OpenStack Compute (nova) juno series:
  Fix Released

Bug description:
  nova.servicegroup.drivers.db.DbDriver._report_state() is called every
  service.report_interval seconds from a timer in order to periodically
  report the service state.  It calls
  self.conductor_api.service_update().

  If this ends up calling
  nova.conductor.rpcapi.ConductorAPI.service_update(), it will do an RPC
  call() to nova-conductor.

  If anything happens to the RPC server (failover, switchover, etc.),
  the RPC code will by default wait 60 seconds for a response, blocking
  the timer-based calling of _report_state() in the meantime.  This is
  long enough for the status in the database to become stale, so that
  other services consider this service to be "down".
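
  A rough illustration of the arithmetic, assuming Nova's default
  settings of report_interval = 10 seconds and service_down_time = 60
  seconds (neither value is stated in this report, and both function
  names below are hypothetical, not Nova code):

```python
def worst_case_report_age(report_interval_s, rpc_timeout_s):
    """Age of the last successful report when a blocked call finally times out.

    The last good report was written one interval before the blocked call
    started, and the call then blocks for the full RPC timeout.
    """
    return report_interval_s + rpc_timeout_s


def is_considered_down(report_age_s, service_down_time_s):
    """A service looks "down" once its last report is older than the threshold."""
    return report_age_s > service_down_time_s


# Assumed defaults: report_interval=10, service_down_time=60, RPC timeout=60.
age_with_default_timeout = worst_case_report_age(10, 60)   # 70 seconds
age_with_short_timeout = worst_case_report_age(10, 9)      # 19 seconds

print(is_considered_down(age_with_default_timeout, 60))    # True
print(is_considered_down(age_with_short_timeout, 60))      # False
```

  Under these assumptions a single blocked call is enough to push the
  report age past the threshold, while a timeout shorter than the
  report interval leaves room for several retries.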

  Arguably, since we're going to call service_update() again in
  service.report_interval seconds, there's no reason to wait the full 60
  seconds.  Instead, it would make sense to set the RPC timeout for the
  service_update() call to something slightly less than
  service.report_interval seconds.
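
  A minimal sketch of the proposed cap, not the fix that was actually
  merged.  The helper name, the one-second margin, and the fallback for
  very short intervals are all assumptions made for illustration:

```python
def capped_rpc_timeout(report_interval_s, default_timeout_s=60.0, margin_s=1.0):
    """Return an RPC timeout slightly below the report interval.

    Hypothetical helper: never exceeds the default RPC timeout and never
    returns a non-positive value.
    """
    if report_interval_s <= 0:
        raise ValueError("report_interval_s must be positive")

    candidate = report_interval_s - margin_s
    if candidate <= 0:
        # Interval is too short for the margin; fall back to half the interval.
        candidate = report_interval_s / 2.0

    return min(default_timeout_s, candidate)


print(capped_rpc_timeout(10))    # 9.0
print(capped_rpc_timeout(120))   # 60.0
```

  With oslo.messaging, a per-call timeout of this kind can be applied
  through RPCClient.prepare(timeout=...) before issuing the call().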

  I've also submitted a related bug report
  (https://bugs.launchpad.net/bugs/1368917) to improve the handling of
  RPC connection loss in general, but I expect that'll take a while to
  deal with, whereas this particular case can be handled much more easily.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1368989/+subscriptions

References

[Bug 1368989] [NEW] service_update() should not set an RPC timeout longer than service.report_interval
From: Chris Friesen, 2014-09-12