yahoo-eng-team team mailing list archive
Message #31540
[Bug 1368989] Re: service_update() should not set an RPC timeout longer than service.report_interval
** Changed in: nova/juno
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1368989
Title:
service_update() should not set an RPC timeout longer than
service.report_interval
Status in OpenStack Compute (Nova):
Fix Released
Status in OpenStack Compute (nova) juno series:
Fix Released
Bug description:
nova.servicegroup.drivers.db.DbDriver._report_state() is called every
service.report_interval seconds from a timer in order to periodically
report the service state. It calls
self.conductor_api.service_update().
If this ends up calling
nova.conductor.rpcapi.ConductorAPI.service_update(), it will do an RPC
call() to nova-conductor.
If anything happens to the RPC server (failover, switchover, etc.), the
RPC code will by default wait 60 seconds for a response (blocking the
timer-based calling of _report_state() in the meantime). This is long
enough to cause the status in the database to get old enough that
other services consider this service to be "down".
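The timing problem can be sketched with a little arithmetic. This is
only an illustration: the helper names are hypothetical, and the 10
second report interval and 60 second "down" threshold (called
service_down_time here) are assumed example values, not taken from this
report.

```python
def stale_for_at_least(report_interval, rpc_timeout):
    """Lower bound on how old the last good report gets if one call blocks.

    Assumed timeline: the last successful report lands at t=0, the next
    attempt starts at t=report_interval and blocks for rpc_timeout seconds
    before giving up. No new report can land before that point.
    """
    return report_interval + rpc_timeout


def looks_down(staleness, service_down_time):
    """Assumed check: a service looks down once its report is too old."""
    return staleness > service_down_time


# Assumed example values, for illustration only.
REPORT_INTERVAL = 10
SERVICE_DOWN_TIME = 60

# With the default 60 second RPC timeout, one blocked call is enough.
blocked = stale_for_at_least(REPORT_INTERVAL, rpc_timeout=60)
print(blocked, looks_down(blocked, SERVICE_DOWN_TIME))

# With a timeout just under the report interval, the report stays fresh.
capped = stale_for_at_least(REPORT_INTERVAL, rpc_timeout=9)
print(capped, looks_down(capped, SERVICE_DOWN_TIME))
```

Under these assumptions a single blocked call leaves the last report at
least 70 seconds old, which already exceeds the 60 second threshold.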
Arguably, since we're going to call service_update() again in
service.report_interval seconds, there's no reason to wait the full 60
seconds. Instead, it would make sense to set the RPC timeout for the
service_update() call to something slightly less than
service.report_interval seconds.
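One way to pick such a timeout is sketched below. This is not the
committed fix, just a minimal illustration: the helper name, the
1 second margin, and the 1 second floor are all assumptions.

```python
def service_update_timeout(report_interval, default_timeout=60, margin=1):
    """Choose an RPC timeout for service_update() (hypothetical helper).

    Keeps the timeout slightly below report_interval so that a stalled
    call gives up before the next periodic report is due, and never
    exceeds the default RPC timeout.
    """
    # Assumed floor of 1 second so very small intervals still get a
    # usable, positive timeout.
    capped = max(1, report_interval - margin)
    return min(default_timeout, capped)


print(service_update_timeout(10))    # short interval: timeout below it
print(service_update_timeout(120))   # long interval: default still applies
```

If the RPC layer is oslo.messaging, a value like this could presumably
be passed as the timeout argument of RPCClient.prepare() for the
service_update() call only, leaving the timeout for other calls
unchanged.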
I've also submitted a related bug report
(https://bugs.launchpad.net/bugs/1368917) to improve RPC loss of
connection in general, but I expect that'll take a while to deal with,
while this particular case can be handled much more easily.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1368989/+subscriptions