yahoo-eng-team team mailing list archive
Message #74325
[Bug 1691871] Re: forced-down vs service disable is not documented well in the compute API reference
Reviewed: https://review.openstack.org/492533
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8835198b8d09e9a69ea83741fdb1579a98019b51
Submitter: Zuul
Branch: master
commit 8835198b8d09e9a69ea83741fdb1579a98019b51
Author: Sean Dague <sean@xxxxxxxxx>
Date: Thu Aug 10 09:34:13 2017 -0400
Update api-guide and api-ref to be clear about forced-down
Closes-Bug: #1691871
Related-Bug: #1784826
Change-Id: Ifc6f1549d88a1b7d9f6e25c962c8a15dd8e180fb
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1691871
Title:
forced-down vs service disable is not documented well in the compute
API reference
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Forcing a service, like nova-compute, down is being used by people for
routine planned maintenance/upgrades of their computes, but it's not
really intended for that. Planned maintenance for a nova-compute
service should disable the service so it's taken out of scheduling
decisions, as discussed in the ops guide here:
https://docs.openstack.org/ops-guide/ops-maintenance-compute.html#planned-maintenance
As described in the spec which added the force-down feature:
https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/mark-host-down.html
It's really about an external monitoring tool detecting that a host is
about to fail (maybe hardware faults), so that the external service can
force the service down (bypassing the service group API heartbeat
checks) and perform an evacuation.
The forced-down flag is checked during the evacuate API flow.
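To make the distinction concrete, here is a minimal sketch that builds
(but does not send) the two compute API requests being contrasted. It
assumes the legacy PUT /os-services/disable and PUT
/os-services/force-down routes, with force-down requested at
microversion 2.11; the helper names and the host name "compute-01" are
hypothetical and only for illustration.

```python
import json

COMPUTE_BINARY = "nova-compute"


def disable_request(host):
    """Planned maintenance: take the service out of scheduling decisions."""
    return {
        "method": "PUT",
        "path": "/os-services/disable",
        "headers": {},
        "body": {"host": host, "binary": COMPUTE_BINARY},
    }


def force_down_request(host, forced_down=True):
    """Imminent failure: an external monitor marks the service as down.

    Assumption: microversion 2.11 is requested via the
    X-OpenStack-Nova-API-Version header.
    """
    return {
        "method": "PUT",
        "path": "/os-services/force-down",
        "headers": {"X-OpenStack-Nova-API-Version": "2.11"},
        "body": {
            "host": host,
            "binary": COMPUTE_BINARY,
            "forced_down": forced_down,
        },
    }


if __name__ == "__main__":
    # "compute-01" is a made-up host name.
    for request in (disable_request("compute-01"),
                    force_down_request("compute-01")):
        print(json.dumps(request, indent=2))
```

The disable request is the one that matches routine upgrades; the
force-down request is meant to come from a monitoring tool just before
an evacuation.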
Forcing a host down for routine upgrades can be problematic as forced-
down hosts are not part of the minimum service version checks:
https://github.com/openstack/nova/blob/master/nova/objects/service.py#L307
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L490
So if you force a mitaka nova-compute service down, and upgrade the
rest of your computes to newton, when you try to set the mitaka
service to forced_down=False, or simply restart the mitaka nova-
compute service, it's going to fail with a ServiceTooOld exception.
The only ways out of that are to (1) modify the flag in the database
directly or (2) upgrade the compute to newton (in this example) and
restart it.
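The failure mode above can be reproduced with a toy model. This is a
simplified sketch, not nova's actual code: the Service class, the
helper functions and the version numbers 1 (standing in for mitaka) and
2 (standing in for newton) are all made up for illustration. The only
behavior taken from the description is that forced-down services are
left out of the minimum version calculation and that an older service
is then rejected with ServiceTooOld.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative version numbers only, not nova's real service versions.
OLD_RELEASE = 1  # stands in for mitaka
NEW_RELEASE = 2  # stands in for newton


class ServiceTooOld(Exception):
    """Raised when a service is older than the deployment minimum."""


@dataclass
class Service:
    host: str
    version: int
    forced_down: bool = False


def minimum_version(services: List[Service]) -> Optional[int]:
    # Forced-down services are excluded from the minimum version check.
    versions = [s.version for s in services if not s.forced_down]
    return min(versions) if versions else None


def clear_forced_down(service: Service, deployment: List[Service]) -> None:
    # The minimum is computed from the stored state, in which this
    # service is still forced down and therefore not counted.
    floor = minimum_version(deployment)
    if floor is not None and service.version < floor:
        raise ServiceTooOld(
            f"{service.host} is at version {service.version}, "
            f"minimum is {floor}"
        )
    service.forced_down = False


if __name__ == "__main__":
    old = Service("compute-old", OLD_RELEASE, forced_down=True)
    deployment = [old,
                  Service("compute-a", NEW_RELEASE),
                  Service("compute-b", NEW_RELEASE)]
    try:
        clear_forced_down(old, deployment)
    except ServiceTooOld as exc:
        print("rejected:", exc)
    # Way out (2): upgrade the compute, then clear the flag.
    old.version = NEW_RELEASE
    clear_forced_down(old, deployment)
    print("forced_down after upgrade:", old.forced_down)
```

Had the old compute been disabled rather than forced down, it would
still have counted toward the minimum version in this model, so the
mismatch would not have been hidden during the upgrade.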
We should add information about this to the compute API reference so
that operators have a better understanding of what forced-down vs
service disable means and in what cases you'd use them.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1691871/+subscriptions