yahoo-eng-team team mailing list archive
Message #64113
[Bug 1691871] [NEW] forced-down vs service disable is not documented well in the compute API reference
Public bug reported:
People are forcing services such as nova-compute down for routine
planned maintenance/upgrades of their computes, but force-down is not
really intended for that. For planned maintenance, the nova-compute
service should be disabled so it's taken out of scheduling decisions, as
discussed in the ops guide here:
https://docs.openstack.org/ops-guide/ops-maintenance-compute.html#planned-maintenance
As described in the spec which added the force-down feature:
https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/mark-host-down.html
Force-down is really meant for an external monitoring tool that detects
that a host is about to fail (maybe hardware faults); the external
service then needs to force the service down (bypassing the service
group API heartbeat checks) and perform an evacuation.
The forced-down flag is checked during the evacuate API flow.
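To make the difference concrete, here is a rough sketch of the two requests an operator or monitoring tool would send. It assumes the host/binary style of the os-services API (before microversion 2.53) and that force-down needs at least microversion 2.11. The helper names are made up for illustration, and the functions only build request descriptions; they do not talk to a cloud.

```python
# Illustrative only: helper names are hypothetical, nothing here calls nova.
# Assumption: pre-2.53 os-services API, where a service is identified by
# host and binary in the request body. Paths are relative to the compute
# endpoint.
FORCE_DOWN_MIN_MICROVERSION = "2.11"


def disable_request(host, binary="nova-compute", reason=None):
    """Planned maintenance: take the service out of scheduling decisions."""
    path = "/os-services/disable"
    body = {"host": host, "binary": binary}
    if reason is not None:
        # Variant of disable that also records why the service was disabled.
        path = "/os-services/disable-log-reason"
        body["disabled_reason"] = reason
    return {"method": "PUT", "path": path, "headers": {}, "body": body}


def force_down_request(host, forced_down=True, binary="nova-compute"):
    """Host failure: mark the service down regardless of its heartbeat."""
    headers = {"X-OpenStack-Nova-API-Version": FORCE_DOWN_MIN_MICROVERSION}
    body = {"host": host, "binary": binary, "forced_down": forced_down}
    return {
        "method": "PUT",
        "path": "/os-services/force-down",
        "headers": headers,
        "body": body,
    }


if __name__ == "__main__":
    # Routine upgrade of compute1: disable, do the work, enable again later.
    print(disable_request("compute1", reason="planned upgrade"))
    # A monitoring tool decided compute2 is failing: force down, then evacuate.
    print(force_down_request("compute2"))
```

The disable request only affects scheduling, while the force-down request overrides the reported up/down state of the service, which is what the evacuate flow looks at.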
Forcing a host down for routine upgrades can be problematic as forced-
down hosts are not part of the minimum service version checks:
https://github.com/openstack/nova/blob/master/nova/objects/service.py#L307
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L490
So if you force a mitaka nova-compute service down, and upgrade the rest
of your computes to newton, when you try to set the mitaka service to
forced_down=False, or simply restart the mitaka nova-compute service,
it's going to fail with a ServiceTooOld exception. The only way out of
that is to either (1) modify the flag in the database directly or
(2) upgrade the compute to newton (in this example) and restart it.
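The failure mode can be illustrated with a toy model. This is a simplified sketch, not nova's actual code: the Service class, the local ServiceTooOld exception and the version numbers 1 and 2 (standing in for mitaka and newton) are all placeholders. The only behaviour taken from the description above is that forced-down services are left out of the minimum version calculation.

```python
from dataclasses import dataclass


class ServiceTooOld(Exception):
    """Local stand-in for nova's ServiceTooOld exception."""


@dataclass
class Service:
    host: str
    version: int  # placeholder service version, not a real nova value
    forced_down: bool = False


def minimum_version(services):
    """Minimum version across services that are NOT forced down."""
    versions = [s.version for s in services if not s.forced_down]
    return min(versions) if versions else 0


def set_forced_down(service, services, value):
    """Update the flag, refusing services older than the current minimum.

    The minimum is computed from the state before the change, so a service
    that is still forced down does not count towards it.
    """
    minver = minimum_version(services)
    if minver > service.version:
        raise ServiceTooOld(
            f"{service.host}: version {service.version} < minimum {minver}"
        )
    service.forced_down = value


if __name__ == "__main__":
    MITAKA, NEWTON = 1, 2  # illustrative values only
    old = Service("compute1", MITAKA, forced_down=True)
    fleet = [old, Service("compute2", NEWTON), Service("compute3", NEWTON)]
    try:
        set_forced_down(old, fleet, False)
    except ServiceTooOld as exc:
        print("cannot clear forced_down:", exc)
```

Had compute1 only been disabled instead of forced down, it would still have counted towards the minimum, so the minimum would have stayed at the mitaka level until compute1 itself was upgraded.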
We should add information about this to the compute API reference so
that operators have a better understanding of how forced-down differs
from service disable and in what cases you'd use each.
** Affects: nova
Importance: Medium
Status: Confirmed
** Tags: api-ref upgrade
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1691871
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1691871/+subscriptions