yahoo-eng-team team mailing list archive

[Bug 1691871] [NEW] forced-down vs service disable is not documented well in the compute API reference

 

Public bug reported:

People are forcing services such as nova-compute down for routine
planned maintenance and upgrades of their computes, but force-down is
not intended for that. For planned maintenance, the nova-compute service
should be disabled so that it is taken out of scheduling decisions, as
discussed in the ops guide here:

https://docs.openstack.org/ops-guide/ops-maintenance-compute.html#planned-maintenance

As described in the spec which added the force-down feature:

https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/mark-host-down.html

Force-down is really meant for the case where an external monitoring
tool detects that a host is about to fail (maybe because of hardware
faults). The external service then needs to force the service down
(bypassing the service group API heartbeat checks) and perform an
evacuation.

The forced-down flag is checked during the evacuate API flow.
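
For comparison, the two operations map to different requests against the
os-services API. The sketch below only builds request descriptions and
does not send anything. The helper names are hypothetical, and the paths
and the 2.11 microversion header reflect the legacy host/binary style of
the API as I understand it (newer microversions address services by ID
instead), so check them against the API reference for your release.

```python
# Hypothetical helpers: they only describe the requests, nothing is sent.

def disable_request(host, binary="nova-compute"):
    """Planned maintenance: take the service out of scheduling."""
    return {
        "method": "PUT",
        "path": "/os-services/disable",
        "headers": {},
        "body": {"host": host, "binary": binary},
    }


def force_down_request(host, binary="nova-compute", forced_down=True):
    """Imminent failure: mark the service down ahead of an evacuation."""
    return {
        "method": "PUT",
        "path": "/os-services/force-down",
        # Assumption: force-down needs compute API microversion 2.11 or later.
        "headers": {"X-OpenStack-Nova-API-Version": "2.11"},
        "body": {"host": host, "binary": binary, "forced_down": forced_down},
    }


# "compute-1" is a made-up host name.
maintenance = disable_request("compute-1")
failure = force_down_request("compute-1")
```

Only the second request touches the forced_down flag; disabling a
service leaves it untouched.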

Forcing a host down for routine upgrades can be problematic because
forced-down hosts are left out of the minimum service version checks:

https://github.com/openstack/nova/blob/master/nova/objects/service.py#L307
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L490

So if you force a Mitaka nova-compute service down and upgrade the rest
of your computes to Newton, then trying to set the Mitaka service to
forced_down=False, or simply restarting the Mitaka nova-compute service,
is going to fail with a ServiceTooOld exception. The only ways out of
that are to (1) modify the flag in the database directly or (2) upgrade
the compute to Newton (in this example) and restart it.
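
The failure mode can be illustrated with a toy model. This is not nova's
code: the class, the function names and the version numbers are made up
for illustration. It assumes only what is described above, namely that
the minimum version is computed without forced-down services and that an
update to a service record is rejected when the service is older than
that minimum.

```python
from dataclasses import dataclass


class ServiceTooOld(Exception):
    """Stand-in for the nova exception of the same name."""


@dataclass
class Service:
    host: str
    version: int
    forced_down: bool = False


def minimum_version(services):
    # Forced-down services do not take part in the minimum.
    versions = [s.version for s in services if not s.forced_down]
    return min(versions) if versions else 0


def clear_forced_down(service, services):
    # The check runs while the service is still marked forced down,
    # so its own (old) version does not lower the minimum.
    minver = minimum_version(services)
    if service.version < minver:
        raise ServiceTooOld(f"version {service.version} < minimum {minver}")
    service.forced_down = False


OLD, NEW = 1, 2  # hypothetical version numbers for the two releases
fleet = [Service("compute-1", OLD), Service("compute-2", OLD)]

fleet[0].forced_down = True   # force the old service down
fleet[1].version = NEW        # upgrade the rest of the computes

try:
    clear_forced_down(fleet[0], fleet)
    outcome = "ok"
except ServiceTooOld:
    outcome = "ServiceTooOld"
```

In this model the minimum rises to the new version as soon as the last
non-forced-down service is upgraded, so clearing the flag on the old
service is rejected until that service is upgraded as well.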

We should add information about this to the compute API reference so
that operators have a better understanding of what forced-down vs
service disable means and in what cases you'd use them.

** Affects: nova
     Importance: Medium
         Status: Confirmed


** Tags: api-ref upgrade

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1691871

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1691871/+subscriptions