yahoo-eng-team team mailing list archive

[Bug 1691871] Re: forced-down vs service disable is not documented well in the compute API reference

 

Reviewed:  https://review.openstack.org/492533
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8835198b8d09e9a69ea83741fdb1579a98019b51
Submitter: Zuul
Branch:    master

commit 8835198b8d09e9a69ea83741fdb1579a98019b51
Author: Sean Dague <sean@xxxxxxxxx>
Date:   Thu Aug 10 09:34:13 2017 -0400

    Update api-guide and api-ref to be clear about forced-down
    
    Closes-Bug: #1691871
    Related-Bug: #1784826
    
    Change-Id: Ifc6f1549d88a1b7d9f6e25c962c8a15dd8e180fb


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1691871

Title:
  forced-down vs service disable is not documented well in the compute
  API reference

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Forcing a service, like nova-compute, down is being used by people for
  routine planned maintenance/upgrades of their computes, but it's not
  really intended for that. Planned maintenance for a nova-compute
  service should disable the service so it's taken out of scheduling
  decisions, as discussed in the ops guide here:

  https://docs.openstack.org/ops-guide/ops-maintenance-compute.html#planned-maintenance

  As described in the spec which added the force-down feature:

  https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/mark-host-down.html

  It's really for the case where an external monitoring tool detects
  that a host is about to fail (maybe hardware faults) and needs to
  force the service down (bypassing the service group API heartbeat
  checks) and perform an evacuation.

  The forced-down flag is checked during the evacuate API flow.
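
  To make the difference concrete, here is a minimal sketch of the two
  requests, built as plain dicts rather than sent anywhere. It assumes
  the pre-2.53 os-services paths (PUT /os-services/disable and PUT
  /os-services/force-down, the latter needing microversion 2.11); the
  helper names and the host name are made up for illustration.

```python
import json

# Assumed paths for the pre-2.53 os-services API; later microversions
# use PUT /os-services/{service_id} instead.
DISABLE_PATH = "/os-services/disable"
FORCE_DOWN_PATH = "/os-services/force-down"


def disable_request(host, binary="nova-compute"):
    """Planned maintenance: take the service out of scheduling decisions."""
    return {
        "method": "PUT",
        "path": DISABLE_PATH,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"host": host, "binary": binary}),
    }


def force_down_request(host, forced_down=True, binary="nova-compute"):
    """Host failure detected externally: mark the service down before evacuating."""
    return {
        "method": "PUT",
        "path": FORCE_DOWN_PATH,
        "headers": {
            "Content-Type": "application/json",
            # force-down was added in compute API microversion 2.11
            "X-OpenStack-Nova-API-Version": "2.11",
        },
        "body": json.dumps(
            {"host": host, "binary": binary, "forced_down": forced_down}
        ),
    }
```

  For a routine upgrade you would want something like
  disable_request("compute-1"); force_down_request("compute-1") is
  meant for a monitoring tool that is about to evacuate the host.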

  Forcing a host down for routine upgrades can be problematic, as
  forced-down hosts are not part of the minimum service version checks:

  https://github.com/openstack/nova/blob/master/nova/objects/service.py#L307
  https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L490

  So if you force a Mitaka nova-compute service down and upgrade the
  rest of your computes to Newton, then trying to set the Mitaka
  service to forced_down=False, or simply restarting the Mitaka
  nova-compute service, is going to fail with a ServiceTooOld
  exception. The only way out of that is to (1) modify the flag in the
  database directly or (2) upgrade the compute to Newton (in this
  example) and restart it.
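
  The failure mode can be illustrated with a toy model. This is not
  nova's code, only a simplified sketch of the behaviour described
  above: the Service class, the helper names, and the version numbers
  1 and 2 (standing in for the older and newer release) are all
  hypothetical.

```python
from dataclasses import dataclass


class ServiceTooOld(Exception):
    """Raised when a service is older than the deployment minimum."""


@dataclass
class Service:
    host: str
    version: int
    forced_down: bool = False
    disabled: bool = False


def minimum_version(services):
    # Forced-down services are skipped, so they cannot hold the minimum back.
    # Disabled services still count.
    versions = [s.version for s in services if not s.forced_down]
    return min(versions) if versions else 0


def set_forced_down(service, value, services):
    # The check runs against the stored state, in which an old service that
    # is forced down is invisible to the minimum calculation.
    if service.version < minimum_version(services):
        raise ServiceTooOld(service.host)
    service.forced_down = value


# Old compute forced down, everything else upgraded: the flag is now stuck.
old = Service("compute-old", version=1, forced_down=True)
fleet = [old, Service("compute-a", version=2), Service("compute-b", version=2)]
try:
    set_forced_down(old, False, fleet)
    stuck = False
except ServiceTooOld:
    stuck = True

# Old compute merely disabled: it still counts toward the minimum.
old_disabled = Service("compute-old", version=1, disabled=True)
fleet_disabled = [old_disabled, Service("compute-a", version=2)]
```

  In the first fleet the minimum is computed without the old service,
  so clearing its flag is rejected; in the second, the disabled
  service keeps the minimum at its own version and nothing gets stuck.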

  We should add information about this to the compute API reference so
  that operators have a better understanding of what forced-down vs
  service disable means and in what cases you'd use them.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1691871/+subscriptions

