yahoo-eng-team team mailing list archive

[Bug 1461459] Re: Allow disabling the evacuate cleanup mechanism in compute manager

 

I think the DocImpact in the nova change was probably just to get the
config options docs updated with the new workaround option.

If there is anything else to do here, it would be to note in the docs
covering evacuate operations that on nova releases before liberty the
evacuate functionality can cause data loss unless you have that patch
and set the option appropriately.

For example:

http://docs.openstack.org/user-guide-admin/cli_nova_evacuate.html

http://docs.openstack.org/admin-guide-cloud/compute-node-down.html

There was a spec in liberty to make this smarter, but the existing
problem description applies to nova compute nodes < liberty:

http://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/robustify_evacuate.html#problem-description

If the hostname changes on the compute node, or you have a typo in your
configs (for example, multiple compute nodes running at the same time
and managing the same vCenter), that evacuate code can delete your
instances.

That's why setting workarounds.destroy_after_evacuate=False is a safe
way to get around this until you get your computes to liberty+. Leave it
set that way until you're sure that you're cleaning up a failed compute
node (a real evacuation rather than a misconfiguration or hostname
change).
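
To make the behavior concrete, here is a rough sketch of the startup
decision as the commit message in the bug description explains it. This
is not the nova code: the Instance class, plan_startup_actions() and the
action strings are made up for illustration, and only the
destroy_after_evacuate name and the instance.host != self.host check
come from the patch description. In a real deployment the option would
be set as destroy_after_evacuate = False under [workarounds] in
nova.conf.

```python
import logging
from dataclasses import dataclass

LOG = logging.getLogger(__name__)


@dataclass
class Instance:
    """Hypothetical stand-in for an instance found on the local hypervisor."""
    uuid: str
    host: str  # the host that the database records as the owner


def plan_startup_actions(local_instances, my_host, destroy_after_evacuate):
    """Return a {uuid: action} map; a sketch, not nova's implementation."""
    actions = {}
    for inst in local_instances:
        if inst.host == my_host:
            # Normal case: this compute host owns the instance.
            actions[inst.uuid] = "init"
        elif destroy_after_evacuate:
            # Dangerous assumption: the instance was evacuated elsewhere.
            actions[inst.uuid] = "destroy"
        else:
            # Workaround enabled: leave the instance alone and warn loudly.
            LOG.warning("Instance %s appears to be owned by host %s, "
                        "not %s; skipping", inst.uuid, inst.host, my_host)
            actions[inst.uuid] = "skip"
    return actions
```

With the option set to False, a compute service that comes up under the
wrong hostname only produces "skip" actions and warnings, so nothing is
deleted while you fix the hostname or config.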

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1461459

Title:
      Allow disabling the evacuate cleanup mechanism in compute manager

Status in OpenStack Compute (nova):
  Invalid
Status in openstack-manuals:
  Triaged

Bug description:
  https://review.openstack.org/174779
  commit 6f1f9dbc211356a3d0e2d46d3a984d7ceee79ca6
  Author: Tony Breeds <tony@xxxxxxxxxxxxxxxxxx>
  Date:   Tue Jan 27 11:17:54 2015 -0800

      Allow disabling the evacuate cleanup mechanism in compute manager
      
      This mechanism attempts to destroy any locally-running instances on
      startup if instance.host != self.host. The assumption is that the
      instance has been evacuated and is safely running elsewhere. This is
      a dangerous assumption to make, so this patch adds a configuration
      variable to disable this behavior if it's not desired.
      
      Note that disabling it may have implications for the case where
      instances *were* evacuated, given potential shared resources.
      To counter that problem, this patch also makes _init_instance()
      skip initialization of the instance if it appears to be owned
      by another host, logging a prominent warning in that case.
      
      As a result, if you have destroy_after_evacuate=False and you start
      a nova compute with an incorrect hostname, or run it twice from
      another host, then the worst that will happen is you get log
      warnings about the instances on the host being ignored. This should
      be an indication that something is wrong, but still allow for
      fixing it without any loss. If the configuration option is disabled
      and a legitimate evacuation does occur, simply enabling it and then
      restarting the compute service will cause the cleanup to occur.
      
      This is added to the workarounds config group because it is really
      only relevant while evacuate is fundamentally broken in this way.
      It needs to be refactored to be more robust, and once that is done,
      this should be able to go away.
      
      Conflicts:
              nova/compute/manager.py
              nova/tests/unit/compute/test_compute.py
              nova/tests/unit/compute/test_compute_mgr.py
              nova/utils.py
      
      NOTE: In nova/utils.py a new section has been introduced but
      only the option addressed by this backport has been included.
      
      DocImpact: New configuration option, and peril warning
      Partial-Bug: #1419785
      (cherry picked from commit 922148ac45c5a70da8969815b4f47e3c758d6974)
      
      -- squashed with commit --
      
      Create a 'workarounds' config group.
      
      This group is for very specific reasons.
      
      If you're:
      - Working around an issue in a system tool (e.g. libvirt or qemu) where the fix
        is in flight/discussed in that community.
      - The tool can be/is fixed in some distributions and rather than patch the code
        those distributions can trivially set a config option to get the "correct"
        behavior.
      This is a good place for your workaround.
      
      (cherry picked from commit b1689b58409ab97ef64b8cec2ba3773aacca7ac5)
      
      --
      
      Change-Id: Ib9a3c72c096822dd5c65c905117ae14994c73e99

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1461459/+subscriptions