yahoo-eng-team team mailing list archive

[Bug 1968555] Re: evacuate after network issue will cause vm running on two host

 

The evacuate API reference [1] states:

Preconditions

    The failed host must be fenced and no longer running the original
    server.

    The failed host must be reported as down or marked as forced down
    using Update Forced Down.

So when you detect a control network failure, you have to make sure
that the host is fenced before you evacuate the instance. The fencing
precondition exists precisely to prevent the VM from being duplicated
by evacuation.

The most common fencing method is power fencing: when the issue is
detected, the problematic compute host is powered off via out-of-band
management. Only then can its VMs be safely evacuated.
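
The ordering described above (fence first, then mark the service down,
then evacuate) can be sketched as a small Python helper that only builds
the command list and runs nothing. This is a minimal sketch under
assumptions, not an official procedure: it assumes the out-of-band
management is IPMI reachable with ipmitool, that the password is
supplied in the IPMI_PASSWORD environment variable, and that the cloud
supports compute API microversion 2.11 for forced down. The function
name, host name, BMC address, and user are hypothetical.

```python
from typing import List


def fenced_evacuation_plan(host: str, bmc_address: str, bmc_user: str,
                           server_ids: List[str]) -> List[List[str]]:
    """Return the ordered commands for a fence-then-evacuate workflow.

    Nothing is executed here. The caller runs the steps in order and
    must stop at the first failure, so that no evacuation starts
    before the host is confirmed to be powered off.
    """
    # Assumption: IPMI over LAN; -E reads the password from IPMI_PASSWORD.
    ipmi = ["ipmitool", "-I", "lanplus", "-H", bmc_address,
            "-U", bmc_user, "-E"]
    plan = [
        # 1. Fence: power the failed compute host off out of band.
        ipmi + ["chassis", "power", "off"],
        # 2. Check the fence: the operator confirms the power is off.
        ipmi + ["chassis", "power", "status"],
        # 3. Mark the compute service as forced down.
        ["openstack", "--os-compute-api-version", "2.11",
         "compute", "service", "set", "--down", host, "nova-compute"],
    ]
    # 4. Only after fencing, evacuate each server from the failed host.
    plan += [["nova", "evacuate", server_id] for server_id in server_ids]
    return plan


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in fenced_evacuation_plan("compute-01", "10.0.0.10", "admin",
                                      ["<vm's uuid>"]):
        print(" ".join(cmd))
```

Keeping the plan as data makes the ordering easy to review and test
before anything touches a real host. Other fencing mechanisms would
replace steps 1 and 2.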

[1] https://docs.openstack.org/api-ref/compute/?expanded=evacuate-server-evacuate-action-detail#evacuate-server-evacuate-action

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1968555

Title:
  evacuate after network issue will cause vm running on two host

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Environment
  ===========
  OpenStack Queens + libvirt 4.5.0 + QEMU 2.12 running on CentOS 7, with Ceph RBD storage

  Description
  ===========
  If the management network of a compute host fails, nova-compute may be reported as down while openstack-nova-compute.service is still running on that host. If you then evacuate a VM from that host, the evacuation succeeds and the VM runs on both the old host and the new host, even after the old host's management network recovers. This can cause VM errors.

  Steps to reproduce
  ==================
  1. Manually bring down the management network port of the compute host, e.g. ifconfig eth0 down
  2. Once openstack compute service list shows nova-compute on that host as down, evacuate one VM from that host:
  nova evacuate <vm's uuid>
  3. After the evacuation succeeds, the VM is running on two hosts.
  4. Manually bring the management network port of the old compute host back up, e.g. ifconfig eth0 up. The VM is still running on this host; it is not destroyed automatically unless you restart openstack-nova-compute.service on that host.

  Expected result
  ===============
  Maybe a periodic task could be added to automatically destroy this VM?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1968555/+subscriptions


