Message #85698
[Bug 1922053] [NEW] Operators can unset forced-down with `done` evacuation migration records against the host
Public bug reported:
Description
===========
Another PEBKAC issue, but the current evacuation flow allows an admin to
force down a host, evacuate it and unset forced down *without* ever
restarting the compute service. It is clearly documented that operators
need to fence the source compute service ahead of evacuation (see
below), which should result in a service restart, but this isn't
enforced anywhere in the current flow:
https://docs.openstack.org/api-ref/compute/?expanded=evacuate-server-evacuate-action-detail#evacuate-server-evacuate-action
This leaves evacuation migration records marked as done instead of
completed as the source host is never given a chance to clean up. The
request to unset forced down should be rejected until this happens and
the evacuation migration records are marked as completed.
This could ultimately lead to data loss if the instance is migrated back
to the host ahead of the next service restart: that restart causes the
evacuation cleanup logic to fire, potentially removing storage from
under the running instance.
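The failure mode described above can be illustrated with a toy model. This is
only a sketch: `Migration`, `ComputeHost` and `restart` are hypothetical names
invented for the example and are not nova's classes or its real cleanup code.
The model assumes that startup cleanup removes every local guest that has a
`done` evacuation record against the host.

```python
from dataclasses import dataclass, field


@dataclass
class Migration:
    # Hypothetical, simplified stand-in for a migration record.
    instance_uuid: str
    source_compute: str
    migration_type: str = "evacuation"
    status: str = "done"


@dataclass
class ComputeHost:
    # Hypothetical model of a compute host and the guests stored on it.
    name: str
    local_instances: set = field(default_factory=set)

    def restart(self, migrations):
        """Toy startup cleanup: drop evacuated guests, complete the records."""
        for m in migrations:
            if (m.source_compute == self.name
                    and m.migration_type == "evacuation"
                    and m.status == "done"):
                self.local_instances.discard(m.instance_uuid)
                m.status = "completed"


host = ComputeHost("compute-1", {"inst-a"})

# inst-a is evacuated elsewhere; compute-1 is never restarted, so the
# record stays at "done".
record = Migration("inst-a", source_compute="compute-1")

# Forced down is unset and inst-a is later migrated back to compute-1.
host.local_instances.add("inst-a")

# The next restart still sees the stale "done" record and cleans up.
host.restart([record])
print(host.local_instances)  # set()
print(record.status)         # completed
```

In the model the guest disappears from the host even though it is running
there again, which is the data-loss scenario the report warns about.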
Steps to reproduce
==================
- Mark a given host as forced down
- Evacuate instances from this host
- Unset forced down on the host
- Check that the migration records associated with the evacuations are still marked as done
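The steps above might map onto compute API requests roughly as follows. This
is a sketch that only builds request descriptions and sends nothing. The
paths, bodies and query filters reflect one reading of the compute API
reference (microversion 2.53 or later is assumed for `PUT /os-services/{id}`)
and should be verified against it. `reproduction_requests` and all
identifiers passed to it are hypothetical placeholders.

```python
from urllib.parse import urlencode


def reproduction_requests(service_id, server_id, host):
    """Return (method, path, body) tuples for the four reproduction steps.

    Assumed request shapes, not taken from the bug report itself.
    """
    query = urlencode({
        "host": host,
        "migration_type": "evacuation",
        "status": "done",
    })
    return [
        # 1. Mark the compute service as forced down.
        ("PUT", f"/os-services/{service_id}", {"forced_down": True}),
        # 2. Evacuate an instance from the host (scheduler picks the target).
        ("POST", f"/servers/{server_id}/action", {"evacuate": {}}),
        # 3. Unset forced down without restarting the service.
        ("PUT", f"/os-services/{service_id}", {"forced_down": False}),
        # 4. List evacuation records for the host that are still "done".
        ("GET", f"/os-migrations?{query}", None),
    ]


for method, path, body in reproduction_requests(
        "svc-uuid", "server-uuid", "compute-1"):
    print(method, path, body)
```

If the last request still returns records after step 3, the host has been
brought back with evacuation cleanup outstanding.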
Expected result
===============
The request to unset forced down is rejected until the service is
restarted and the evacuation migration records are moved to completed.
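A minimal sketch of the kind of check this implies is shown below. It is not
nova's implementation or a proposed patch: `ForcedDownConflict`,
`pending_evacuations` and `unset_forced_down` are hypothetical names, and
services and migration records are modelled as plain dicts for illustration.

```python
class ForcedDownConflict(Exception):
    """Hypothetical error for a rejected request to unset forced down."""


def pending_evacuations(migrations, host):
    """Return evacuation records from `host` that are still "done"."""
    return [
        m for m in migrations
        if m["source_compute"] == host
        and m["migration_type"] == "evacuation"
        and m["status"] == "done"
    ]


def unset_forced_down(service, migrations):
    """Clear forced_down only when no evacuation cleanup is outstanding."""
    pending = pending_evacuations(migrations, service["host"])
    if pending:
        raise ForcedDownConflict(
            f"{len(pending)} evacuation record(s) for {service['host']} "
            "are still 'done'; restart the compute service first")
    service["forced_down"] = False
    return service


service = {"host": "compute-1", "forced_down": True}
records = [{"source_compute": "compute-1",
            "migration_type": "evacuation",
            "status": "done"}]

try:
    unset_forced_down(service, records)
except ForcedDownConflict as exc:
    print("rejected:", exc)

# Once cleanup has run, the record is "completed" and the request passes.
records[0]["status"] = "completed"
print(unset_forced_down(service, records)["forced_down"])  # False
```

Raising an error instead of silently ignoring the request keeps the operator
aware that a service restart is still required.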
Actual result
=============
The request to unset forced down is allowed and the evacuation migration
records remain marked as done. This could eventually lead to data loss
if the instance is migrated back to the host prior to the next service
restart.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
Master
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
N/A
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
N/A
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1922053
Title:
Operators can unset forced-down with `done` evacuation migration
records against the host
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1922053/+subscriptions