Message #77573
[Bug 1820802] [NEW] nova orphan instances
Public bug reported:
Description
===========
Under some corner conditions, instances might become orphaned: Nova is no longer aware that the instance is running on the host.
Steps to reproduce
==================
1) Suppose nova-compute goes down for some reason and, while it is
down, the user deletes the server through the API, so its records are
deleted from the DB. Afterwards, nova-compute comes back up. The guest
VM is still running on this compute node and consuming resources.
2) Once a live migration begins, it runs to completion under the
control of libvirt. If something goes wrong in the underlying
infrastructure, e.g. RabbitMQ is dead or the network is badly
congested, Nova may fail to delete the instance on the source compute
node, or may try to roll back and fail. The same instance then exists
on both the source and destination compute nodes. On the source host
the instance is a duplicate: it is an orphan instance for the source
compute node.
Expected result
===============
There should be no orphan instances.
Actual result
=============
Some instances are outside Nova's management.
Environment
===========
Reproducing such conditions is not easy. Refer to the discussion at the Stein meetup:
https://etherpad.openstack.org/p/nova-ptg-stein L931
Fix
=====
Proposal: add a periodic task that takes a configurable action when it finds an orphan instance. Suggested actions:
* reap the instance.
* stop the instance.
* log a message only. [default]
The interval of the periodic task should be configurable.
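The proposal above can be sketched roughly as follows. This is a minimal illustration only, not Nova code: `OrphanAction`, `find_orphans`, `handle_orphans` and the `stop_guest` / `reap_guest` callbacks are hypothetical names, and the scheduling of the task at a configurable interval is left to the caller. It assumes an orphan is a guest that the hypervisor reports on this host but that the DB does not assign to this host, which covers both the deleted-while-down case and the leftover source copy after a live migration.

```python
import enum
import logging
from typing import Callable, Iterable, List, Set

LOG = logging.getLogger(__name__)


class OrphanAction(enum.Enum):
    """Hypothetical config values mirroring the three suggested actions."""
    LOG = "log"    # only log a warning (the proposed default)
    STOP = "stop"  # stop the guest
    REAP = "reap"  # remove the guest from the host


def find_orphans(guests_on_host: Iterable[str],
                 assigned_in_db: Iterable[str]) -> Set[str]:
    """Return UUIDs running on the host that the DB does not assign to it."""
    return set(guests_on_host) - set(assigned_in_db)


def handle_orphans(orphans: Iterable[str],
                   action: OrphanAction,
                   stop_guest: Callable[[str], None],
                   reap_guest: Callable[[str], None]) -> List[str]:
    """Apply the configured action to every orphan; return the UUIDs seen."""
    handled = []
    for uuid in sorted(orphans):
        # Every orphan is logged, whatever the configured action is.
        LOG.warning("Orphan instance %s found on this host", uuid)
        if action is OrphanAction.STOP:
            stop_guest(uuid)
        elif action is OrphanAction.REAP:
            reap_guest(uuid)
        handled.append(uuid)
    return handled
```

A periodic task would call `find_orphans` with the hypervisor's guest list and the DB's view of the host, then pass the result to `handle_orphans` with the configured action. Defaulting to `OrphanAction.LOG` keeps the task non-destructive unless the operator opts in.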
This was previously proposed as a blueprint but is better classified as
a bug. Refer to:
https://blueprints.launchpad.net/nova/+spec/periodic-orphan-instances-delete
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1820802
Title:
nova orphan instances
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1820802/+subscriptions