[Bug 1820802] [NEW] nova orphan instances


Public bug reported:

Description
===========
Under some corner-case conditions, instances can become orphaned: Nova is no longer aware that the instance is running on the host.

Steps to reproduce
==================

1) Suppose nova-compute goes down for some reason and, during this
downtime, the user deletes the server via the API, so its records are
deleted from the DB. After this, nova-compute comes back up. The guest
VM is still running on this compute node and consuming resources, but
Nova no longer knows about it.
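
For illustration only, a minimal operator-side check for this scenario
might compare the hypervisor's libvirt domains against what the Nova
API still knows about; the cloud name is an assumption, and it relies
on Nova setting the libvirt domain UUID to the instance UUID:

    import libvirt          # libvirt-python
    import openstack        # openstacksdk

    # Assumption: a clouds.yaml entry named 'mycloud' with admin credentials.
    cloud = openstack.connect(cloud='mycloud')
    virt = libvirt.open('qemu:///system')   # local hypervisor

    for dom in virt.listAllDomains():
        uuid = dom.UUIDString()             # domain UUID == instance UUID
        if cloud.compute.find_server(uuid, ignore_missing=True) is None:
            # Nova has no record of this guest: candidate orphan.
            print('orphan candidate: %s (%s)' % (dom.name(), uuid))

Any running domain whose UUID the Nova API cannot find is a candidate
orphan left behind by the scenario above.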

2) During live migration, once the migration begins it runs to
completion under libvirt's control. If something goes wrong in the
underlying infrastructure, e.g. RabbitMQ dies or the network is heavily
congested, Nova may fail to delete the instance on the source compute
node, or the rollback attempt may fail. The result is the same instance
running on both the source and destination compute nodes; the copy on
the source host is a duplicate, i.e. an orphan instance from the source
compute node's point of view.
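
Similarly, a rough check for the live-migration case (host names and
the qemu+ssh transport are assumptions about the deployment) is to look
for domain UUIDs active on both the source and destination hypervisors:

    import libvirt

    SRC = 'qemu+ssh://compute-src/system'
    DST = 'qemu+ssh://compute-dst/system'

    def active_uuids(uri):
        conn = libvirt.openReadOnly(uri)
        try:
            return {dom.UUIDString() for dom in conn.listAllDomains()
                    if dom.isActive()}
        finally:
            conn.close()

    for uuid in active_uuids(SRC) & active_uuids(DST):
        print('instance %s runs on both hosts; the source copy is orphaned'
              % uuid)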

Expected result
===============
There should be no orphan instances.

Actual result
=============
Some instances are no longer under Nova's management.

Environment
===========
Reproducing this condition is not easy. See the discussion from the
Stein PTG etherpad, line 931:
https://etherpad.openstack.org/p/nova-ptg-stein


Fix
===

Proposal: add a periodic task that takes a configurable action when an
orphan instance is found (a rough sketch follows below). Suggested
actions are:
* reap the instance.
* stop the instance.
* only log a message. [default]

The interval of the periodic task should be configurable.
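
A minimal sketch of what such a task could look like inside
nova-compute's manager. The option names orphan_instance_action and
orphan_instance_poll_interval and the two cleanup helpers are
hypothetical (option registration omitted); it assumes it is mixed into
ComputeManager, which provides self.driver and self.host:

    from oslo_log import log as logging
    from oslo_service import periodic_task

    from nova import objects
    import nova.conf

    CONF = nova.conf.CONF
    LOG = logging.getLogger(__name__)


    class OrphanInstanceMixin(object):
        """Sketch only; assumes ComputeManager's self.driver and self.host."""

        @periodic_task.periodic_task(
            spacing=CONF.orphan_instance_poll_interval)  # hypothetical option
        def _handle_orphan_instances(self, context):
            # UUIDs the hypervisor reports minus UUIDs Nova still tracks
            # for this host: anything left over has no DB record and is
            # therefore an orphan.
            driver_uuids = set(self.driver.list_instance_uuids())
            db_uuids = {inst.uuid for inst in
                        objects.InstanceList.get_by_host(context, self.host)}
            for uuid in driver_uuids - db_uuids:
                action = CONF.orphan_instance_action     # reap | stop | log
                if action == 'reap':
                    self._reap_orphan(uuid)              # hypothetical helper
                elif action == 'stop':
                    self._stop_orphan(uuid)              # hypothetical helper
                else:
                    LOG.warning('Orphan instance %s found on host %s',
                                uuid, self.host)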

This was previously proposed as a blueprint, but it is better qualified
as a bug. Refer to:

https://blueprints.launchpad.net/nova/+spec/periodic-orphan-instances-delete

** Affects: nova
     Importance: Undecided
         Status: New
