yahoo-eng-team team mailing list archive

[Bug 1635762] [NEW] Running instances that are shelved are not cleaned

 

Public bug reported:

Errors or bugs can leave an instance running on a node after it has
already been deleted or shelved through the API. nova-compute currently
has a mechanism that allows users to clean up running instances that were
deleted, but not instances that were shelved or shelved-offloaded.

That can cause serious problems for the user. For example, if an instance
is marked as shelved-offloaded but is actually running, and the user
later starts it, the same instance will run twice in the cluster. The
two copies can then write to the same volume and cause data
corruption.

An example that happened to me:
The user has a shelved-offloaded instance and requests to unshelve it. nova-compute tries to spawn the instance in the hypervisor. The hypervisor is very busy and slow, and reports a timeout. nova-compute tries to clean up the instance, but it has not spawned yet (it will spawn a few seconds later), so the unshelve fails and the instance goes back to being shelved-offloaded.
The user sees this and retries the unshelve. The scheduler returns a different node, and the spawn succeeds. We now have 2 instances of the same VM, on different nodes, writing to the same volume, and data corruption occurs.
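
The sequence above can be illustrated with a toy model. This is only a sketch of the race, not nova code: the Hypervisor class, the unshelve() helper, the "pending" set and the node names are all hypothetical and exist only to show how a cleanup that runs before a delayed spawn completes leaves a stray running copy behind.

```python
# Toy model of the race described above. All names are hypothetical;
# nothing here is taken from nova.
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    name: str
    running: set = field(default_factory=set)  # domains actually running
    pending: set = field(default_factory=set)  # spawns still in flight

    def spawn(self, uuid, slow=False):
        if slow:
            # The request was accepted but has not finished yet.
            self.pending.add(uuid)
            raise TimeoutError(f"{self.name}: spawn of {uuid} timed out")
        self.running.add(uuid)

    def cleanup(self, uuid):
        # Only removes what is visible right now; a pending spawn is missed.
        self.running.discard(uuid)

    def settle(self):
        # The delayed spawn finally completes.
        self.running |= self.pending
        self.pending.clear()


def unshelve(db, hv, uuid, slow=False):
    try:
        hv.spawn(uuid, slow=slow)
        db[uuid] = ("active", hv.name)
    except TimeoutError:
        hv.cleanup(uuid)
        db[uuid] = ("shelved_offloaded", None)


db = {"vm-1": ("shelved_offloaded", None)}
node_a, node_b = Hypervisor("node-a"), Hypervisor("node-b")

unshelve(db, node_a, "vm-1", slow=True)  # first attempt times out
node_a.settle()                          # ...but the domain starts anyway
unshelve(db, node_b, "vm-1")             # retry lands on another node

running_on = [hv.name for hv in (node_a, node_b) if "vm-1" in hv.running]
print(db["vm-1"], running_on)
```

In this model the database ends up pointing at node-b only, while the instance is running on both nodes, which is the state in which two copies can write to the same volume.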

nova-compute already has a mechanism for cleaning up running-deleted
instances, controlled by the running_deleted_instance_action
configuration parameter. The solution is to make the same mechanism
clean up shelved and shelved-offloaded instances in the same loop.
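
A minimal sketch of what the proposed check could look like, assuming the periodic loop can obtain the list of instances running on the host and the state recorded for each in the database. The function names, the state strings and the action values used here are illustrative assumptions, not nova's actual implementation.

```python
# Hypothetical sketch of the proposed check; not nova's implementation.
# The state names and action values are assumptions for illustration.
STALE_STATES = {"deleted", "shelved", "shelved_offloaded"}


def find_stale_running(running_on_host, db_vm_states, stale_states=STALE_STATES):
    """Return UUIDs running on this host whose DB state says they should not be."""
    return sorted(
        uuid for uuid in running_on_host
        if db_vm_states.get(uuid) in stale_states
    )


def periodic_cleanup(running_on_host, db_vm_states, action="log"):
    """Pair every stale instance with the configured action."""
    if action == "noop":
        return []
    stale = find_stale_running(running_on_host, db_vm_states)
    return [(uuid, action) for uuid in stale]


running = ["vm-1", "vm-2", "vm-3"]
states = {"vm-1": "shelved_offloaded", "vm-2": "active", "vm-3": "deleted"}
result = periodic_cleanup(running, states, action="shutdown")
print(result)
```

A real implementation would probably also need to skip instances that are in the middle of an unshelve on this host, so that the loop does not tear down a spawn that is legitimately in progress.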

Version - Mitaka
Hypervisor - libvirt

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1635762

Title:
  Running instances that are shelved are not cleaned

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1635762/+subscriptions

