yahoo-eng-team team mailing list archive
Message #58071
[Bug 1635762] [NEW] Running instances that are shelved are not cleaned
Public bug reported:
Errors or bugs can leave an instance running on a node after it has
already been deleted or shelved through the API. nova-compute currently
has a mechanism that lets operators clean up running instances that were
deleted, but not instances that were shelved or shelved-offloaded.
This can cause serious problems for the user. For example, if an
instance is marked as shelved-offloaded but is actually still running,
and the user later starts it again, the same instance runs twice in the
cluster, and both copies can write to the same volume and corrupt its
data.
An example scenario that happened to me:
A user has a shelved-offloaded instance and requests to unshelve it. nova-compute tries to spawn the instance on the hypervisor. The hypervisor is very busy and slow, and reports a timeout. nova-compute tries to clean up the instance, but the instance has not spawned yet (it will spawn a few seconds later), so the cleanup misses it. The unshelve fails and the instance goes back to being shelved-offloaded.
The user sees this and retries the unshelve. The scheduler picks a different node, and the spawn succeeds. We now have 2 copies of the same VM, on different nodes, writing to the same volume, and data corruption occurs.
nova-compute already has a mechanism for cleaning up running-deleted
instances, controlled by the running_deleted_instance_action
configuration parameter. The proposed solution is to have the same
mechanism also clean up shelved and shelved-offloaded instances in the
same loop.
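
To make the proposal concrete, here is a minimal, standalone sketch of
the check I have in mind. It is not nova code: DbInstance,
find_stray_instances and plan_action are hypothetical names, and the
vm_state strings are assumptions chosen for illustration. The action
names are meant to mirror the choices I believe
running_deleted_instance_action accepts (noop, log, shutdown, reap).
The sketch only decides what should happen; it does not touch a
hypervisor.

```python
from dataclasses import dataclass

# States in which an instance should not be running on a compute node.
# The exact strings are an assumption made for this sketch.
SHOULD_NOT_RUN = {"deleted", "shelved", "shelved_offloaded"}

# Assumed to mirror the choices of running_deleted_instance_action.
KNOWN_ACTIONS = {"noop", "log", "shutdown", "reap"}


@dataclass(frozen=True)
class DbInstance:
    """Hypothetical stand-in for the database record of an instance."""
    uuid: str
    vm_state: str


def find_stray_instances(running_on_host, db_instances):
    """Return DB records of instances that the hypervisor reports as
    running but whose recorded state says they should not be."""
    by_uuid = {inst.uuid: inst for inst in db_instances}
    stray = []
    for uuid in running_on_host:
        inst = by_uuid.get(uuid)
        # UUIDs unknown to the database are ignored in this sketch.
        if inst is not None and inst.vm_state in SHOULD_NOT_RUN:
            stray.append(inst)
    return stray


def plan_action(inst, action):
    """Describe what the periodic task would do with a stray instance.
    Returns None for 'noop', otherwise a short description string."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "noop":
        return None
    return f"{action} {inst.uuid} ({inst.vm_state})"


if __name__ == "__main__":
    # The failed-unshelve case: "b" is shelved-offloaded in the
    # database but its delayed spawn left it running on this node.
    db = [DbInstance("a", "active"), DbInstance("b", "shelved_offloaded")]
    for inst in find_stray_instances(["a", "b"], db):
        print(plan_action(inst, "reap"))
```

With a check like this in the existing periodic loop, the late-spawned
copy on the first node would be caught on that node's next pass, which
narrows the window in which two copies can write to the same volume.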
Version - Mitaka
Hypervisor - libvirt
** Affects: nova
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1635762