yahoo-eng-team team mailing list archive
Message #55886
[Bug 1619606] [NEW] snapshot_volume_backed races, could result in data corruption
Public bug reported:
snapshot_volume_backed() in compute.API does not set a task_state during
execution. However, in essence it does:
    if vm_state == ACTIVE:
        quiesce()
    snapshot()
    if vm_state == ACTIVE:
        unquiesce()
There is no exclusion here, though, which means a user could do:
    quiesce()
                    quiesce()
    snapshot()
                    snapshot()
    unquiesce()     -- snapshot() now running after unquiesce -> corruption
                    unquiesce()
or:
    suspend()
                    snapshot()
                      NO QUIESCE (we're suspended)
                      snapshot()
    resume()
                      -- snapshot() now running after resume -> corruption
Same goes for stop/start.
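
To make the first interleaving concrete, here is a small, deterministic
replay of it. This is only a toy model, not nova code: the event names,
the split of snapshot() into snap_begin/snap_end, and the assumption that
a single unquiesce thaws the guest regardless of how many quiesce calls
preceded it are all illustrative.

```python
def unsafe_snapshots(events):
    """Return the callers whose snapshot ran while the guest was writable.

    `events` is an ordered list of (caller, op) pairs, where op is one of
    "quiesce", "unquiesce", "snap_begin" or "snap_end" (hypothetical names).
    """
    frozen = False      # assumed: quiesce is a flag, not a reference count
    in_flight = set()   # callers whose snapshot has begun but not ended
    unsafe = set()
    for caller, op in events:
        if op == "quiesce":
            frozen = True
        elif op == "unquiesce":
            frozen = False
            # Any snapshot still running now sees a live, writable guest.
            unsafe |= in_flight
        elif op == "snap_begin":
            in_flight.add(caller)
            if not frozen:
                unsafe.add(caller)
        elif op == "snap_end":
            in_flight.discard(caller)
    return sorted(unsafe)


# Two requests run back to back: no overlap, nothing unsafe.
serial = [
    ("A", "quiesce"), ("A", "snap_begin"), ("A", "snap_end"), ("A", "unquiesce"),
    ("B", "quiesce"), ("B", "snap_begin"), ("B", "snap_end"), ("B", "unquiesce"),
]

# The interleaving from the report: A unquiesces while B is still snapshotting.
racy = [
    ("A", "quiesce"), ("B", "quiesce"),
    ("A", "snap_begin"), ("B", "snap_begin"),
    ("A", "snap_end"), ("A", "unquiesce"),
    ("B", "snap_end"), ("B", "unquiesce"),
]
```

In this model, replaying `racy` flags request B, while `serial` flags
nothing.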
Note that snapshot_volume_backed() is a separate top-level entry point
from snapshot(). snapshot() does not suffer from this problem, because
it atomically sets the task state to IMAGE_SNAPSHOT_PENDING when
running, which prevents the user from performing a concurrent operation
on the instance. I suggest that snapshot_volume_backed() should do the
same.
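
A minimal sketch of the suggested exclusion, assuming the task state can
be claimed with an atomic check-and-set. Everything here (the Instance
class, claim(), release(), InstanceBusy, and the state strings) is
hypothetical and stands in for whatever mechanism nova actually uses; it
only illustrates why claiming the task state up front serializes the
operations.

```python
import threading


class InstanceBusy(Exception):
    """Raised when another operation already holds the task state."""


class Instance:
    """Toy stand-in for an instance record (not nova's Instance object)."""

    def __init__(self):
        self.task_state = None
        self._lock = threading.Lock()

    def claim(self, new_state):
        # Check and set under one lock so two callers cannot both succeed.
        with self._lock:
            if self.task_state is not None:
                raise InstanceBusy(self.task_state)
            self.task_state = new_state

    def release(self):
        with self._lock:
            self.task_state = None


def snapshot_volume_backed(instance, log):
    # Claim the task state before touching the guest; a concurrent
    # snapshot, suspend or stop would fail its own claim instead of
    # interleaving with the steps below.
    instance.claim("image_snapshot_pending")
    try:
        log.extend(["quiesce", "snapshot", "unquiesce"])
    finally:
        instance.release()


def try_snapshot(instance, log):
    """Return True if the snapshot ran, False if the instance was busy."""
    try:
        snapshot_volume_backed(instance, log)
        return True
    except InstanceBusy:
        return False
```

With this guard, a second request arriving while the first holds the
task state is rejected outright rather than running its snapshot after
the first request's unquiesce.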
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1619606