yahoo-eng-team team mailing list archive

[Bug 1619606] [NEW] snapshot_volume_backed races, could result in data corruption

 

Public bug reported:

snapshot_volume_backed() in compute.API does not set a task_state while it
runs. In essence, it does:

if vm_state == ACTIVE:
  quiesce()
snapshot()
if vm_state == ACTIVE:
  unquiesce()

There is no mutual exclusion here, though, so a user who sends two
snapshot requests at once could see them interleave like this:

Request A          Request B
quiesce()
                   quiesce()
snapshot()
                   snapshot()
unquiesce()        <- B's snapshot() still running after A's unquiesce() -> corruption
                   unquiesce()
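To make the first interleaving concrete, here is a toy Python model. It is not Nova code: FakeGuest, replay and the generator-based scheduling are hypothetical. Each request is a generator, each `yield` is a point where the other request may run, and replaying the schedule from the timeline shows request B completing its snapshot after A has already unquiesced the guest.

```python
ACTIVE = "active"


class FakeGuest:
    """Hypothetical stand-in for a guest whose filesystems can be frozen."""

    def __init__(self):
        self.frozen = False


def snapshot_volume_backed(guest, vm_state, who, results):
    """Toy model of the unguarded flow. Each `yield` is a point where the
    other request may be scheduled; nothing prevents interleaving."""
    if vm_state == ACTIVE:
        guest.frozen = True          # quiesce()
        yield
    yield                            # snapshot() has started
    # snapshot() completes; it is consistent only if the guest is still frozen.
    results[who] = guest.frozen
    yield
    if vm_state == ACTIVE:
        guest.frozen = False         # unquiesce()


def replay(schedule):
    """Advance the named requests one step at a time, in schedule order."""
    guest, results = FakeGuest(), {}
    requests = {name: snapshot_volume_backed(guest, ACTIVE, name, results)
                for name in set(schedule)}
    for name in schedule:
        next(requests[name], None)   # default avoids StopIteration at the end
    return results


# The interleaving from the timeline: A's unquiesce() lands before B's
# snapshot() completes.
print(replay("ABABAABB"))  # {'A': True, 'B': False}
```

Running the requests one after the other (`replay("AAAABBBB")`) yields two consistent snapshots, which is the ordering a task_state guard would enforce. The model deliberately ignores what a real guest agent does when asked to freeze an already frozen filesystem.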

A similar race exists if the instance is suspended:

suspend()
snapshot()
  NO QUIESCE (the instance is suspended)
  snapshot()
                   resume()
  <- snapshot() still running after resume() -> corruption

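The same applies to stop/start: a start() that lands while the snapshot
of a stopped instance is still running causes the same corruption.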
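The suspend/resume and stop/start variants can be modelled the same way. In this hypothetical sketch (FakeGuest, can_write and replay are illustrative names, not Nova APIs), the quiesce decision is made once from vm_state when the snapshot starts, so a resume() or start() that lands mid-snapshot leaves a running, unfrozen guest while the volume snapshot is still in progress.

```python
from dataclasses import dataclass

ACTIVE, SUSPENDED, STOPPED = "active", "suspended", "stopped"


@dataclass
class FakeGuest:
    """Hypothetical guest that can write to its disks unless it is
    suspended/stopped or its filesystems are frozen."""
    vm_state: str = ACTIVE
    frozen: bool = False

    def can_write(self) -> bool:
        return self.vm_state == ACTIVE and not self.frozen


def snapshot_volume_backed(guest, results):
    """Unguarded flow: whether to quiesce is decided once, up front."""
    quiesced = guest.vm_state == ACTIVE
    if quiesced:
        guest.frozen = True                       # quiesce()
    yield                                         # volume snapshot in progress
    # Simplification: only the guest's state at completion is checked.
    results["consistent"] = not guest.can_write()
    if quiesced:
        guest.frozen = False                      # unquiesce()


def replay(initial_state, concurrent_power_on):
    guest, results = FakeGuest(vm_state=initial_state), {}
    snap = snapshot_volume_backed(guest, results)
    next(snap)                                    # snapshot starts (quiesced only if ACTIVE)
    if concurrent_power_on:
        guest.vm_state = ACTIVE                   # resume() or start() lands here
    next(snap, None)                              # snapshot finishes
    return results["consistent"]


print(replay(SUSPENDED, concurrent_power_on=True))   # False
print(replay(SUSPENDED, concurrent_power_on=False))  # True
```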

Note that snapshot_volume_backed() is a separate top-level entry point
from snapshot(). snapshot() does not suffer from this problem, because
it atomically sets the instance's task_state to IMAGE_SNAPSHOT_PENDING
at the start of the operation, which prevents the user from starting a
concurrent operation on the instance while that task_state is set. I
suggest that snapshot_volume_backed() do the same.
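A minimal sketch of the suggested fix, assuming a compare-and-swap on task_state. FakeInstance, swap_task_state, UnexpectedTaskState and guarded_snapshot_volume_backed are hypothetical names, the string values of the task states are assumptions, and the threading.Lock only stands in for the atomic conditional update of the instance record that snapshot() relies on.

```python
import threading

IMAGE_SNAPSHOT_PENDING = "image_snapshot_pending"  # assumed value


class UnexpectedTaskState(Exception):
    """Hypothetical error: the instance is already busy with another task."""


class FakeInstance:
    """Hypothetical instance record; the lock stands in for an atomic
    conditional update in the database."""

    def __init__(self):
        self.task_state = None
        self._lock = threading.Lock()

    def swap_task_state(self, expected, new):
        """Set task_state to `new` only if it currently equals `expected`."""
        with self._lock:
            if self.task_state != expected:
                raise UnexpectedTaskState(
                    f"task_state is {self.task_state!r}, expected {expected!r}")
            self.task_state = new


def guarded_snapshot_volume_backed(instance, do_snapshot):
    """Claim the instance before quiesce/snapshot/unquiesce; release after."""
    # Fails fast if another operation already owns the instance.
    instance.swap_task_state(expected=None, new=IMAGE_SNAPSHOT_PENDING)
    try:
        return do_snapshot()
    finally:
        instance.swap_task_state(expected=IMAGE_SNAPSHOT_PENDING, new=None)


inst = FakeInstance()
rejected = []


def do_snapshot():
    # Simulate a resume() and a second snapshot arriving mid-snapshot;
    # both try to claim the instance the same way and are refused.
    for task in ("resuming", IMAGE_SNAPSHOT_PENDING):
        try:
            inst.swap_task_state(expected=None, new=task)
        except UnexpectedTaskState:
            rejected.append(task)
    return "snapshot-1"


result = guarded_snapshot_volume_backed(inst, do_snapshot)
print(result, rejected, inst.task_state)
```

Releasing the claim in a `finally` clause means a failed snapshot does not leave the instance stuck in IMAGE_SNAPSHOT_PENDING in this model; how Nova itself cleans up after a failed snapshot is not modelled here.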

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1619606