
yahoo-eng-team team mailing list archive

[Bug 1821373] [NEW] Most instance actions can be called concurrently

 

Public bug reported:

A customer reported that they were getting DB corruption if they called
shelve twice in quick succession on the same instance. This should be
prevented by the guard in nova.API.shelve, which does:

  instance.task_state = task_states.SHELVING
  instance.save(expected_task_state=[None])

This is intended to act as a robust gate against two instance actions
happening concurrently. The first will set the task state to SHELVING,
the second will fail because the task state is now SHELVING rather than
the expected None. The comparison is done atomically in
db.instance_update_and_get_original(), and should be race free.
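
The intended behaviour of the gate can be illustrated with a minimal
sketch. This is not nova code: FakeInstanceTable, update_if_expected and
UnexpectedTaskStateError are hypothetical names, and a threading.Lock
stands in for the atomicity that the real database update provides.

```python
import threading


class UnexpectedTaskStateError(Exception):
    """Hypothetical error: stored task_state is not an expected value."""


class FakeInstanceTable:
    """Toy stand-in for the instances table (an assumption, not nova)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def create(self, uuid):
        with self._lock:
            self._rows[uuid] = {"task_state": None}

    def update_if_expected(self, uuid, updates, expected_task_state):
        # The check and the write happen under one lock, so no other
        # caller can change the row between the two steps.
        with self._lock:
            row = self._rows[uuid]
            if row["task_state"] not in expected_task_state:
                raise UnexpectedTaskStateError(row["task_state"])
            row.update(updates)
            return dict(row)


table = FakeInstanceTable()
table.create("inst-1")

# First caller: task_state is None, so the gate opens.
first = table.update_if_expected("inst-1", {"task_state": "shelving"}, [None])

# Second caller: task_state is already "shelving", so the gate rejects it.
try:
    table.update_if_expected("inst-1", {"task_state": "shelving"}, [None])
    second_rejected = False
except UnexpectedTaskStateError:
    second_rejected = True
```

As long as every caller actually reaches the compare-and-update step,
only one of two concurrent requests can get through.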

However, instance.save() short-circuits when there is nothing to update
and never calls db.instance_update_and_get_original(). The guard
therefore fails if we call the same operation twice:

  instance = get_instance()
    => Returned instance.task_state is None
  instance.task_state = task_states.SHELVING
  instance.save(expected_task_state=[None])
    => task_state was None, now SHELVING, updates = {'task_state': SHELVING}
    => db.instance_update_and_get_original() executes and succeeds

  instance = get_instance()
    => Returned instance.task_state is SHELVING
  instance.task_state = task_states.SHELVING
  instance.save(expected_task_state=[None])
    => task_state was SHELVING, still SHELVING, updates = {}
    => db.instance_update_and_get_original() does not execute, therefore doesn't raise the expected exception
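
The sequence above can be reproduced with a toy model. Everything here is
an assumption made for illustration: ToyInstance and FakeDB are
hypothetical and only mimic the "skip the DB call when nothing changed"
behaviour described in this report, not nova's real object layer.

```python
class UnexpectedTaskStateError(Exception):
    """Hypothetical error: stored task_state is not an expected value."""


class FakeDB:
    """Toy single-row database that counts guarded update calls."""

    def __init__(self):
        self.row = {"task_state": None}
        self.update_calls = 0

    def update_if_expected(self, updates, expected_task_state):
        self.update_calls += 1
        if self.row["task_state"] not in expected_task_state:
            raise UnexpectedTaskStateError(self.row["task_state"])
        self.row.update(updates)


class ToyInstance:
    """Toy object that tracks changes relative to what it loaded."""

    def __init__(self, db):
        self._db = db
        self._loaded = dict(db.row)
        self.task_state = self._loaded["task_state"]

    def save(self, expected_task_state):
        updates = {}
        if self.task_state != self._loaded["task_state"]:
            updates["task_state"] = self.task_state
        if not updates:
            # Shortcut: nothing changed, so the guarded DB update, and
            # with it the expected_task_state check, is skipped entirely.
            return
        self._db.update_if_expected(updates, expected_task_state)
        self._loaded = dict(self._db.row)


def shelve(db):
    instance = ToyInstance(db)
    instance.task_state = "shelving"
    instance.save(expected_task_state=[None])


db = FakeDB()
shelve(db)  # task_state None -> "shelving": DB check runs and passes
shelve(db)  # task_state already "shelving": no updates, no check, no error
```

In this sketch the second shelve() returns normally and the guarded
update runs only once, which matches the failure mode described above.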

This pattern is common to almost all instance actions in the nova API. A
quick scan suggests that all of the following actions are affected by
this bug, and can therefore potentially be executed multiple times
concurrently for the same instance:

restore
force_stop
start
backup
snapshot
soft reboot
hard reboot
rebuild
revert_resize
resize
shelve
shelve_offload
unshelve
pause
unpause
suspend
resume
rescue
unrescue
set_admin_password
live_migrate
evacuate
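
One possible way to close the gap, sketched under stated assumptions and
not taken from nova, is to skip the shortcut whenever the caller supplies
an expected task state. guarded_save and UnexpectedTaskStateError are
hypothetical names, and a plain dict stands in for the database row; a
real implementation would need the check and the write to be atomic in
the database.

```python
class UnexpectedTaskStateError(Exception):
    """Hypothetical error: stored task_state is not an expected value."""


def guarded_save(row, loaded_task_state, new_task_state,
                 expected_task_state=None):
    """Save a task_state change, always honouring expected_task_state.

    row is a dict standing in for the stored instance record.
    Returns True if the row was checked or written, False if skipped.
    """
    updates = {}
    if new_task_state != loaded_task_state:
        updates["task_state"] = new_task_state

    # Only take the shortcut when there is nothing to write AND the
    # caller did not ask for a task_state check.
    if not updates and expected_task_state is None:
        return False

    if (expected_task_state is not None
            and row["task_state"] not in expected_task_state):
        raise UnexpectedTaskStateError(row["task_state"])

    row.update(updates)
    return True


row = {"task_state": None}

# First request: loaded None, wants "shelving", expects None.
first_ok = guarded_save(row, None, "shelving", expected_task_state=[None])

# Second request: loaded "shelving", wants "shelving", expects None.
try:
    guarded_save(row, "shelving", "shelving", expected_task_state=[None])
    second_rejected = False
except UnexpectedTaskStateError:
    second_rejected = True
```

With the check performed even for an empty update set, the second request
is rejected instead of silently proceeding.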

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821373

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1821373/+subscriptions

