yahoo-eng-team team mailing list archive
Message #79109
[Bug 1821373] Re: Most instance actions can be called concurrently
** Also affects: nova/rocky
Importance: Undecided
Status: New
** Also affects: nova/queens
Importance: Undecided
Status: New
** Also affects: nova/stein
Importance: Undecided
Status: New
** Changed in: nova/queens
Status: New => In Progress
** Changed in: nova/stein
Status: New => Fix Released
** Changed in: nova/queens
Importance: Undecided => Low
** Changed in: nova/rocky
Importance: Undecided => Low
** Changed in: nova/queens
Assignee: (unassigned) => Matthew Booth (mbooth-9)
** Changed in: nova/stein
Importance: Undecided => Low
** Changed in: nova/stein
Assignee: (unassigned) => Matthew Booth (mbooth-9)
** Changed in: nova/rocky
Assignee: (unassigned) => Matthew Booth (mbooth-9)
** Changed in: nova/rocky
Status: New => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821373
Title:
Most instance actions can be called concurrently
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
In Progress
Status in OpenStack Compute (nova) rocky series:
Fix Released
Status in OpenStack Compute (nova) stein series:
Fix Released
Bug description:
A customer reported that they were getting DB corruption if they
called shelve twice in quick succession on the same instance. This
should be prevented by the guard in nova.API.shelve, which does:
    instance.task_state = task_states.SHELVING
    instance.save(expected_task_state=[None])
This is intended to act as a robust gate against two instance actions
happening concurrently. The first will set the task state to SHELVING,
the second will fail because the task state is no longer None, which is
the only value it expects. The comparison is done atomically in
db.instance_update_and_get_original(), and should be race free.
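
To illustrate what an atomic compare-and-update gate looks like in
general, here is a minimal sketch using an in-memory SQLite table. It is
not nova's implementation: the table layout, the set_task_state() helper
and the UnexpectedTaskState exception are all hypothetical names chosen
for this example.

```python
import sqlite3


class UnexpectedTaskState(Exception):
    """Hypothetical stand-in for the error raised when the gate rejects a call."""


def make_db():
    # One instance row whose task_state starts out as NULL (no task running).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances (id INTEGER PRIMARY KEY, task_state TEXT)"
    )
    conn.execute("INSERT INTO instances (id, task_state) VALUES (1, NULL)")
    conn.commit()
    return conn


def set_task_state(conn, instance_id, new_state, expected_state):
    # The comparison and the write happen in a single UPDATE statement, so
    # no other writer can slip in between the check and the change.
    # SQLite's IS operator matches NULL as well as ordinary values.
    cur = conn.execute(
        "UPDATE instances SET task_state = ? "
        "WHERE id = ? AND task_state IS ?",
        (new_state, instance_id, expected_state),
    )
    conn.commit()
    if cur.rowcount != 1:
        raise UnexpectedTaskState(
            "instance %s is not in task_state %r" % (instance_id, expected_state)
        )


conn = make_db()
set_task_state(conn, 1, "shelving", None)      # first caller wins
try:
    set_task_state(conn, 1, "shelving", None)  # second caller is rejected
    second_call_rejected = False
except UnexpectedTaskState:
    second_call_rejected = True
```

Note that the second call is rejected even though it writes the same
value that is already stored, because the check is made against the
stored state rather than against what changed in memory.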
However, instance.save() returns early when no fields have changed,
without calling db.instance_update_and_get_original(). The guard
therefore fails if the same operation is called twice:
    instance = get_instance()
      => Returned instance.task_state is None
    instance.task_state = task_states.SHELVING
    instance.save(expected_task_state=[None])
      => task_state was None, now SHELVING, updates = {'task_state': SHELVING}
      => db.instance_update_and_get_original() executes and succeeds

    instance = get_instance()
      => Returned instance.task_state is SHELVING
    instance.task_state = task_states.SHELVING
    instance.save(expected_task_state=[None])
      => task_state was SHELVING, still SHELVING, updates = {}
      => db.instance_update_and_get_original() does not execute, therefore doesn't raise the expected exception
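
The sequence above can be reproduced with a small self-contained
simulation. This is only a sketch of the behaviour described in this
report, not nova code: FakeDB, FakeInstance, UnexpectedTaskStateError
and the skip_when_unchanged flag are hypothetical names introduced for
illustration, and the flag is just one conceivable way to force the
check, not necessarily how the bug was fixed.

```python
SHELVING = "shelving"


class UnexpectedTaskStateError(Exception):
    """Hypothetical stand-in for the exception the guard should raise."""


class FakeDB:
    """Holds one instance's task_state and counts guarded updates."""

    def __init__(self):
        self.task_state = None
        self.update_calls = 0

    def instance_update_and_get_original(self, updates, expected_task_state):
        # Simulates the atomic compare-and-update described in the report.
        self.update_calls += 1
        if self.task_state not in expected_task_state:
            raise UnexpectedTaskStateError(self.task_state)
        if "task_state" in updates:
            self.task_state = updates["task_state"]


class FakeInstance:
    """Tracks changes relative to the value loaded from the DB."""

    def __init__(self, db):
        self._db = db
        self._original = db.task_state
        self.task_state = db.task_state

    def save(self, expected_task_state, skip_when_unchanged=True):
        updates = {}
        if self.task_state != self._original:
            updates["task_state"] = self.task_state
        if not updates and skip_when_unchanged:
            return  # the shortcut: the expected-state check never runs
        self._db.instance_update_and_get_original(updates, expected_task_state)
        self._original = self.task_state


def shelve(db, skip_when_unchanged=True):
    instance = FakeInstance(db)  # plays the role of get_instance()
    instance.task_state = SHELVING
    instance.save(expected_task_state=[None],
                  skip_when_unchanged=skip_when_unchanged)


# With the shortcut, the second shelve is silently accepted.
db = FakeDB()
shelve(db)
shelve(db)
calls_with_shortcut = db.update_calls

# Without the shortcut, the second shelve is rejected.
strict_db = FakeDB()
shelve(strict_db, skip_when_unchanged=False)
try:
    shelve(strict_db, skip_when_unchanged=False)
    strict_second_rejected = False
except UnexpectedTaskStateError:
    strict_second_rejected = True
```

In the first run the guarded update executes only once, so nothing
stops the duplicate request. In the second run the comparison is always
performed and the duplicate raises.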
This pattern is common to almost all instance actions in the nova API. A
quick scan suggests that all of the following actions are affected by
this bug, and can therefore potentially be executed multiple times
concurrently for the same instance:
restore
force_stop
start
backup
snapshot
soft reboot
hard reboot
rebuild
revert_resize
resize
shelve
shelve_offload
unshelve
pause
unpause
suspend
resume
rescue
unrescue
set_admin_password
live_migrate
evacuate
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1821373/+subscriptions