yahoo-eng-team team mailing list archive
Message #36532
[Bug 1297375] Re: All nova apis relying on Instance.save(expected_*_state) for safety contain a race condition
The fix was reverted, but a patch is now open for review again.
** Changed in: nova
Milestone: liberty-2 => None
** Changed in: nova
Status: Fix Released => In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1297375
Title:
All nova apis relying on Instance.save(expected_*_state) for safety
contain a race condition
Status in OpenStack Compute (nova):
In Progress
Bug description:
Take, for example, resize_instance(). In manager.py, we assert that
the instance is in RESIZE_PREP state with:
instance.save(expected_task_state=task_states.RESIZE_PREP)
This should mean that the first resize succeeds and any subsequent
one fails. However, the underlying db implementation does not lock
the instance row during the update, and therefore does not
guarantee this.
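The contract that expected_task_state is meant to provide is essentially a compare-and-swap on task_state. The sketch below assumes a simplified instances table in an in-memory SQLite database; the table layout and the save_expected() helper are hypothetical and are not nova's schema or API. It places the state check in the WHERE clause of a single UPDATE, so the check and the write cannot be separated. The first transition matches one row and every later attempt matches none.

```python
import sqlite3

# Hypothetical, simplified stand-in for nova's instances table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, task_state TEXT)")
conn.execute("INSERT INTO instances VALUES ('inst-1', 'resize_prep')")
conn.commit()


def save_expected(conn, uuid, expected, new_state):
    """Conditional UPDATE: the expected-state check and the write are one statement.

    Returns True if this caller performed the transition, False if the row
    was no longer in the expected state.
    """
    cur = conn.execute(
        "UPDATE instances SET task_state = ? WHERE uuid = ? AND task_state = ?",
        (new_state, uuid, expected),
    )
    conn.commit()
    return cur.rowcount == 1  # rows actually modified by this UPDATE


first = save_expected(conn, "inst-1", "resize_prep", "resize_migrating")
second = save_expected(conn, "inst-1", "resize_prep", "resize_migrating")
print(first, second)  # True False
```

This conditional-UPDATE approach is an alternative to the 'select ... for update' fix proposed in this report, not the same technique. It relies on the database evaluating the WHERE predicate against the row it is actually updating.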
Specifically, _instance_update() in db/sqlalchemy/api.py starts a
session and reads task_state from the instance. However, it does not
use a 'select ... for update', so the row is not locked. Two
concurrent calls to this method can both read the same state, both
pass the check, and then race to the update; the last writer wins.
Without 'select ... for update', the db transaction only ensures that
the writes are atomic; it does not protect a read and the write that
depends on it.
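The lost race can be reproduced without a database by interleaving the two steps by hand. This is a hypothetical, in-memory simulation. The dict and the read_task_state() and check_then_write() helpers are illustrative stand-ins, not nova code. Both callers read RESIZE_PREP before either one writes, so both pass the check and the second write silently overwrites the first.

```python
# Hypothetical in-memory stand-in for the instances table; not nova code.
RESIZE_PREP = "resize_prep"
RESIZE_MIGRATING = "resize_migrating"

db = {"inst-1": {"task_state": RESIZE_PREP}}


def read_task_state(uuid):
    """Step 1 of the unlocked update: read task_state (no row lock taken)."""
    return db[uuid]["task_state"]


def check_then_write(uuid, observed, expected, new_state):
    """Step 2: check the value read earlier, then write unconditionally."""
    if observed != expected:
        raise RuntimeError("unexpected task_state %r" % observed)
    db[uuid]["task_state"] = new_state
    return True


# Two concurrent resize requests, interleaved so both read before either writes.
seen_a = read_task_state("inst-1")
seen_b = read_task_state("inst-1")
ok_a = check_then_write("inst-1", seen_a, RESIZE_PREP, RESIZE_MIGRATING)
ok_b = check_then_write("inst-1", seen_b, RESIZE_PREP, RESIZE_MIGRATING)
print(ok_a, ok_b)  # True True: the expected-state check did not serialize them
```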
SQLAlchemy supports select ... for update (Query.with_for_update()),
as do MySQL and PostgreSQL, although MySQL will fall back to
whole-table locks for non-InnoDB tables (e.g. MyISAM), which would
likely be a significant performance hit.
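SQLite has no SELECT ... FOR UPDATE, so this stdlib-only sketch substitutes a different mechanism: BEGIN IMMEDIATE, which acquires SQLite's database-wide write lock before the read. That lock is much coarser than the row lock proposed here, but it demonstrates the same pessimistic-locking idea. Because the lock is held from the read through the write, a second caller cannot start until the first commits, and it then sees the updated state. The file path, table layout and locked_save() helper are all hypothetical.

```python
import os
import sqlite3
import tempfile

# Two connections to one on-disk database; timeout=0 makes lock conflicts fail fast.
# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
path = os.path.join(tempfile.mkdtemp(), "lock_sketch.db")
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)
a.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, task_state TEXT)")
a.execute("INSERT INTO instances VALUES ('inst-1', 'resize_prep')")


def locked_save(conn, uuid, expected, new_state):
    """Take the write lock *before* reading, so the check and the write cannot interleave."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT task_state FROM instances WHERE uuid = ?", (uuid,)
        ).fetchall()
        won = bool(rows) and rows[0][0] == expected
        if won:
            conn.execute(
                "UPDATE instances SET task_state = ? WHERE uuid = ?",
                (new_state, uuid),
            )
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return won


# Caller A takes the lock and reads; caller B arrives while A still holds it.
a.execute("BEGIN IMMEDIATE")
a_state = a.execute("SELECT task_state FROM instances WHERE uuid = 'inst-1'").fetchall()[0][0]
try:
    b.execute("BEGIN IMMEDIATE")
    b_blocked = False
except sqlite3.OperationalError:  # "database is locked": B must wait for A
    b_blocked = True
a.execute("UPDATE instances SET task_state = 'resize_migrating' WHERE uuid = 'inst-1'")
a.execute("COMMIT")

# After A commits, B's own locked check sees the new state and refuses.
b_won = locked_save(b, "inst-1", "resize_prep", "resize_migrating")
print(a_state, b_blocked, b_won)  # resize_prep True False
```

With SQLAlchemy on MySQL/InnoDB or PostgreSQL, the row-level equivalent would add with_for_update() to the query that reads the instance inside the session. That locks only the selected row instead of the whole database.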
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1297375/+subscriptions