yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #12153
[Bug 1297375] [NEW] All nova apis relying on Instance.save(expected_*_state) for safety contain a race condition
Public bug reported:
Take, for example, resize_instance(). In manager.py, we assert that the
instance is in RESIZE_PREP state with:
instance.save(expected_task_state=task_states.RESIZE_PREP)
This should mean that the first resize will succeed, and any subsequent
will fail. However, the underlying db implementation does not lock the
instance during the update, and therefore doesn't guarantee this.
Specifically, _instance_update() in db/sqlalchemy/apy.py starts a
session, and reads task_state from the instance. However, it does not
use a 'select ... for update', meaning the row is not locked. 2
concurrent calls to this method can both read the same state, then race
to the update. The last writer will win. Without 'select ... for
update', the db transaction is only ensuring that all writes are atomic,
not reads with dependent writes.
SQLAlchemy seems to support select ... for update, as do MySQL and
PostgreSQL, although MySQL will fall back to whole table locks for non-
InnoDB tables, which would likely be a significant performance hit.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1297375
Title:
All nova apis relying on Instance.save(expected_*_state) for safety
contain a race condition
Status in OpenStack Compute (Nova):
New
Bug description:
Take, for example, resize_instance(). In manager.py, we assert that
the instance is in RESIZE_PREP state with:
instance.save(expected_task_state=task_states.RESIZE_PREP)
This should mean that the first resize will succeed, and any
subsequent will fail. However, the underlying db implementation does
not lock the instance during the update, and therefore doesn't
guarantee this.
Specifically, _instance_update() in db/sqlalchemy/apy.py starts a
session, and reads task_state from the instance. However, it does not
use a 'select ... for update', meaning the row is not locked. 2
concurrent calls to this method can both read the same state, then
race to the update. The last writer will win. Without 'select ... for
update', the db transaction is only ensuring that all writes are
atomic, not reads with dependent writes.
SQLAlchemy seems to support select ... for update, as do MySQL and
PostgreSQL, although MySQL will fall back to whole table locks for
non-InnoDB tables, which would likely be a significant performance
hit.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1297375/+subscriptions
Follow ups
References