← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1297375] [NEW] All nova apis relying on Instance.save(expected_*_state) for safety contain a race condition

 

Public bug reported:

Take, for example, resize_instance(). In manager.py, we assert that the
instance is in RESIZE_PREP state with:

  instance.save(expected_task_state=task_states.RESIZE_PREP)

This should mean that the first resize will succeed, and any subsequent
will fail. However, the underlying db implementation does not lock the
instance during the update, and therefore doesn't guarantee this.

Specifically, _instance_update() in db/sqlalchemy/apy.py starts a
session, and reads task_state from the instance. However, it does not
use a 'select ... for update', meaning the row is not locked. 2
concurrent calls to this method can both read the same state, then race
to the update. The last writer will win. Without 'select ... for
update', the db transaction is only ensuring that all writes are atomic,
not reads with dependent writes.

SQLAlchemy seems to support select ... for update, as do MySQL and
PostgreSQL, although MySQL will fall back to whole table locks for non-
InnoDB tables, which would likely be a significant performance hit.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1297375

Title:
  All nova apis relying on Instance.save(expected_*_state) for safety
  contain a race condition

Status in OpenStack Compute (Nova):
  New

Bug description:
  Take, for example, resize_instance(). In manager.py, we assert that
  the instance is in RESIZE_PREP state with:

    instance.save(expected_task_state=task_states.RESIZE_PREP)

  This should mean that the first resize will succeed, and any
  subsequent will fail. However, the underlying db implementation does
  not lock the instance during the update, and therefore doesn't
  guarantee this.

  Specifically, _instance_update() in db/sqlalchemy/apy.py starts a
  session, and reads task_state from the instance. However, it does not
  use a 'select ... for update', meaning the row is not locked. 2
  concurrent calls to this method can both read the same state, then
  race to the update. The last writer will win. Without 'select ... for
  update', the db transaction is only ensuring that all writes are
  atomic, not reads with dependent writes.

  SQLAlchemy seems to support select ... for update, as do MySQL and
  PostgreSQL, although MySQL will fall back to whole table locks for
  non-InnoDB tables, which would likely be a significant performance
  hit.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1297375/+subscriptions


Follow ups

References