yahoo-eng-team team mailing list archive

[Bug 1848666] [NEW] Race can cause instance to become ACTIVE after build error

Public bug reported:

Two functions used in error cleanup in _do_build_and_run_instance,
_cleanup_allocated_networks and _set_instance_obj_error_state, each call
an unguarded instance.save(). The problem is that the instance object
may already hold unsaved changes at the point the build exception is
raised. Calling instance.save() persists those pending changes in
addition to whatever change was made during cleanup, which is not
intended.

Specifically, when a build races with a delete, the build can fail when
we try to do an atomic save to set the vm_state to active, raising
UnexpectedDeletingTaskStateError. However, the instance object still
contains the unpersisted vm_state change along with other concomitant
changes. These are all persisted when _cleanup_allocated_networks calls
instance.save(). This means that the
instance.save(expected_task_state=SPAWNING), which correctly failed due
to the race, later succeeds accidentally in cleanup, resulting in an
inconsistent instance state.
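
The sequence described above can be reproduced with a toy model. This is
only an illustrative sketch, not nova code: ToyInstance, its set() and
discard_changes() methods, the networks_cleaned field and the dict used
as a database row are all hypothetical stand-ins. Dropping the pending
changes before cleanup is shown as one possible way to avoid the
problem, not necessarily the fix nova adopts.

```python
class UnexpectedTaskState(Exception):
    """Stand-in for nova's UnexpectedDeletingTaskStateError."""


class ToyInstance:
    """Toy object with dirty-field tracking (hypothetical, not nova's Instance)."""

    def __init__(self, db_row):
        self._db = db_row      # dict standing in for the database row
        self._pending = {}     # local changes that are not yet persisted

    def set(self, field, value):
        self._pending[field] = value

    def save(self, expected_task_state=None):
        # Guarded save: refuse to write if the stored task_state has moved on.
        if (expected_task_state is not None
                and self._db["task_state"] != expected_task_state):
            raise UnexpectedTaskState(self._db["task_state"])
        # Every pending change is written, not only the most recent one.
        self._db.update(self._pending)
        self._pending.clear()

    def discard_changes(self):
        # Hypothetical helper: forget all unsaved changes.
        self._pending.clear()


def run_build(discard_before_cleanup):
    db = {"vm_state": "building", "task_state": "spawning",
          "networks_cleaned": False}
    instance = ToyInstance(db)

    # The build finishes and stages the transition to ACTIVE.
    instance.set("vm_state", "active")
    instance.set("task_state", None)

    # A concurrent delete updates the stored row first.
    db["task_state"] = "deleting"

    try:
        instance.save(expected_task_state="spawning")  # correctly fails
    except UnexpectedTaskState:
        if discard_before_cleanup:
            instance.discard_changes()
        # Cleanup records one unrelated change, then saves unguarded.
        instance.set("networks_cleaned", True)
        instance.save()
    return db


buggy = run_build(discard_before_cleanup=False)
guarded = run_build(discard_before_cleanup=True)
print("unguarded cleanup:", buggy)
print("changes discarded:", guarded)
```

In the first run the stale vm_state and task_state changes are written
along with the cleanup change, so the row ends up active even though a
delete is in progress. In the second run only the cleanup change reaches
the row.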

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848666

Title:
  Race can cause instance to become ACTIVE after build error

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848666/+subscriptions