yahoo-eng-team team mailing list archive

[Bug 1785270] [NEW] allow confirmation of resize/migration for migrations in "confirming" status

 

Public bug reported:

Confirmation of a resize is an RPC operation.  If a compute node fails
after a migration has been put into the "confirming" status, there is no
way to confirm it again, which leaves the instance "stuck".

In the case of confirm_resize(), I don't see any problem with allowing
us to retry by sending another confirm_resize message. On the target
compute node the actual confirmation is synchronized by instance.uuid,
so there should be no races, and it already handles the "migration is
already confirmed" case.
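
To illustrate why a retry should be safe, here is a minimal standalone sketch of a confirmation handler that is serialized per instance UUID and treats an already-confirmed migration as a no-op. This is not nova's code: FakeMigration, confirm_resize_on_compute and the lock table are hypothetical stand-ins, and only the Python standard library is used.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeMigration:
    """Hypothetical stand-in for a migration record (not nova's object)."""
    instance_uuid: str
    status: str = "finished"


# One lock per instance UUID, so two confirmations for the same instance
# never run concurrently. Assumption: this only mimics the per-instance
# synchronization described in the report.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def _lock_for(instance_uuid):
    """Return the lock for this instance, creating it on first use."""
    with _locks_guard:
        return _locks[instance_uuid]


def confirm_resize_on_compute(migration, cleanup_log):
    """Confirm a migration; return True if work was done, False if no-op."""
    with _lock_for(migration.instance_uuid):
        if migration.status == "confirmed":
            # A repeated confirm message is harmless: nothing left to do.
            return False
        # 'finished', or 'confirming' after an interrupted attempt:
        # (re)do the cleanup and mark the migration confirmed.
        migration.status = "confirming"
        cleanup_log.append(migration.instance_uuid)
        migration.status = "confirmed"
        return True
```

Under these assumptions, a second confirm message for the same instance either waits for the first to finish and then returns without doing anything, or picks up a migration that was left in "confirming" and completes it.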


The proposed code change would look something like this:

     @check_instance_state(vm_state=[vm_states.RESIZED])
     def confirm_resize(self, context, instance, migration=None):
         """Confirms a migration/resize and deletes the 'old' instance."""
         elevated = context.elevated()
         # NOTE(melwitt): We're not checking quota here because there isn't a
         # change in resource usage when confirming a resize. Resource
         # consumption for resizes are written to the database by compute, so
         # a confirm resize is just a clean up of the migration objects and a
         # state change in compute.
         if migration is None:
-            migration = objects.Migration.get_by_instance_and_status(
-                elevated, instance.uuid, 'finished')
+            # Look for migrations in confirming state as well as finished to
+            # handle cases where the confirm did not complete (e.g. because
+            # the compute node went away during the confirm).
+            for status in ('finished', 'confirming'):
+                try:
+                    migration = objects.Migration.get_by_instance_and_status(
+                        elevated, instance.uuid, status)
+                    break
+                except exception.MigrationNotFoundByStatus:
+                    pass
+
+            if migration is None:
+                raise exception.MigrationNotFoundByStatus(
+                    instance_id=instance.uuid, status='finished|confirming')
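
The lookup logic in the diff above can be exercised on its own with a small sketch. Everything here is a hypothetical stand-in: the exception class only borrows the name used in the diff, and the dict-based get_by_instance_and_status replaces the real database query, so this shows the fallback pattern only and is not nova's implementation.

```python
class MigrationNotFoundByStatus(Exception):
    """Hypothetical stand-in for the exception named in the diff."""

    def __init__(self, instance_id, status):
        super().__init__(
            "Migration not found for instance %s with status %s."
            % (instance_id, status))
        self.instance_id = instance_id
        self.status = status


def get_by_instance_and_status(records, instance_uuid, status):
    """Toy lookup over a dict keyed by (instance_uuid, status)."""
    try:
        return records[(instance_uuid, status)]
    except KeyError:
        raise MigrationNotFoundByStatus(instance_uuid, status) from None


def find_confirmable_migration(records, instance_uuid):
    """Return the first migration found in 'finished' or 'confirming'."""
    for status in ("finished", "confirming"):
        try:
            return get_by_instance_and_status(records, instance_uuid, status)
        except MigrationNotFoundByStatus:
            # Not in this status; try the next candidate status.
            continue
    # Neither status matched: report both, as the diff does.
    raise MigrationNotFoundByStatus(instance_uuid, "finished|confirming")
```

Returning from inside the loop has the same effect as the break in the diff, and raising after the loop corresponds to the final "if migration is None" check.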

** Affects: nova
     Importance: Low
         Status: Triaged


** Tags: resize

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1785270

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1785270/+subscriptions