yahoo-eng-team team mailing list archive

[Bug 1688228] [NEW] Failure in resize_instance after cast to finish_resize still sets instance error state

 

Public bug reported:

This is from code inspection only.

ComputeManager.resize_instance does:

  with self._error_out_instance_on_exception(context, instance,
                                             quotas=quotas):
      ...stuff...

      self.compute_rpcapi.finish_resize(context, instance,
                    migration, image, disk_info,
                    migration.dest_compute, reservations=quotas.reservations)

      # ... Responsibility for the instance has now been punted to the
      # destination, but...

      self._notify_about_instance_usage(context, instance, "resize.end",
                                              network_info=network_info)

      compute_utils.notify_about_instance_action(context, instance,
                   self.host, action=fields.NotificationAction.RESIZE,
                   phase=fields.NotificationPhase.END)
      self.instance_events.clear_events_for_instance(instance)

The problem is that a failure in anything after the cast to
finish_resize will still put the instance into an error state and roll
back its quotas. That is incorrect: once the cast has been sent, an
error in the remaining notification and event-cleanup calls is purely
ephemeral. The resize operation continues on the destination
regardless, so erroring out the instance at this point would almost
certainly leave it in an inconsistent state.
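
A minimal, self-contained sketch of the behaviour described above, and of
one possible way to narrow the guarded region. This is not nova code and
not a proposed patch: FakeInstance, error_out_instance_on_exception,
resize_instance_current, resize_instance_narrowed, failing_notify and demo
are all hypothetical stand-ins, and the real context manager does more
than flip two attributes. The only point illustrated is that work done
after the cast sits inside the same guard as work done before it.

```python
import contextlib
import logging

LOG = logging.getLogger(__name__)


class FakeInstance:
    """Hypothetical stand-in for a nova Instance object."""

    def __init__(self):
        self.vm_state = "resizing"
        self.quota_rolled_back = False


@contextlib.contextmanager
def error_out_instance_on_exception(instance):
    """Simplified stand-in for _error_out_instance_on_exception."""
    try:
        yield
    except Exception:
        # Assumption: the real helper rolls back quotas and sets ERROR.
        instance.quota_rolled_back = True
        instance.vm_state = "error"
        raise


def resize_instance_current(instance, cast_finish_resize, notify):
    """Mirrors the reported structure: post-cast work runs inside the guard."""
    with error_out_instance_on_exception(instance):
        cast_finish_resize()
        notify()


def resize_instance_narrowed(instance, cast_finish_resize, notify):
    """One possible restructuring: the guard ends once the cast is sent."""
    with error_out_instance_on_exception(instance):
        cast_finish_resize()
    try:
        notify()
    except Exception:
        # The destination now owns the instance; log and leave state alone.
        LOG.exception("Post-cast step failed; instance state left untouched")


def failing_notify():
    """Simulates a transient failure in a post-cast notification."""
    raise RuntimeError("notification bus unavailable")


def demo():
    casts = []
    broken = FakeInstance()
    try:
        resize_instance_current(broken, lambda: casts.append("sent"),
                                failing_notify)
    except RuntimeError:
        pass  # the guard re-raises after marking the instance
    fixed = FakeInstance()
    resize_instance_narrowed(fixed, lambda: casts.append("sent"),
                             failing_notify)
    return broken, fixed, casts
```

In both variants the cast goes out, so the destination would proceed with
the resize. Only the first variant leaves the source-side record in an
error state with quotas rolled back, which is the inconsistency this bug
is about.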

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1688228

Title:
  Failure in resize_instance after cast to finish_resize still sets
  instance error state

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1688228/+subscriptions