
yahoo-eng-team team mailing list archive

[Bug 1635160] [NEW] No bootable device when evacuating an instance on Ceph shared storage

 

Public bug reported:

Nova version: nova-kilo-2015.1.1
Ceph version: 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

When I tested the nova evacuate function, I found that the evacuated instance could not boot normally.
The VNC console showed "No bootable device".

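Through some tests I found the cause: when the instance is on shared
storage, the rebuild task flow does not fetch the image metadata again.
evacuate() passes image_ref=None, and on the shared-storage path
rebuild_instance() never replaces it with instance.image_ref, so
image_meta ends up as {}. If the image has metadata properties set
(such as hw_disk_bus in my image below), they are ignored and the
problem occurs.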

The code:
 nova/compute/manager.py

    @object_compat
    @messaging.expected_exceptions(exception.PreserveEphemeralNotSupported)
    @wrap_exception()
    @reverts_task_state
    @wrap_instance_event
    @wrap_instance_fault
    def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
                         injected_files, new_pass, orig_sys_metadata,
                         bdms, recreate, on_shared_storage,
                         preserve_ephemeral=False):
        ......

        if on_shared_storage != self.driver.instance_on_disk(instance):
            raise exception.InvalidSharedStorage(
                _("Invalid state of instance files on shared storage"))

        if on_shared_storage:
            LOG.info(_LI('disk on shared storage, recreating using'
                         ' existing disk'))
        else:
            image_ref = orig_image_ref = instance.image_ref
            LOG.info(_LI("disk not on shared storage, rebuilding from:"
                         " '%s'"), str(image_ref))

        # NOTE(mriedem): On a recreate (evacuate), we need to update
        # the instance's host and node properties to reflect its
        # destination node for the recreate.
        node_name = None
        try:
            compute_node = self._get_compute_info(context, self.host)
            node_name = compute_node.hypervisor_hostname
        except exception.ComputeHostNotFound:
            LOG.exception(_LE('Failed to get compute_info for %s'), self.host)
        finally:
            instance.host = self.host
            instance.node = node_name
            instance.save()

        if image_ref:
            image_meta = self.image_api.get(context, image_ref)
        else:
            image_meta = {}
        ......
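The image_ref handling above can be reduced to a small, self-contained model. This is a hypothetical sketch, not nova code: resolve_image_meta() stands in for the relevant part of rebuild_instance(), and the dict-based image_store (with a simplified "properties" layout) stands in for self.image_api.get(). It shows that when evacuate passes image_ref=None and the disk is on shared storage, nothing replaces the None, so the image metadata comes back empty.

```python
from typing import Optional


def resolve_image_meta(on_shared_storage: bool,
                       image_ref: Optional[str],
                       instance_image_ref: str,
                       image_store: dict) -> dict:
    """Hypothetical model of the image_meta lookup in rebuild_instance().

    image_store is a plain dict standing in for self.image_api.get().
    """
    if not on_shared_storage:
        # Non-shared storage: the image ref is taken from the instance.
        image_ref = instance_image_ref
    # Shared storage: image_ref stays exactly as the caller passed it.
    if image_ref:
        return image_store[image_ref]
    return {}


IMAGE_ID = "8b218b4d-74ff-44af-bc4c-c37fb1106b03"
# Simplified metadata layout; the property values come from the image below.
store = {IMAGE_ID: {"properties": {"hw_disk_bus": "scsi",
                                   "hw_scsi_model": "virtio-scsi",
                                   "hw_qemu_guest_agent": "yes"}}}

# evacuate() passes image_ref=None in both cases.
print(resolve_image_meta(True, None, IMAGE_ID, store))   # -> {}
print(resolve_image_meta(False, None, IMAGE_ID, store))  # stored metadata
```

Only the shared-storage call loses the hw_* properties; the non-shared path recovers them from the instance.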

Below is my image info:
+------------------------------+--------------------------------------+
| Property                     | Value                                |
+------------------------------+--------------------------------------+
| OS-EXT-IMG-SIZE:size         | 53687091200                          |
| created                      | 2016-09-20T08:15:21Z                 |
| id                           | 8b218b4d-74ff-44af-bc4c-c37fb1106b03 |
| metadata hw_disk_bus         | scsi                                 |
| metadata hw_qemu_guest_agent | yes                                  |
| metadata hw_scsi_model       | virtio-scsi                          |
| minDisk                      | 0                                    |
| minRam                       | 0                                    |
| name                         | zptest-20160920                      |
| progress                     | 100                                  |
| status                       | ACTIVE                               |
| updated                      | 2016-10-20T07:38:54Z                 |
+------------------------------+--------------------------------------+
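The hw_* properties in this table are what gets dropped when image_meta is empty. The sketch below is a hypothetical simplification of how a driver might choose the root disk bus: pick_disk_bus() and the "virtio" fallback are assumptions for illustration, not the libvirt driver's actual code. With this image's metadata the disk lands on the SCSI bus; with {} it silently falls back to the default, so the evacuated guest would be redefined with different disk hardware than the image requested.

```python
def pick_disk_bus(image_meta: dict, default_bus: str = "virtio") -> str:
    """Hypothetical: choose the root disk bus from image properties.

    default_bus stands in for a hypervisor default; "virtio" is an
    assumption made only for this illustration.
    """
    props = image_meta.get("properties", {})
    return props.get("hw_disk_bus", default_bus)


reporter_meta = {"properties": {"hw_disk_bus": "scsi",
                                "hw_scsi_model": "virtio-scsi"}}
print(pick_disk_bus(reporter_meta))  # -> scsi
print(pick_disk_bus({}))             # -> virtio
```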

Currently I have fixed the problem by updating the code in:

nova/compute/api.py

    @check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.STOPPED,
                                    vm_states.ERROR])
    def evacuate(self, context, instance, host, on_shared_storage,
                 admin_password=None):
        """Running evacuate to target host.

        Checking vm compute host state, if the host not in expected_state,
        raising an exception.

        :param instance: The instance to evacuate
        :param host: Target host. if not set, the scheduler will pick up one
        :param on_shared_storage: True if instance files on shared storage
        :param admin_password: password to set on rebuilt instance

        """
        LOG.debug('vm evacuation scheduled', instance=instance)
        inst_host = instance.host
        service = objects.Service.get_by_compute_host(context, inst_host)
        if self.servicegroup_api.service_is_up(service):
            LOG.error(_LE('Instance compute service state on %s '
                          'expected to be down, but it was up.'), inst_host)
            raise exception.ComputeServiceInUse(host=inst_host)

        instance.task_state = task_states.REBUILDING
        instance.save(expected_task_state=[None])
        self._record_action_start(context, instance, instance_actions.EVACUATE)

        return self.compute_task_api.rebuild_instance(context,
                       instance=instance,
                       new_pass=admin_password,
                       injected_files=None,
                       image_ref=None,
                       orig_image_ref=None,
                       orig_sys_metadata=None,
                       bdms=None,
                       recreate=True,
                       on_shared_storage=on_shared_storage,
                       host=host)

That is, in the call above, change image_ref=None to the instance's actual image reference (instance.image_ref) so that it is no longer None.
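A minimal sketch of the change, assuming the fix is simply to pass instance.image_ref through. Instance, RecordingTaskAPI and evacuate_call() are hypothetical stand-ins so the snippet runs on its own; they are not nova classes. The recorder captures the keyword arguments that would reach compute_task_api.rebuild_instance().

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical stand-in for nova's Instance object; only image_ref is used.
    image_ref: str


class RecordingTaskAPI:
    """Hypothetical stand-in for compute_task_api that records its kwargs."""

    def __init__(self):
        self.calls = []

    def rebuild_instance(self, context, **kwargs):
        self.calls.append(kwargs)
        return kwargs


def evacuate_call(task_api, context, instance, host, on_shared_storage,
                  admin_password=None):
    # The reporter's change: pass the instance's own image ref instead of
    # image_ref=None, so the manager can look up image_meta.
    return task_api.rebuild_instance(context,
                                     instance=instance,
                                     new_pass=admin_password,
                                     injected_files=None,
                                     image_ref=instance.image_ref,
                                     orig_image_ref=None,
                                     orig_sys_metadata=None,
                                     bdms=None,
                                     recreate=True,
                                     on_shared_storage=on_shared_storage,
                                     host=host)


api = RecordingTaskAPI()
inst = Instance(image_ref="8b218b4d-74ff-44af-bc4c-c37fb1106b03")
evacuate_call(api, None, inst, host=None, on_shared_storage=True)
print(api.calls[0]["image_ref"])
```

With a non-empty image_ref, the `if image_ref:` branch in rebuild_instance() fetches the image metadata even on the shared-storage path.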

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1635160



