yahoo-eng-team team mailing list archive
Message #58012
[Bug 1635160] [NEW] No bootable device when evacuate a instance on shared_storage_storage ceph
Public bug reported:
Nova version: nova-kilo-2015.1.1
Ceph version: ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
When I tested the nova evacuate function, I found that the instance cannot boot normally after it has been evacuated.
The VNC console shows "No bootable device".
After some tests I found the reason: when shared storage is used, the
rebuild task flow does not fetch the image metadata again. So if metadata
is set on the image, it is not applied on rebuild and the problem occurs.
The code:
nova/compute/manager.py
@object_compat
@messaging.expected_exceptions(exception.PreserveEphemeralNotSupported)
@wrap_exception()
@reverts_task_state
@wrap_instance_event
@wrap_instance_fault
def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
                     injected_files, new_pass,
                     orig_sys_metadata,
                     bdms, recreate, on_shared_storage,
                     preserve_ephemeral=False):
    ......
    if on_shared_storage != self.driver.instance_on_disk(instance):
        raise exception.InvalidSharedStorage(
            _("Invalid state of instance files on shared"
              " storage"))
    if on_shared_storage:
        LOG.info(_LI('disk on shared storage, recreating using'
                     ' existing disk'))
    else:
        image_ref = orig_image_ref = instance.image_ref
        LOG.info(_LI("disk not on shared storage, rebuilding from:"
                     " '%s'"), str(image_ref))
    # NOTE(mriedem): On a recreate (evacuate), we need to update
    # the instance's host and node properties to reflect it's
    # destination node for the recreate.
    node_name = None
    try:
        compute_node = self._get_compute_info(context, self.host)
        node_name = compute_node.hypervisor_hostname
    except exception.ComputeHostNotFound:
        LOG.exception(_LE('Failed to get compute_info for %s'), self.host)
    finally:
        instance.host = self.host
        instance.node = node_name
        instance.save()
    if image_ref:
        image_meta = self.image_api.get(context, image_ref)
    else:
        image_meta = {}
    ......
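To make the failure path concrete, here is a minimal stand-alone sketch of the branch logic in the excerpt above. It is not nova code: FAKE_IMAGES and resolve_image_meta are hypothetical names used only for illustration, and the dictionary stands in for the image service lookup. It assumes the caller passes image_ref=None, as the evacuate path does.

```python
# Simplified model of the image-metadata lookup in rebuild_instance.
# FAKE_IMAGES and resolve_image_meta are hypothetical; they are not nova APIs.

FAKE_IMAGES = {
    "8b218b4d-74ff-44af-bc4c-c37fb1106b03": {
        "properties": {
            "hw_disk_bus": "scsi",
            "hw_scsi_model": "virtio-scsi",
            "hw_qemu_guest_agent": "yes",
        }
    }
}


def resolve_image_meta(image_ref, instance_image_ref, on_shared_storage):
    """Mirror the branch structure of the excerpt."""
    if not on_shared_storage:
        # Non-shared storage: the image ref is restored from the instance.
        image_ref = instance_image_ref
    # Shared storage: image_ref keeps whatever the caller passed (None here).
    if image_ref:
        return FAKE_IMAGES[image_ref]
    return {}


instance_image_ref = "8b218b4d-74ff-44af-bc4c-c37fb1106b03"

# Evacuate passes image_ref=None in both cases.
shared = resolve_image_meta(None, instance_image_ref, on_shared_storage=True)
local = resolve_image_meta(None, instance_image_ref, on_shared_storage=False)

print(shared)  # empty dict: the hw_* properties are lost
print(local["properties"]["hw_disk_bus"])
```

In the shared-storage case the metadata comes back empty, so properties such as hw_disk_bus never reach the code that defines the rebuilt guest.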
Below is my image info:
+------------------------------+--------------------------------------+
| Property                     | Value                                |
+------------------------------+--------------------------------------+
| OS-EXT-IMG-SIZE:size         | 53687091200                          |
| created                      | 2016-09-20T08:15:21Z                 |
| id                           | 8b218b4d-74ff-44af-bc4c-c37fb1106b03 |
| metadata hw_disk_bus         | scsi                                 |
| metadata hw_qemu_guest_agent | yes                                  |
| metadata hw_scsi_model       | virtio-scsi                          |
| minDisk                      | 0                                    |
| minRam                       | 0                                    |
| name                         | zptest-20160920                      |
| progress                     | 100                                  |
| status                       | ACTIVE                               |
| updated                      | 2016-10-20T07:38:54Z                 |
+------------------------------+--------------------------------------+
Currently I work around the problem by changing the following code:
nova/compute/api.py
@check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.STOPPED,
                                vm_states.ERROR])
def evacuate(self, context, instance, host, on_shared_storage,
             admin_password=None):
    """Running evacuate to target host.

    Checking vm compute host state, if the host not in expected_state,
    raising an exception.

    :param instance: The instance to evacuate
    :param host: Target host. if not set, the scheduler will pick up one
    :param on_shared_storage: True if instance files on shared storage
    :param admin_password: password to set on rebuilt instance
    """
    LOG.debug('vm evacuation scheduled', instance=instance)
    inst_host = instance.host
    service = objects.Service.get_by_compute_host(context, inst_host)
    if self.servicegroup_api.service_is_up(service):
        LOG.error(_LE('Instance compute service state on %s '
                      'expected to be down, but it was up.'), inst_host)
        raise exception.ComputeServiceInUse(host=inst_host)
    instance.task_state = task_states.REBUILDING
    instance.save(expected_task_state=[None])
    self._record_action_start(context, instance, instance_actions.EVACUATE)
    return self.compute_task_api.rebuild_instance(context,
                                                  instance=instance,
                                                  new_pass=admin_password,
                                                  injected_files=None,
                                                  image_ref=None,
                                                  orig_image_ref=None,
                                                  orig_sys_metadata=None,
                                                  bdms=None,
                                                  recreate=True,
                                                  on_shared_storage=on_shared_storage,
                                                  host=host)
The change: pass the image's actual reference as image_ref instead of None.
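One possible reading of that change, as a sketch only: build the rebuild arguments with the image reference taken from the instance. This assumes the intended value is instance.image_ref (the same attribute the manager uses in its non-shared-storage branch); build_rebuild_kwargs is a hypothetical helper, and SimpleNamespace stands in for a real instance object.

```python
from types import SimpleNamespace


def build_rebuild_kwargs(instance, host, on_shared_storage,
                         admin_password=None):
    """Hypothetical helper: arguments for rebuild_instance on evacuate.

    Assumption: the fix is to forward instance.image_ref rather than None,
    so the manager can look up the image metadata again.
    """
    return dict(
        instance=instance,
        new_pass=admin_password,
        injected_files=None,
        image_ref=instance.image_ref,  # was image_ref=None
        orig_image_ref=None,
        orig_sys_metadata=None,
        bdms=None,
        recreate=True,
        on_shared_storage=on_shared_storage,
        host=host,
    )


# Stand-in for a nova instance object, for illustration only.
fake_instance = SimpleNamespace(
    image_ref="8b218b4d-74ff-44af-bc4c-c37fb1106b03")

kwargs = build_rebuild_kwargs(fake_instance, host=None,
                              on_shared_storage=True)
print(kwargs["image_ref"])
```

With a non-empty image_ref, the `if image_ref:` check in rebuild_instance takes the lookup branch even when the disk is on shared storage.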
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1635160