yahoo-eng-team team mailing list archive

[Bug 1915710] [NEW] Doesn't download image from _base folder

 

Public bug reported:

Hello.

I recently ran into strange nova behavior while resizing an instance to
a new flavor.

Steps to reproduce:
1. I created an instance from a previously made snapshot.
2. After the instance was successfully created, I deleted the snapshot from image storage.
3. I created a new flavor with more disk space for the instance.
4. I resized the instance to the new flavor in a situation where the src compute node did not have enough resources, so the instance started migrating to another compute node that had enough.
5. Once the instance disk had been completely copied to the new compute node and the instance status had changed to VERIFY_RESIZE, the instance disk (block device) was still the same size as in the old flavor. After confirming the resize, the disk remained the same size.

In the logs, I see a message like the one at
https://github.com/openstack/nova/blob/stable/queens/nova/virt/libvirt/driver.py#L7570
and at the same time there is no image in the _base folder on the new
compute node. The image remains on the old compute node until it is
removed because its age exceeds
remove_unused_original_minimum_age_seconds in nova.conf.
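
To illustrate the age-based removal described above, here is a minimal sketch of such a check. It is not nova's image cache manager: the function name is_removable, its parameters, and the example thresholds are all hypothetical and only show the general idea of "unused and older than a minimum age".

```python
import os
import tempfile
import time


def is_removable(path, min_age_seconds, in_use, now=None):
    """Hypothetical check: an unused cached file may be removed once its
    modification time is older than min_age_seconds."""
    if in_use:
        # A file that some instance still references is never removed.
        return False
    now = time.time() if now is None else now
    age = now - os.path.getmtime(path)
    return age > min_age_seconds


with tempfile.NamedTemporaryFile() as cached:
    # Pretend the cached file was last touched two hours ago.
    two_hours_ago = time.time() - 7200
    os.utime(cached.name, (two_hours_ago, two_hours_ago))

    old_unused = is_removable(cached.name, 3600, in_use=False)
    old_in_use = is_removable(cached.name, 3600, in_use=True)
    young_unused = is_removable(cached.name, 86400, in_use=False)
```

In this sketch, whether a file counts as "in use" decides everything: once the source compute node no longer considers the instance its own, nothing protects the base file after the minimum age has passed.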

I made the following edits in the code:
/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py	
@@ -7676,13 +7676,13 @@
                       {'image_id': image_id, 'host': fallback_from_host},
                       instance=instance)
-            def copy_from_host(target):
+            def copy_from_host(target, image_id):
                 libvirt_utils.copy_image(src=target,
                                          dest=target,
                                          host=fallback_from_host,
                                          receive=True)
             image.cache(fetch_func=copy_from_host, size=size,
-                        filename=filename)
+                        filename=filename, image_id=image_id)

After that, everything worked as expected: the instance migrated
successfully to the dst compute node during the resize, and the disk
grew according to the new flavor.
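
The two changed lines in the patch above have to go together, which a small stand-alone sketch can show. It assumes a caching helper that forwards extra keyword arguments to the fetch callback; the toy cache function below is a hypothetical stand-in, not nova's image backend, and it does not explain the root cause of the bug. It only shows that once image_id is passed to the call, the callback must accept it.

```python
def cache(fetch_func, filename, size=None, **kwargs):
    """Toy stand-in for a caching helper. Assumption of this sketch:
    extra keyword arguments are forwarded to the fetch callback."""
    target = "_base/" + filename
    fetch_func(target=target, **kwargs)
    return target


calls = []


def copy_from_host_old(target):
    # Original callback signature: accepts only the target path.
    calls.append(("old", target))


def copy_from_host_new(target, image_id):
    # Patched callback signature: also accepts image_id.
    calls.append(("new", target, image_id))


# The old callback works as long as no extra keyword is passed.
cache(copy_from_host_old, "abc123", size=10)

# Passing image_id to the old callback fails with a TypeError.
try:
    cache(copy_from_host_old, "abc123", size=10, image_id="img-1")
    old_accepts_image_id = True
except TypeError:
    old_accepts_image_id = False

# The patched callback accepts the forwarded image_id.
cache(copy_from_host_new, "abc123", size=10, image_id="img-1")
```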

But then I ran into one more issue. When the instance goes into the
VERIFY_RESIZE status, the database already indicates that it is running
on a different compute node, so the image on the src compute node can be
deleted due to remove_unused_original_minimum_age_seconds. After that,
if you revert the resize, the image is not copied back to the src
compute node.

I made the following edits in the code:
+++ /usr/lib/python2.7/dist-packages/nova/compute/manager.py	
@@ -7932,7 +7932,21 @@
                    'host': nodes}
         filtered_instances = objects.InstanceList.get_by_filters(context,
                                  filters, expected_attrs=[], use_slave=True)
+        filters_migration_instances = {'status': [ 'finished', 'post-migrating'],
+                    'source_compute': nodes,
+                    'migration_type': 'resize'}
+        migration_list_instances = objects.MigrationList.get_by_filters(context, filters=filters_migration_instances)
+        if migration_list_instances:
+            uuids = []
+            for migration_instance in migration_list_instances:
+                uuids.append(migration_instance['instance_uuid'])
+            filters_by_instance = {'uuid': uuids}
+            filtered_migration_instances = objects.InstanceList.get_by_filters(context, filters=filters_by_instance, expected_attrs=[], use_slave=True)
+            filtered_instances = filtered_instances + filtered_migration_instances
         self.driver.manage_image_cache(context, filtered_instances)

Thus, we will not delete the images of instances that are still in the
VERIFY_RESIZE status.
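
The selection logic of the patch above can be sketched with plain dictionaries instead of nova objects. The function name instances_to_keep and the sample records are hypothetical; the filter values ('finished', 'post-migrating', 'resize', source_compute) are taken from the patch.

```python
def instances_to_keep(local_instances, migrations, host):
    """Return UUIDs whose cached images should be kept on this host:
    instances currently on the host, plus instances that are being
    resized away from it and could still be reverted."""
    protected_states = {"finished", "post-migrating"}
    keep = {inst["uuid"] for inst in local_instances}
    for migration in migrations:
        if (migration["migration_type"] == "resize"
                and migration["source_compute"] == host
                and migration["status"] in protected_states):
            keep.add(migration["instance_uuid"])
    return keep


local = [{"uuid": "aaa"}]
migrations = [
    # Resized away from compute1, awaiting confirm or revert: keep.
    {"instance_uuid": "bbb", "source_compute": "compute1",
     "migration_type": "resize", "status": "finished"},
    # Already confirmed: not protected.
    {"instance_uuid": "ccc", "source_compute": "compute1",
     "migration_type": "resize", "status": "confirmed"},
    # Resize from a different source host: not relevant here.
    {"instance_uuid": "ddd", "source_compute": "compute2",
     "migration_type": "resize", "status": "finished"},
]

kept = instances_to_keep(local, migrations, "compute1")
```

Unlike the patch, which concatenates two instance lists, this sketch collects UUIDs in a set, so an instance that appears in both queries is counted once.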

I would like to clarify whether this is the expected behavior of nova or
a bug. Perhaps someone else has encountered similar problems.

Environment
OS - Ubuntu 16.04
Linux compute1 4.15.0-122-generic #124~16.04.1-Ubuntu SMP Thu Oct 15 16:08:36 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

OpenStack release - Queens
hypervisor - Libvirt + KVM

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1915710

Title:
  Doesn't download image from _base folder

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1915710/+subscriptions

