yahoo-eng-team team mailing list archive

[Bug 2073945] Re: Copying of image to another store initiated by compute fails in edge deployment

 

Reviewed:  https://review.opendev.org/c/openstack/glance/+/924824
Committed: https://opendev.org/openstack/glance/commit/ee7e96f06af741bb34bedac18fa2c4616fcc3905
Submitter: "Zuul (22348)"
Branch:    master

commit ee7e96f06af741bb34bedac18fa2c4616fcc3905
Author: Rajat Dhasmana <rajatdhasmana@xxxxxxxxx>
Date:   Wed Jul 24 07:31:46 2024 +0000

    Do not set_acls if store is not associated to glance node
    
    With glance multiple stores (mostly for ceph), nova initiates the
    copy-image operation if the image from which the server is being
    created is not present in the ceph store it refers to. This can fail
    if the image already has a location in a store that is not configured
    on that glance edge node. This scenario is only reproducible in an
    EDGE deployment.
    
    To fix this, call the set_acls method only if the store is defined
    on that glance node; otherwise skip it.
    
    Closes-Bug: #2073945
    Change-Id: I0409982ae27b662e60dd2363ba2f7863d0722fea


** Changed in: glance
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/2073945

Title:
  Copying of image to another store initiated by compute fails in edge
  deployment

Status in Glance:
  Fix Released

Bug description:
  With glance multiple stores (mostly for ceph), nova initiates the
  copy-image operation if the image from which the server is being
  created is not present in the ceph store it refers to. This can fail
  if the image already has a location in a store that is not configured
  on that glance edge node. This scenario is only reproducible in an
  EDGE deployment.

  
  Environment details:
  1 Central node (az1), 2 Edge nodes (az2, az3)
  3 availability zones for nova (az1,az2,az3)

  glance-api.conf for az1:

  [DEFAULT]
  enabled_import_methods=[web-download,copy-image,glance-direct]
  enabled_backends = az1:rbd,az2:rbd,az3:rbd

  [glance_store]
  default_backend = az1

  [az1]
  rbd_store_ceph_conf = /etc/ceph/ceph-az1.conf
  store_description = "az1 RBD backend"
  rbd_store_pool = glance
  rbd_store_user = openstack
  rbd_thin_provisioning = True

  [az2]
  rbd_store_ceph_conf = /etc/ceph/ceph-az2.conf
  store_description = "az2 RBD backend"
  rbd_store_pool = glance
  rbd_store_user = openstack
  rbd_thin_provisioning = True

  [az3]
  rbd_store_ceph_conf = /etc/ceph/ceph-az3.conf
  store_description = "az3 RBD backend"
  rbd_store_pool = glance
  rbd_store_user = openstack
  rbd_thin_provisioning = True

  
  glance-api.conf for az2:

  [DEFAULT]
  enabled_backends = az2:rbd,az1:rbd
  enabled_import_methods=[web-download,copy-image,glance-direct]

  [glance_store]
  default_backend = az2

  [az2]
  rbd_store_ceph_conf = /etc/ceph/ceph-az2.conf
  store_description = "az2 RBD backend"
  rbd_store_pool = glance
  rbd_store_user = openstack
  rbd_thin_provisioning = True

  [az1]
  rbd_store_ceph_conf = /etc/ceph/ceph-az1.conf
  store_description = "az1 RBD backend"
  rbd_store_pool = glance
  rbd_store_user = openstack
  rbd_thin_provisioning = True

  glance-api.conf for az3:

  [DEFAULT]
  enabled_backends = az3:rbd,az1:rbd
  enabled_import_methods=[web-download,copy-image,glance-direct]

  [glance_store]
  default_backend = az3

  [az3]
  rbd_store_ceph_conf = /etc/ceph/ceph-az3.conf
  store_description = "az3 RBD backend"
  rbd_store_pool = glance
  rbd_store_user = openstack
  rbd_thin_provisioning = True

  [az1]
  rbd_store_ceph_conf = /etc/ceph/ceph-az1.conf
  store_description = "az1 RBD backend"
  rbd_store_pool = glance
  rbd_store_user = openstack
  rbd_thin_provisioning = True

  
  Steps to reproduce:
  1. Create image in az1 ceph store (which is central node)
     glance image-create --disk-format raw --container-format bare --file cirros.raw --name Test-AZ1

  2. Now boot an instance with the image in availability zone 3 (here nova will initiate copy-image from az1 to az3)
     openstack server create --flavor c1 $IMG_SRC --nic net-id=private $VM_NAME --availability-zone az3 Instance-AZ3

  3. See that image is copied to az3 ceph backend
     glance image-list --include-stores

  4. Now boot an instance with the same image in availability zone 2 (here nova will initiate copy-image to copy it to az2)
     openstack server create --flavor c1 $IMG_SRC --nic net-id=private $VM_NAME --availability-zone az2 Instance-AZ2

  Expected result:
      The instance should be in active state and the image should now be available in the az2 ceph backend as well

  Actual result:
      The instance is active, but the image fails to be copied to the az2 ceph backend with the error `glance_store.exceptions.UnknownScheme: Unknown scheme 'az3' found in URI`, and the instance takes much longer to boot than in step 2

  
  Copying fails with the stack trace below:

  2024-07-23 15:07:20.599 65 INFO eventlet.wsgi.server [None req-e71ddbbe-0a4e-4d01-a7c6-2e6726172e4c 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 50.50.8.2,::1 - - [23/Jul/2024 15:07:20] "GET /healthcheck HTTP/1.1" 200 142 0.001121
  2024-07-23 15:07:38.835 65 WARNING glance.common.store_utils [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
  2024-07-23 15:07:38.836 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:38] "GET /v2/images/b89b5eda-a829-4f68-91be-0938945753df HTTP/1.1" 200 1461 0.089335
  2024-07-23 15:07:38.910 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:38] "GET /v2/schemas/image HTTP/1.1" 200 6283 0.070413
  2024-07-23 15:07:39.019 65 WARNING glance.common.store_utils [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
  2024-07-23 15:07:39.059 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:39] "POST /v2/images/b89b5eda-a829-4f68-91be-0938945753df/import HTTP/1.1" 202 211 0.110612
  2024-07-23 15:07:39.062 65 INFO glance.domain [-] Task [d05fb139-f98e-4ba7-8888-43955b27f5a6] status changing from pending to processing
  2024-07-23 15:07:39.093 65 WARNING glance.common.store_utils [-] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
  2024-07-23 15:07:39.100 65 WARNING glance.common.store_utils [-] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
  2024-07-23 15:07:39.129 65 WARNING glance.common.store_utils [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
  2024-07-23 15:07:39.130 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:39] "GET /v2/images/b89b5eda-a829-4f68-91be-0938945753df HTTP/1.1" 200 1529 0.067387
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor [-] Task initialization failed: Unknown scheme 'az3' found in URI: glance_store.exceptions.UnknownScheme: Unknown scheme 'az3' found in URI
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor Traceback (most recent call last):
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance_store/location.py", line 111, in get_location_from_uri_and_backend
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     scheme_info = SCHEME_TO_CLS_BACKEND_MAP[pieces.scheme][backend]
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor KeyError: 'az3'
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor 
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor During handling of the above exception, another exception occurred:
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor 
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor Traceback (most recent call last):
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/async_/taskflow_executor.py", line 134, in _get_flow
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     return driver.DriverManager('glance.flows', task.type,
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/stevedore/driver.py", line 54, in __init__
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     super(DriverManager, self).__init__(
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/stevedore/named.py", line 78, in __init__
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     extensions = self._load_plugins(invoke_on_load,
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/stevedore/extension.py", line 218, in _load_plugins
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     self._on_load_failure_callback(self, ep, err)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/stevedore/extension.py", line 206, in _load_plugins
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     ext = self._load_one_plugin(ep,
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/stevedore/named.py", line 156, in _load_one_plugin
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     return super(NamedExtensionManager, self)._load_one_plugin(
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/stevedore/extension.py", line 242, in _load_one_plugin
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     obj = plugin(*invoke_args, **invoke_kwds)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/async_/flows/api_image_import.py", line 1003, in get_flow
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     action.pop_extra_property('os_glance_stage_host')
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/async_/flows/api_image_import.py", line 172, in __exit__
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     self._image_repo.save(self._image, self._image_previous_status)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/notifier.py", line 530, in save
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     super(ImageRepoProxy, self).save(image, from_state=from_state)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/domain/proxy.py", line 99, in save
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     result = self.base.save(base_item, from_state=from_state)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/quota/__init__.py", line 121, in save
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     return super(ImageRepoProxy, self).save(image, from_state=from_state)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/domain/proxy.py", line 99, in save
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     result = self.base.save(base_item, from_state=from_state)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/location.py", line 83, in save
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     self._set_acls(image)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance/location.py", line 66, in _set_acls
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     self.store_api.set_acls_for_multi_store(
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance_store/multi_backend.py", line 539, in set_acls_for_multi_store
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     loc = location.get_location_from_uri_and_backend(
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor   File "/usr/lib/python3.9/site-packages/glance_store/location.py", line 113, in get_location_from_uri_and_backend
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor     raise exceptions.UnknownScheme(scheme=backend)
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor glance_store.exceptions.UnknownScheme: Unknown scheme 'az3' found in URI
  2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor 
  2024-07-23 15:07:39.191 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:39] "GET /v2/schemas/image HTTP/1.1" 200 6283 0.047085

  Analysis:
  Before starting the copy operation, glance sets some attributes on the image to track the state of the import operation. While saving these changes to the image, it calls the set_acls method on the store instance [1] for each location the image already has. In our case the image has two locations so far, az1 and az3. Since nova is now trying to copy the image to az2, the operation runs on the glance-az2 edge node, which does not have the az3 store defined in its configuration file. This causes the failure `Unknown scheme 'az3' found in URI`.
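
The failure described in the analysis can be modelled in a few lines. This is only an illustrative sketch, not glance or glance_store code: the names `ENABLED_BACKENDS`, `UnknownStore`, `resolve_store` and `set_acls_for_all_locations` are hypothetical, and the per-node store lists are taken from the `enabled_backends` settings shown in the environment details.

```python
# Hypothetical model of the failure; none of these names exist in glance.

# Stores configured on each glance node, per the enabled_backends settings above.
ENABLED_BACKENDS = {
    "az1": ["az1", "az2", "az3"],  # central node knows every store
    "az2": ["az2", "az1"],         # edge node: own store plus central
    "az3": ["az3", "az1"],         # edge node: own store plus central
}


class UnknownStore(Exception):
    """Stand-in for glance_store.exceptions.UnknownScheme."""


def resolve_store(node, store_id):
    """Look up a store on a node; fail if the node does not configure it."""
    if store_id not in ENABLED_BACKENDS[node]:
        raise UnknownStore(f"Unknown scheme '{store_id}' found in URI")
    return store_id


def set_acls_for_all_locations(node, location_stores):
    """Unconditional behaviour: touch every location the image has."""
    for store_id in location_stores:
        resolve_store(node, store_id)


# After step 3 the image has locations in az1 and az3.
image_locations = ["az1", "az3"]

try:
    # Step 4 runs the import on the az2 edge node.
    set_acls_for_all_locations("az2", image_locations)
    failure = None
except UnknownStore as exc:
    failure = str(exc)

print(failure)
```

On the central node the same call succeeds, because az1 has all three stores configured, which matches the report that the problem only shows up in an edge deployment.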

  Solution:
  To fix this, we can call the set_acls method only if the store is defined on that glance node, and skip it otherwise.
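
A minimal sketch of the check proposed in the solution, under stated assumptions: the helper `locations_for_set_acls` is hypothetical and is not the code from the merged change, the location dictionaries are simplified with placeholder URLs, and the store name is assumed to be carried in each location's `metadata["store"]` entry.

```python
# Hypothetical helper illustrating the proposed check; not the merged glance code.

def locations_for_set_acls(image_locations, enabled_backends):
    """Keep only the locations whose store is configured on this glance node."""
    selected = []
    for loc in image_locations:
        store_id = loc.get("metadata", {}).get("store")
        if store_id in enabled_backends:
            selected.append(loc)
        # Locations in stores this node does not know about are skipped
        # instead of raising an error.
    return selected


# glance-az2 from the report: enabled_backends = az2:rbd,az1:rbd
az2_backends = {"az2": "rbd", "az1": "rbd"}

# Image state after step 3; the URLs are placeholders, not real locations.
image_locations = [
    {"url": "rbd://<az1-uri>", "metadata": {"store": "az1"}},
    {"url": "rbd://<az3-uri>", "metadata": {"store": "az3"}},
]

kept = [
    loc["metadata"]["store"]
    for loc in locations_for_set_acls(image_locations, az2_backends)
]
print(kept)
```

With this filter the az2 node would only touch the az1 location and leave the az3 location alone, so saving the image no longer depends on a store the node cannot resolve.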

  [1]
  https://github.com/openstack/glance/blob/master/glance/location.py#L66

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/2073945/+subscriptions


