yahoo-eng-team team mailing list archive
Message #94312
[Bug 2073945] Re: Copying of image to another store initiated by compute fails in edge deployment
Reviewed: https://review.opendev.org/c/openstack/glance/+/924824
Committed: https://opendev.org/openstack/glance/commit/ee7e96f06af741bb34bedac18fa2c4616fcc3905
Submitter: "Zuul (22348)"
Branch: master
commit ee7e96f06af741bb34bedac18fa2c4616fcc3905
Author: Rajat Dhasmana <rajatdhasmana@xxxxxxxxx>
Date: Wed Jul 24 07:31:46 2024 +0000
Do not set_acls if store is not associated to glance node
With multiple glance stores (mostly with ceph), nova initiates the
copy-image functionality if the image from which the server is being
created is not present in the ceph store it refers to. This can fail if
the image already has a location in a store that is not available to
that glance edge node. This scenario can only be reproduced
in an EDGE deployment.
To fix this, call the set_acls method only if the store is defined on
that glance node; otherwise skip it.
Closes-Bug: #2073945
Change-Id: I0409982ae27b662e60dd2363ba2f7863d0722fea
** Changed in: glance
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/2073945
Title:
Copying of image to another store initiated by compute fails in edge
deployment
Status in Glance:
Fix Released
Bug description:
With multiple glance stores (mostly with ceph), nova initiates the
copy-image functionality if the image from which the server is being
created is not present in the ceph store it refers to. This can fail if
the image already has a location in a store that is not available to
that glance edge node. This scenario can only be reproduced in an EDGE
deployment.
Environment details:
1 Central node (az1), 2 Edge nodes (az2, az3)
3 availability zones for nova (az1,az2,az3)
glance-api.conf for az1:
[DEFAULT]
enabled_import_methods=[web-download,copy-image,glance-direct]
enabled_backends = az1:rbd,az2:rbd,az3:rbd
[glance_store]
default_backend = az1
[az1]
rbd_store_ceph_conf = /etc/ceph/ceph-az1.conf
store_description = "az1 RBD backend"
rbd_store_pool = glance
rbd_store_user = openstack
rbd_thin_provisioning = True
[az2]
rbd_store_ceph_conf = /etc/ceph/ceph-az2.conf
store_description = "az2 RBD backend"
rbd_store_pool = glance
rbd_store_user = openstack
rbd_thin_provisioning = True
[az3]
rbd_store_ceph_conf = /etc/ceph/ceph-az3.conf
store_description = "az3 RBD backend"
rbd_store_pool = glance
rbd_store_user = openstack
rbd_thin_provisioning = True
glance-api.conf for az2:
[DEFAULT]
enabled_backends = az2:rbd,az1:rbd
enabled_import_methods=[web-download,copy-image,glance-direct]
[glance_store]
default_backend = az2
[az2]
rbd_store_ceph_conf = /etc/ceph/ceph-az2.conf
store_description = "az2 RBD backend"
rbd_store_pool = glance
rbd_store_user = openstack
rbd_thin_provisioning = True
[az1]
rbd_store_ceph_conf = /etc/ceph/ceph-az1.conf
store_description = "az1 RBD backend"
rbd_store_pool = glance
rbd_store_user = openstack
rbd_thin_provisioning = True
glance-api.conf for az3:
[DEFAULT]
enabled_backends = az3:rbd,az1:rbd
enabled_import_methods=[web-download,copy-image,glance-direct]
[glance_store]
default_backend = az3
[az3]
rbd_store_ceph_conf = /etc/ceph/ceph-az3.conf
store_description = "az3 RBD backend"
rbd_store_pool = glance
rbd_store_user = openstack
rbd_thin_provisioning = True
[az1]
rbd_store_ceph_conf = /etc/ceph/ceph-az1.conf
store_description = "az1 RBD backend"
rbd_store_pool = glance
rbd_store_user = openstack
rbd_thin_provisioning = True
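The practical consequence of these three configurations can be checked with a small sketch. The helper names below (parse_enabled_backends, missing_stores, NODE_BACKENDS) are hypothetical and not part of glance; only the enabled_backends values are taken from the configuration files above.

```python
# Hypothetical helpers (not glance code): work out which stores each glance
# node knows about, based on the enabled_backends values shown above.

def parse_enabled_backends(value: str) -> dict:
    """Parse a value such as 'az2:rbd,az1:rbd' into {'az2': 'rbd', 'az1': 'rbd'}."""
    backends = {}
    for item in value.split(","):
        store_id, _, store_type = item.strip().partition(":")
        backends[store_id] = store_type
    return backends


# enabled_backends as configured on each glance node in this report.
NODE_BACKENDS = {
    "az1": parse_enabled_backends("az1:rbd,az2:rbd,az3:rbd"),
    "az2": parse_enabled_backends("az2:rbd,az1:rbd"),
    "az3": parse_enabled_backends("az3:rbd,az1:rbd"),
}


def missing_stores(node: str, image_stores: list) -> list:
    """Return the image's stores for which the given node has no configuration."""
    return [store for store in image_stores if store not in NODE_BACKENDS[node]]


# An image that already lives in az1 and az3, seen from the az2 edge node:
print(missing_stores("az2", ["az1", "az3"]))
```

Only the central node (az1) knows every store; each edge node knows itself and az1, so an image location in one edge store is unknown to the other edge node.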
Steps to reproduce:
1. Create an image in the az1 ceph store (the central node)
glance image-create --disk-format raw --container-format bare --file cirros.raw --name Test-AZ1
2. Now boot an instance with the image in availability zone 3 (here nova will initiate copy-image from az1 to az3)
openstack server create --flavor c1 $IMG_SRC --nic net-id=private $VM_NAME --availability-zone az3 Instance-AZ3
3. Verify that the image has been copied to the az3 ceph backend
glance image-list --include-stores
4. Now boot an instance with the same image in availability zone 2 (here nova will initiate copy-image to copy it to az2)
openstack server create --flavor c1 $IMG_SRC --nic net-id=private $VM_NAME --availability-zone az2 Instance-AZ2
Expected result:
The instance should be in the active state and the image should now be available in the az2 ceph backend as well
Actual result:
The instance is active, but the image fails to be copied to the az2 ceph backend with the error `glance_store.exceptions.UnknownScheme: Unknown scheme 'az3' found in URI`, and the instance takes much longer to boot than in step 2
Copying fails with the stack trace below:
2024-07-23 15:07:20.599 65 INFO eventlet.wsgi.server [None req-e71ddbbe-0a4e-4d01-a7c6-2e6726172e4c 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 50.50.8.2,::1 - - [23/Jul/2024 15:07:20] "GET /healthcheck HTTP/1.1" 200 142 0.001121
2024-07-23 15:07:38.835 65 WARNING glance.common.store_utils [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
2024-07-23 15:07:38.836 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:38] "GET /v2/images/b89b5eda-a829-4f68-91be-0938945753df HTTP/1.1" 200 1461 0.089335
2024-07-23 15:07:38.910 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:38] "GET /v2/schemas/image HTTP/1.1" 200 6283 0.070413
2024-07-23 15:07:39.019 65 WARNING glance.common.store_utils [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
2024-07-23 15:07:39.059 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:39] "POST /v2/images/b89b5eda-a829-4f68-91be-0938945753df/import HTTP/1.1" 202 211 0.110612
2024-07-23 15:07:39.062 65 INFO glance.domain [-] Task [d05fb139-f98e-4ba7-8888-43955b27f5a6] status changing from pending to processing
2024-07-23 15:07:39.093 65 WARNING glance.common.store_utils [-] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
2024-07-23 15:07:39.100 65 WARNING glance.common.store_utils [-] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
2024-07-23 15:07:39.129 65 WARNING glance.common.store_utils [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] Invalid location uri rbd://f8346b50-138b-11ef-b48e-b49691fe9877/glance/b89b5eda-a829-4f68-91be-0938945753df/snap
2024-07-23 15:07:39.130 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:39] "GET /v2/images/b89b5eda-a829-4f68-91be-0938945753df HTTP/1.1" 200 1529 0.067387
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor [-] Task initialization failed: Unknown scheme 'az3' found in URI: glance_store.exceptions.UnknownScheme: Unknown scheme 'az3' found in URI
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor Traceback (most recent call last):
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance_store/location.py", line 111, in get_location_from_uri_and_backend
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor scheme_info = SCHEME_TO_CLS_BACKEND_MAP[pieces.scheme][backend]
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor KeyError: 'az3'
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor During handling of the above exception, another exception occurred:
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor Traceback (most recent call last):
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/async_/taskflow_executor.py", line 134, in _get_flow
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor return driver.DriverManager('glance.flows', task.type,
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/stevedore/driver.py", line 54, in __init__
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor super(DriverManager, self).__init__(
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/stevedore/named.py", line 78, in __init__
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor extensions = self._load_plugins(invoke_on_load,
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/stevedore/extension.py", line 218, in _load_plugins
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor self._on_load_failure_callback(self, ep, err)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/stevedore/extension.py", line 206, in _load_plugins
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor ext = self._load_one_plugin(ep,
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/stevedore/named.py", line 156, in _load_one_plugin
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor return super(NamedExtensionManager, self)._load_one_plugin(
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/stevedore/extension.py", line 242, in _load_one_plugin
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor obj = plugin(*invoke_args, **invoke_kwds)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/async_/flows/api_image_import.py", line 1003, in get_flow
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor action.pop_extra_property('os_glance_stage_host')
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/async_/flows/api_image_import.py", line 172, in __exit__
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor self._image_repo.save(self._image, self._image_previous_status)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/notifier.py", line 530, in save
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor super(ImageRepoProxy, self).save(image, from_state=from_state)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/domain/proxy.py", line 99, in save
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor result = self.base.save(base_item, from_state=from_state)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/quota/__init__.py", line 121, in save
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor return super(ImageRepoProxy, self).save(image, from_state=from_state)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/domain/proxy.py", line 99, in save
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor result = self.base.save(base_item, from_state=from_state)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/location.py", line 83, in save
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor self._set_acls(image)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance/location.py", line 66, in _set_acls
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor self.store_api.set_acls_for_multi_store(
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance_store/multi_backend.py", line 539, in set_acls_for_multi_store
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor loc = location.get_location_from_uri_and_backend(
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor File "/usr/lib/python3.9/site-packages/glance_store/location.py", line 113, in get_location_from_uri_and_backend
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor raise exceptions.UnknownScheme(scheme=backend)
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor glance_store.exceptions.UnknownScheme: Unknown scheme 'az3' found in URI
2024-07-23 15:07:39.142 65 ERROR glance.async_.taskflow_executor
2024-07-23 15:07:39.191 65 INFO eventlet.wsgi.server [None req-384360e6-ddc2-45f0-acfc-189e6f5cb0d0 51c9f02e80474e839584fc2b47fcb485 4dc291202b954692a88970dfa5f47f30 - - default default] 100.64.0.9,::1 - - [23/Jul/2024 15:07:39] "GET /v2/schemas/image HTTP/1.1" 200 6283 0.047085
Analysis:
Before starting the copy operation, glance sets some attributes on the image to track the state of the import operation. While saving these changes to the image, it calls the set_acls method on each store instance [1] for each location available in the image. In our case the image has two locations so far, az1 and az3. Since we (nova) are now trying to copy the image to az2, the operation runs on the glance-az2 edge node, which does not have the az3 store defined in its configuration file. This causes the failure with `Unknown scheme 'az3' found in URI`.
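The lookup that fails in the traceback can be modelled in a few lines. This is a simplified stand-in, not the glance_store implementation: the exception class, the SCHEME_TO_BACKEND map, the get_location function and the placeholder URI are all assumptions made for illustration, shaped after the `SCHEME_TO_CLS_BACKEND_MAP[pieces.scheme][backend]` line shown in the log.

```python
from urllib.parse import urlparse


class UnknownScheme(Exception):
    """Stand-in for glance_store.exceptions.UnknownScheme."""


# Simplified model of the scheme -> backend map on the az2 edge node.
# Only az2 and az1 are configured there, so 'az3' has no entry.
SCHEME_TO_BACKEND = {
    "rbd": {
        "az2": "rbd store for az2",
        "az1": "rbd store for az1",
    },
}


def get_location(uri: str, backend: str) -> str:
    """Resolve a location by URI scheme and backend id, as the traceback suggests."""
    pieces = urlparse(uri)
    try:
        return SCHEME_TO_BACKEND[pieces.scheme][backend]
    except KeyError:
        # The KeyError is on the backend id, yet the message reports it as a scheme.
        raise UnknownScheme(f"Unknown scheme '{backend}' found in URI")


# Placeholder URI; the real one contains the ceph cluster and image ids.
uri = "rbd://cluster-id/glance/image-id/snap"
print(get_location(uri, "az1"))
try:
    get_location(uri, "az3")
except UnknownScheme as exc:
    print(exc)
```

Note that the URI scheme itself (rbd) is known on the az2 node; what is missing is the az3 backend id, which is why the error message can be misleading at first glance.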
Solution:
To fix this, we can check whether the store is defined on that glance node and call the set_acls method only if it is; otherwise the location is skipped.
[1]
https://github.com/openstack/glance/blob/master/glance/location.py#L66
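The proposed solution can be sketched as a guard around the per-location call. This is a minimal illustration under stated assumptions, not the committed patch: the function name set_acls_where_configured, its parameters, and the shape of the location dictionaries (a "url" plus a "metadata" dictionary carrying the store id) are assumed for the example.

```python
import logging

LOG = logging.getLogger(__name__)


def set_acls_where_configured(locations, configured_stores, set_acls):
    """Call set_acls only for locations whose store is configured on this node.

    locations: list of dicts with 'url' and 'metadata' (assumed shape).
    configured_stores: set of store ids enabled on this glance node.
    set_acls: callable taking (url, store); stands in for the real store call.
    Returns the store ids that were applied and those that were skipped.
    """
    applied, skipped = [], []
    for loc in locations:
        store = loc["metadata"].get("store")
        if store in configured_stores:
            set_acls(loc["url"], store)
            applied.append(store)
        else:
            # Store is not defined on this node: ignore it instead of failing.
            LOG.debug("Skipping set_acls for unconfigured store %s", store)
            skipped.append(store)
    return applied, skipped


# Image with locations in az1 and az3, processed on the az2 edge node.
image_locations = [
    {"url": "rbd://cluster-az1/glance/image-id/snap", "metadata": {"store": "az1"}},
    {"url": "rbd://cluster-az3/glance/image-id/snap", "metadata": {"store": "az3"}},
]
calls = []
result = set_acls_where_configured(
    image_locations, {"az2", "az1"}, lambda url, store: calls.append(store)
)
print(result, calls)
```

With this guard the az3 location is skipped on the az2 node, so saving the image no longer aborts the import task before the copy starts.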
To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/2073945/+subscriptions