[Bug 2007982] [NEW] keystone role cache misbehaving in HA setup
Public bug reported:
This is a follow-up to two mailing list reports [1][2] that seem to have the same root cause.
In an HA setup with three control nodes (Victoria, bare metal), Terraform is used to deploy many different Kubernetes clusters (among other things). We noticed Keystone errors when a project is cleanly purged with Terraform and a redeployment of a project with the same name is started immediately afterwards. Some testing to pin down which Keystone cache is responsible points to the role cache (default 600 seconds): Terraform fails with an error reporting that the project was not found, referring to the previous project ID, which has already been deleted from the database during the project cleanup.
The same deployment works in a cloud running an identical version but with only one control node; there it just works, even though the cache is enabled as well.
I have already tried reducing cache_time to 30 seconds, but that does not help (even though more than 30 seconds pass before Terraform is ready after its prechecks). I also disabled the role cache entirely, which does allow the immediate redeployment to succeed, but the downside is significantly longer response times when using the dashboard or querying the APIs.
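For reference, these are the two settings I have been changing, shown as a minimal keystone.conf excerpt with the values from the tests above (not a recommendation):

[role]
# disabling this avoids the stale-project errors, but dashboard/API calls get noticeably slower
caching = true
# lowered from the 600 second default to 30 seconds, without effect
cache_time = 30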
Is there any way to tune the role cache so that we get both reasonable performance and the ability to redeploy projects without a "sleep 600"?
The storage back end is Ceph (Pacific); the Keystone package versions are:
control01:~ # rpm -qa | grep keystone
python3-keystonemiddleware-9.1.0-lp152.3.20.noarch
python3-keystone-18.0.1~dev11-lp152.1.21.noarch
python3-keystoneauth1-4.2.1-lp152.3.19.noarch
python3-keystoneclient-4.1.0-lp152.5.2.noarch
openstack-keystone-18.0.1~dev11-lp152.1.21.noarch
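Caching itself is wired up through the standard oslo.cache configuration; the [cache] section on the control nodes looks roughly like the following (the memcached backend and hostnames here are illustrative, not the exact values from our deployment):

[cache]
enabled = true
backend = dogpile.cache.memcached
# one memcached instance per control node
memcache_servers = control01:11211,control02:11211,control03:11211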
[1] https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031122.html
[2] https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032258.html
** Affects: keystone
Importance: Undecided
Status: New
** Project changed: nova => keystone
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2007982
To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/2007982/+subscriptions