yahoo-eng-team team mailing list archive
Message #08137
[Bug 1251123] Re: _update_user_list_with_cas causes significant overhead (when using memcached as token store backend)
** Also affects: keystone/havana
Importance: Undecided
Status: New
** Changed in: keystone/havana
Status: New => Triaged
** Changed in: keystone/havana
Importance: Undecided => Medium
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1251123
Title:
_update_user_list_with_cas causes significant overhead (when using
memcached as token store backend)
Status in OpenStack Identity (Keystone):
Triaged
Status in Keystone havana series:
Triaged
Bug description:
[Problem statement]
In Havana, when using memcached as the token store backend, we have seen a significant performance drop compared with Grizzly.
[How to reproduce]
We used a Python script to boot VMs at a rate of 1 VM per second. Many VM creations failed, and the keystone-all process's CPU utilization was nearly 100%.
[Analysis]
When memcached is the token backend, Keystone stores two types of key-value pairs in memcached.
token_id ===> token data (associated with a TTL)
user_id ===> a list of ids for tokens that belong to the user
When creating a new token, Keystone first adds the (token_id, data)
pair to memcached, and then updates the (user_id, token_id_list) pair
in the function _update_user_list_with_cas.
What _update_user_list_with_cas does is:
1. retrieve the old list
2. for each token_id in the old list, retrieve the token data to check whether it is expired or not.
3. discard the expired tokens, add the valid token_ids to a new list
4. append the newly created token's id to the new list too.
5. use memcached's Compare-And-Set function to replace the old list with the new list
In practice we have found that a user very commonly has thousands
of valid tokens at a given moment, so step 2, which issues one
memcached lookup per token_id, consumes a lot of time. What's worse,
the CAS tends to fail and be retried, which repeats the whole scan and
makes this function even less efficient.
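
The five steps above can be sketched as follows. This is only an illustration of the access pattern, not Keystone's actual code: FakeMemcache is a hypothetical in-memory stand-in for a memcached client, and the key prefixes "token-" and "usertokens-", the "expires" field, and the max_retries parameter are all assumptions made for the example.

```python
import time


class FakeMemcache:
    """In-memory stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self._data = {}      # key -> (value, version)
        self.get_calls = 0   # number of per-token lookups issued via get()

    def get(self, key):
        self.get_calls += 1
        entry = self._data.get(key)
        return entry[0] if entry else None

    def set(self, key, value):
        version = self._data.get(key, (None, 0))[1] + 1
        self._data[key] = (value, version)

    def gets(self, key):
        """Return (value, version); the version acts as the CAS token."""
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        """Store value only if the key is unchanged since gets()."""
        if self._data.get(key, (None, 0))[1] != version:
            return False
        self._data[key] = (value, version + 1)
        return True


def update_user_list_with_cas(client, user_id, new_token_id,
                              now=None, max_retries=10):
    now = time.time() if now is None else now
    user_key = "usertokens-" + user_id                  # assumed key layout
    for _ in range(max_retries):
        old_list, version = client.gets(user_key)       # step 1
        new_list = []
        for token_id in old_list or []:                 # step 2: one lookup per token
            token = client.get("token-" + token_id)
            if token is not None and token["expires"] > now:
                new_list.append(token_id)               # step 3: keep valid ids only
        new_list.append(new_token_id)                   # step 4
        if client.cas(user_key, new_list, version):     # step 5
            return new_list
        # CAS failed: another writer changed the list, so the scan is repeated.
    raise RuntimeError("CAS kept failing for " + user_key)
```

With N token_ids in the user's list, every token creation costs N lookups, and every failed CAS costs another N.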
[Proposed fix]
I'd like to propose a 'lazy cleanup of expired token_ids from the user list' solution.
The idea is to avoid doing the cleanup EVERY TIME a new token is
created. We can set a dynamic threshold T for each user, and the cleanup
job will be triggered only when the number of token_ids exceeds the
threshold T. After every cleanup, it will check how many token_ids
have been cleaned up; if the percentage is lower than a pre-specified
P, then T needs to be increased to T*(1+P) to avoid too-frequent
cleanups.
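
A minimal sketch of the threshold rule, under some assumptions: the function name maybe_cleanup, the is_valid callback, and the default P of 0.25 are hypothetical, and the proposal does not say where the per-user T would be stored, so here it is simply passed in and returned.

```python
def maybe_cleanup(token_ids, is_valid, threshold, p=0.25):
    """Return (token_ids, threshold) after an optional lazy cleanup.

    token_ids: the user's current list of token ids
    is_valid:  callable returning True for a token id that has not expired
    threshold: the user's current T
    p:         the pre-specified fraction P (0.25 is an arbitrary example value)
    """
    if len(token_ids) <= threshold:
        # Below T: skip the expensive per-token scan entirely.
        return token_ids, threshold

    kept = [t for t in token_ids if is_valid(t)]
    removed_fraction = (len(token_ids) - len(kept)) / len(token_ids)
    if removed_fraction < p:
        # Little was reclaimed, so raise T to T*(1+P) to scan less often.
        threshold = threshold * (1 + p)
    return kept, threshold
```

For example, with T = 8, P = 0.25, and a list of 10 ids of which only one has expired, the cleanup removes 10% of the list, which is below P, so T becomes 10.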
Besides, every time the list_tokens function is called for a given
user, it will always trigger a cleanup. This is necessary to
ensure list_tokens always returns valid tokens only.
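
The unconditional cleanup in list_tokens could look roughly like the sketch below. It is illustrative only: the store is a plain dict standing in for memcached, the key prefixes and the "expires" field are assumptions, and the write-back is a plain assignment rather than a CAS.

```python
import time


def list_tokens(store, user_id, now=None):
    """Return only the valid token ids for a user, ignoring any threshold."""
    now = time.time() if now is None else now
    user_key = "usertokens-" + user_id          # assumed key layout
    valid = []
    for token_id in store.get(user_key, []):
        token = store.get("token-" + token_id)
        if token is not None and token["expires"] > now:
            valid.append(token_id)
    store[user_key] = valid                     # write the cleaned list back
    return valid
```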
To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1251123/+subscriptions