yahoo-eng-team team mailing list archive

[Bug 1251123] Re: _update_user_list_with_cas causes significant overhead (when using memcached as token store backend)

 

** Changed in: keystone
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1251123

Title:
  _update_user_list_with_cas causes significant overhead (when using
  memcached as token store backend)

Status in OpenStack Identity (Keystone):
  Fix Released
Status in Keystone havana series:
  Fix Released

Bug description:
  [Problem statement]
  In Havana, when memcached is used as the token store backend, we have seen a significant performance drop compared with Grizzly.

  [How to reproduce]
  We used a Python script to boot VMs at a rate of 1 VM per second. Many VM creations failed, and the keystone-all process's CPU utilization was nearly 100%.

  [Analysis]
  When memcached is the token backend, Keystone stores two types of key-value pairs in memcached:

     token_id ===> token data (associated with a TTL)

     user_id  ===> a list of ids for tokens that belong to the user

  When creating a new token, Keystone first adds the (token_id, data)
  pair to memcached, and then updates the (user_id, token_id_list) pair
  in the function _update_user_list_with_cas.

  _update_user_list_with_cas does the following:
      1. Retrieve the old list.
      2. For each token_id in the old list, retrieve the token data to check whether it has expired.
      3. Discard the expired tokens and add the valid token_ids to a new list.
      4. Append the newly created token's id to the new list.
      5. Use memcached's Compare-And-Set (CAS) operation to replace the old list with the new list.
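
  The five steps above can be sketched as follows. This is only an
  illustration, not Keystone's actual code: FakeCasStore is a hypothetical
  in-memory stand-in for memcached's gets/cas operations, and the
  "expires" field, the now parameter and max_retries are assumptions made
  for the example.

```python
import time


class FakeCasStore:
    """Hypothetical in-memory stand-in for memcached with gets/cas semantics."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def set(self, key, value):
        version = self._data.get(key, (None, 0))[1] + 1
        self._data[key] = (value, version)

    def gets(self, key):
        """Return (value, version); the version plays the role of the CAS token."""
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        """Store value only if the key has not changed since gets()."""
        if self._data.get(key, (None, 0))[1] != version:
            return False
        self._data[key] = (value, version + 1)
        return True


def update_user_list_with_cas(store, user_id, new_token_id, now=None, max_retries=5):
    """Eager variant: re-validate every token id on every token creation."""
    now = time.time() if now is None else now
    for _ in range(max_retries):
        old_list, version = store.gets(user_id)        # step 1
        new_list = []
        for token_id in old_list or []:
            data = store.get(token_id)                 # step 2: one lookup per token
            if data is not None and data["expires"] > now:
                new_list.append(token_id)              # step 3: keep valid ids only
        new_list.append(new_token_id)                  # step 4
        if store.cas(user_id, new_list, version):      # step 5
            return new_list
        # CAS failed: another writer changed the list, so redo all the work.
    raise RuntimeError("CAS kept failing for user %s" % user_id)
```

  Step 2 costs one lookup per token id in the list, and every failed CAS
  in step 5 repeats the whole loop.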

  In practice we have found that a user very commonly has thousands of
  valid tokens at a given moment, so step 2 consumes a lot of time.
  Worse, the CAS tends to fail and retry, which repeats the whole
  procedure and makes this function even less efficient.

  [Proposed fix]
  I'd like to propose a 'lazy cleanup of expired token_ids from the user list' solution.

  The idea is to avoid doing the cleanup EVERY TIME a new token is
  created. We can set a dynamic threshold T for each user, and a cleanup
  will be triggered only when the number of token_ids exceeds T. After
  every cleanup, we check what percentage of the token_ids was removed;
  if that percentage is lower than a pre-specified P, then T is
  increased to T*(1+P) to avoid overly frequent cleanups.
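
  A minimal sketch of the threshold logic, under assumptions the
  description leaves open: the function name add_token_lazily and the
  is_valid callback are hypothetical, the new token id is counted
  towards the list length, the removed percentage is measured against
  the list size before cleanup, and the new T is rounded up. The CAS
  write-back is omitted.

```python
import math


def add_token_lazily(token_ids, new_token_id, threshold, is_valid, p=0.25):
    """Append new_token_id and clean up only when the list exceeds threshold.

    Returns (token_ids, threshold) so the caller can store both.
    """
    token_ids = list(token_ids) + [new_token_id]
    if len(token_ids) <= threshold:
        # Cheap path: no per-token lookups at all.
        return token_ids, threshold

    before = len(token_ids)
    token_ids = [t for t in token_ids if is_valid(t)]
    removed_fraction = (before - len(token_ids)) / before

    if removed_fraction < p:
        # The cleanup freed little space, so raise T to T*(1+P)
        # (rounded up, an assumption) to make cleanups less frequent.
        threshold = math.ceil(threshold * (1 + p))
    return token_ids, threshold
```

  With T = 4 and P = 0.5, for example, a cleanup that removes nothing
  from a five-element list raises T to 6, while a cleanup that removes
  at least half of the ids leaves T unchanged.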

  In addition, every call to the list_tokens function for a given user
  will always trigger a cleanup. This is necessary to ensure that
  list_tokens returns only valid tokens.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1251123/+subscriptions