[Bug 1905739] [NEW] Performance issue when validating tokens

 

Public bug reported:

When validating a token, the revoke model always performs a database
lookup for the revocation events that may affect that token.

In our production setup, with 10k requests per minute and 12k revocation
events stored in the database, this lookup takes a significant amount of
time and consumes CPU resources.

Many of our programmatic clients reuse the same token for subsequent
operations, and even with caching configured in all our APIs, a good
share of those requests still reach our keystone servers and generate
CPU load.

Our initial investigation suggests that simply introducing caching at
[1] cuts up to 100 ms from our average response time. It also improves
the response-time distribution: P99 drops from 3.55 to 1.55 over a day
of collected metrics.

[1]
https://opendev.org/openstack/keystone/src/branch/master/keystone/revoke/core.py#L125
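
As a rough illustration of the direction we have in mind, the sketch
below applies keystone's existing memoization helper
(keystone.common.cache.get_memoization_decorator, already used by the
identity and token managers) to the revocation-event lookup. The cached
method, the region name and the invalidation point are our assumptions
for illustration, not the exact code at [1].

    # Illustrative sketch only, not a patch: it applies keystone's
    # existing memoization helper to the revocation lookup. The cached
    # method, region name and invalidation point are assumptions.
    from keystone.common import cache
    from keystone.common import manager

    # Dedicated cache region for revocation lookups; like the token
    # region, it would need cache.configure_cache() at startup. The
    # 'revoke' group is assumed to carry the usual caching/cache_time
    # options.
    REVOKE_REGION = cache.create_region(name='revoke events')
    MEMOIZE = cache.get_memoization_decorator(group='revoke',
                                              region=REVOKE_REGION)


    class Manager(manager.Manager):
        driver_namespace = 'keystone.revoke'

        @MEMOIZE
        def list_events(self, last_fetch=None):
            # Repeated token validations reuse the cached revocation
            # events instead of hitting the revocation_event table on
            # every request.
            return self.driver.list_events(last_fetch=last_fetch)

        def revoke(self, event):
            self.driver.revoke(event)
            # A new revocation event invalidates the whole region so a
            # freshly revoked token cannot keep validating from stale
            # cache entries.
            REVOKE_REGION.invalidate()

The main design choice is how quickly revocations become visible again:
either the region is invalidated on every revoke() as sketched above, or
one simply relies on the cache expiration time, trading a short
visibility delay for fewer database hits.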

** Affects: keystone
     Importance: Undecided
         Status: New
