
yahoo-eng-team team mailing list archive

[Bug 1816927] Re: Deployments with high churn are susceptible to false positives with token validation

 

Reviewed:  https://review.openstack.org/638397
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=261eeaa19bb4c9e9ea89fac28e473fa44c4a55de
Submitter: Zuul
Branch:    master

commit 261eeaa19bb4c9e9ea89fac28e473fa44c4a55de
Author: Pavlo Shchelokovskyy <shchelokovskyy@xxxxxxxxx>
Date:   Thu Feb 21 13:06:10 2019 +0200

    Add hint for order of keys during distribution
    
    If the new primary key is not the first to be distributed after fernet
    key rotation, there may be a small time window during the key
    distribution when tokens issued by the node where fernet rotation was
    performed can not be validated on the node where keys are being
    distributed to.
    
    Change-Id: I34b5cadd12815ee95c71d8c163504390a9e5e343
    Closes-Bug: #1816927


** Changed in: keystone
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1816927

Title:
  Deployments with high churn are susceptible to false positives with
  token validation

Status in OpenStack Identity (keystone):
  Fix Released
Status in openstack-ansible:
  Fix Committed

Bug description:
  The implementation for fernet tokens relies on symmetric encryption.
  This underpinning requires that each keystone API node "share" the
  same key repository, specifically in deployments where keystone
  servers need to validate tokens issued by one another (e.g., a cluster
  of keystone servers behind an HA proxy).

  Without getting into too much detail, each key repository consists of
  a set of files on disk. The naming of each file is crucial because it
  denotes the type of key it is (documented extensively [0]). Each file
  name is an integer. The key with the highest index is the primary key
  and is used to encrypt new tokens. The key with the lowest index, 0,
  is the staged key and is always promoted to be the primary key on the
  next rotation. Every other key in the repository is a secondary key
  and is only used to decrypt tokens. Each key on disk goes through a
  lifecycle: it starts as a staged key, is promoted to a primary key,
  and is eventually demoted to a secondary key. Note that keystone does
  *not* handle key distribution between API servers. We recommend this
  be done using configuration management. The documentation suggests
  rsync as one possible utility to keep key repositories in sync.
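
  The naming rules above can be illustrated with a minimal sketch. This
  is not keystone's code: the function names classify_keys and rotate
  are hypothetical, the repository is modeled as a plain dict of index
  to key contents, and any pruning of old keys is left out.

```python
def classify_keys(file_names):
    """Map each key index to its role, per the naming rules described above."""
    indices = sorted(int(name) for name in file_names)
    if not indices:
        return {}
    primary = indices[-1]
    roles = {}
    for index in indices:
        if index == 0:
            roles[index] = "staged"      # lowest index: next primary
        elif index == primary:
            roles[index] = "primary"     # highest index: encrypts new tokens
        else:
            roles[index] = "secondary"   # only used to decrypt
    return roles


def rotate(repo, new_staged_key):
    """Toy rotation: promote the staged key (0) to the next highest index."""
    promoted_index = max(repo) + 1
    rotated = {i: key for i, key in repo.items() if i != 0}
    rotated[promoted_index] = repo[0]    # same contents, new file name
    rotated[0] = new_staged_key          # a fresh staged key takes index 0
    return rotated


repo = {0: "staged-key", 1: "key-a", 2: "key-b"}
rotated = rotate(repo, "new-staged-key")
print(classify_keys(rotated))
```

  In this toy model the contents of the old staged key reappear under
  the highest index after rotation, which is why other nodes that still
  hold the old staged key are expected to validate the new tokens.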

  I'm opening this bug because it was brought to our attention that
  keystone servers may respond with a 401 Invalid Fernet token, in
  deployments with high churn, or high token load, across a cluster of
  keystone nodes.

  The issue is that in the process of key rotation, the staged key is
  promoted to be the primary key. As soon as this happens, any
  subsequent requests to create tokens will use the primary key to
  encrypt the token. It is assumed all other API servers have a copy of
  this key, because it's the staged key and also valid as a secondary
  key. A token encrypted with the new primary key should be validatable
  on other API servers if they have a copy of the staged key, which has
  the same key contents as the new primary key on the API server that
  initiated the key rotation. The rsync implementation deletes the
  contents of the key repository and rebuilds it in alphanumeric order.
  This results in the staged key always being written by rsync first,
  because its file name is 0. The primary key is always written last,
  because its file name is the highest index of the key repository.

  The failure occurs in a specific timing window where:

  - a token is created after key rotation, but before key distribution
  - key distribution is invoked using a mechanism like rsync
  - token validation is performed on the API server getting its key repository built by rsync
  - the token is validated before the new primary key is written to the key repository by rsync, and fails validation because the key repository doesn't contain the key used to encrypt the token

  A subsequent request to validate the token should succeed if rsync
  completes successfully.
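
  The window described above can be modeled with a small simulation. It
  is a sketch under simplifying assumptions: keys are plain strings
  rather than real fernet keys, the target repository starts empty (as
  after rsync deletes its contents), and writes_until_valid is a
  hypothetical helper, not part of keystone or rsync.

```python
def writes_until_valid(source_repo, write_order, token_key):
    """Count how many key files must be written before the token validates.

    The target repository starts empty and is rebuilt one file at a time
    in write_order. Returns None if the key never arrives.
    """
    target = {}
    for count, index in enumerate(write_order, start=1):
        target[index] = source_repo[index]
        if token_key in target.values():
            return count
    return None


# Repository on the node that just rotated: index 3 is the new primary.
source = {0: "new-staged", 1: "key-a", 2: "key-b", 3: "old-staged"}
token_key = source[max(source)]  # tokens are encrypted with the primary key

ascending = sorted(source)                    # 0 first, primary last
primary_first = sorted(source, reverse=True)  # primary first

print(writes_until_valid(source, ascending, token_key))
print(writes_until_valid(source, primary_first, token_key))
```

  With ascending order the token only becomes valid after the last file
  is written. Writing the primary key first shrinks the window to a
  single write, which matches the ordering hint in the commit message.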

  pas-ha brought this to the #openstack-keystone channel as an issue
  that was affecting an internal CI/CD deployment that has a lot of
  churn [1].

  [0] https://docs.openstack.org/keystone/latest/admin/fernet-token-faq.html#what-are-the-different-types-of-keys
  [1] http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2019-02-20.log.html#t2019-02-20T20:11:12

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1816927/+subscriptions

