yahoo-eng-team team mailing list archive

[Bug 1819957] [NEW] Caching with stale data when a server disconnects due to network partition and reconnects

 

*** This bug is a security vulnerability ***

Public security bug reported:

The optional flush_on_reconnect flag is not set. As a result, a cache
server that drops out because of a network partition can serve stale
data once it reconnects. The security implications are:

1*  Password changes/user changes may be reverted for the cache TTL
1a* Users may be locked out if PCI-DSS is enabled and the password
    change happens during the network partition
2*  Grant changes may be reverted for the cache TTL
3*  Resources (all types) may become "undeleted" for the cache TTL
4*  Tokens (keystonemiddleware) may become valid again for the cache TTL


As noted in the python-memcached library:

    @param flush_on_reconnect: optional flag which prevents a
            scenario that can cause stale data to be read: If there's more
            than one memcached server and the connection to one is
            interrupted, keys that mapped to that server will get
            reassigned to another. If the first server comes back, those
            keys will map to it again. If it still has its data, get()s
            can read stale data that was overwritten on another
            server. This flag is off by default for backwards
            compatibility.
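
The scenario in that docstring can be reproduced with a small in-memory
model. This is only a sketch: ToyServer, ToyClient and
run_partition_scenario are hypothetical names invented for illustration,
and the failover logic is a simplification, not python-memcached's
actual implementation.

```python
import zlib


class ToyServer:
    """In-memory stand-in for one memcached server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True
        self.needs_flush = False  # set once the client has seen it down


class ToyClient:
    """Toy client that rehashes keys away from unreachable servers."""

    def __init__(self, servers, flush_on_reconnect=False):
        self.servers = servers
        self.flush_on_reconnect = flush_on_reconnect

    def _pick(self, key):
        start = zlib.crc32(key.encode()) % len(self.servers)
        for offset in range(len(self.servers)):
            server = self.servers[(start + offset) % len(self.servers)]
            if not server.reachable:
                # Remember the outage, then fall through to the next server.
                server.needs_flush = True
                continue
            if server.needs_flush:
                # First contact after an outage: optionally drop everything.
                if self.flush_on_reconnect:
                    server.data.clear()
                server.needs_flush = False
            return server
        raise ConnectionError("no reachable cache server")

    def set(self, key, value):
        self._pick(key).data[key] = value

    def get(self, key):
        return self._pick(key).data.get(key)


def run_partition_scenario(flush_on_reconnect):
    """Write, partition, overwrite elsewhere, heal, then read back."""
    servers = [ToyServer("a"), ToyServer("b")]
    client = ToyClient(servers, flush_on_reconnect)
    client.set("token", "valid")
    home = client._pick("token")
    home.reachable = False           # network partition begins
    client.set("token", "revoked")   # lands on the other server
    home.reachable = True            # partition heals
    return client.get("token")
```

Without the flag the final read returns the old "valid" entry from the
reconnected server; with the flag the read is a cache miss, so the
caller has to go back to the authoritative store.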

The fix is to pass flush_on_reconnect explicitly when the client is
created. One concern with this approach: the memcached servers may be
shared with other tooling, and that tooling would lose its cached state
on a flush even when the oslo.cache connection is the only one affected
by the network partition.
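
As a rough sketch of what passing the flag looks like at the
python-memcached level (not the eventual keystone, keystonemiddleware or
oslo.cache change): make_memcache_client and its client_factory
parameter are hypothetical, introduced only so the example can be
exercised without a running memcached.

```python
def make_memcache_client(servers, client_factory=None):
    """Build a memcached client with flush_on_reconnect enabled.

    client_factory is a hypothetical hook for testing; when omitted,
    memcache.Client from python-memcached is used.
    """
    if client_factory is None:
        import memcache  # python-memcached, imported lazily
        client_factory = memcache.Client
    # flush_on_reconnect is off by default, so it must be passed explicitly.
    return client_factory(servers, flush_on_reconnect=1)
```

Usage would be along the lines of
make_memcache_client(["127.0.0.1:11211"]), where the address is a
placeholder. Note that the flush clears the whole reconnected server,
which is the shared-tooling concern described above.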

The same issue needs to be addressed for pymemcache when it is used in
place of python-memcached.

** Affects: keystone
     Importance: High
     Assignee: Morgan Fainberg (mdrnstm)
         Status: New

** Affects: keystonemiddleware
     Importance: High
     Assignee: Morgan Fainberg (mdrnstm)
         Status: New

** Affects: oslo.cache
     Importance: High
     Assignee: Morgan Fainberg (mdrnstm)
         Status: New


** Tags: caching security

** Also affects: keystonemiddleware
   Importance: Undecided
       Status: New

** Also affects: oslo.cache
   Importance: Undecided
       Status: New

** Tags added: caching security

** Changed in: keystone
   Importance: Undecided => High

** Changed in: keystonemiddleware
   Importance: Undecided => High

** Changed in: oslo.cache
   Importance: Undecided => High

** Changed in: keystone
     Assignee: (unassigned) => Morgan Fainberg (mdrnstm)

** Changed in: keystonemiddleware
     Assignee: (unassigned) => Morgan Fainberg (mdrnstm)

** Changed in: oslo.cache
     Assignee: (unassigned) => Morgan Fainberg (mdrnstm)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1819957

Title:
  Caching with stale data when a server disconnects due to network
  partition and reconnects

Status in OpenStack Identity (keystone):
  New
Status in keystonemiddleware:
  New
Status in oslo.cache:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1819957/+subscriptions
