
yahoo-eng-team team mailing list archive

[Bug 1332058] Re: keystone behavior when one memcache backend is down

 

** Changed in: keystonemiddleware
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1332058

Title:
  keystone behavior when one memcache backend is down

Status in OpenStack Identity (Keystone):
  Fix Committed
Status in OpenStack Identity (Keystone) Middleware:
  Fix Released
Status in Mirantis OpenStack:
  Fix Committed

Bug description:
  Hi,

  Our deployment uses dogpile.cache.memcached as the backend for
  tokens. I recently noticed odd behavior when one of the memcache
  regions went down: there is a 3-6 second delay when I try to get a
  token. If I have 2 backends, the delay is 6-12 seconds. This is very
  easy to reproduce.

  Test the connection using

  for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done

  Block one memcache backend (simulating a power outage of the node)
  using

  iptables -I INPUT -p tcp --dport 11211 -j DROP

  Test the speed again using

  for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done

  I also straced the keystone process with

  strace -tt -s 512 -o /root/log1 -f -p PID

  and got

  26872 connect(9, {sa_family=AF_INET, sin_port=htons(11211),
  sin_addr=inet_addr("10.108.2.3")}, 16) = -1 EINPROGRESS (Operation now
  in progress)

  even though this IP is down.

  I also checked the code

  https://github.com/openstack/keystone/blob/master/keystone/common/kvs/core.py#L210-L237
  https://github.com/openstack/keystone/blob/master/keystone/common/kvs/core.py#L285-L289
  https://github.com/openstack/keystone/blob/master/keystone/common/kvs/backends/memcached.py#L96

  and could not find anything describing how keystone handles a
  backend that is down.

  There should be logic that temporarily disables a backend when it is
  not accessible. After a timeout period, the backend should be probed
  (without blocking get/set operations on the remaining backends), and
  if the connection succeeds it should be put back into operation. Here
  is an example of how this could be implemented:

  http://dogpilecache.readthedocs.org/en/latest/usage.html#changing-backend-behavior
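
  To make the proposal concrete, here is a minimal sketch of the
  "mark dead, retry later" logic in plain Python. It is not keystone
  or dogpile.cache code: DeadBackendGuard, BackendDown, retry_after
  and clock are hypothetical names used only for illustration, and the
  wrapped backend is assumed to expose get/set and to raise
  BackendDown when the server is unreachable. For brevity the retry
  happens inline on the first call after the timeout; the
  non-blocking probe asked for above would need a background thread
  or similar.

```python
import time


class BackendDown(Exception):
    """Hypothetical error a backend client raises when the server is unreachable."""


class DeadBackendGuard:
    """Skip a failing backend for `retry_after` seconds, then try it again."""

    def __init__(self, backend, retry_after=30.0, clock=time.monotonic):
        self.backend = backend          # any object with get(key) / set(key, value)
        self.retry_after = retry_after  # seconds to leave a failed backend alone
        self.clock = clock              # injectable so the logic can be tested
        self.dead_until = None          # None means the backend is believed healthy

    def _available(self):
        return self.dead_until is None or self.clock() >= self.dead_until

    def _call(self, method, *args):
        if not self._available():
            # Behave like a cache miss instead of waiting on a connect timeout.
            return None
        try:
            result = getattr(self.backend, method)(*args)
        except BackendDown:
            # Mark the backend dead; later calls skip it until the deadline.
            self.dead_until = self.clock() + self.retry_after
            return None
        # The call worked, so the backend is (back) in operation.
        self.dead_until = None
        return result

    def get(self, key):
        return self._call("get", key)

    def set(self, key, value):
        return self._call("set", key, value)
```

  With a wrapper like this, only the first request after a failure
  (and the first one after each retry_after window) pays the
  connection timeout; every other request skips the dead backend
  immediately.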

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1332058/+subscriptions

