yahoo-eng-team team mailing list archive

[Bug 1332058] Re: keystone behavior when one memcache backend is down

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1332058@xxxxxxxxxxxxxxxxxx>
Date: Fri, 05 Sep 2014 19:36:36 -0000
Reply-to: Bug 1332058 <1332058@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Fix proposed to branch: master
Review: https://review.openstack.org/119452

** Changed in: keystone
       Status: Invalid => In Progress

** Changed in: keystone
     Assignee: (unassigned) => Yuriy Taraday (yorik-sar)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1332058

Title:
  keystone behavior when one memcache backend is down

Status in OpenStack Identity (Keystone):
  In Progress
Status in Mirantis OpenStack:
  Fix Committed

Bug description:
  Hi,

  Our implementation uses dogpile.cache.memcached as a backend for
  tokens. Recently, I found interesting behavior when one of the
  memcache regions went down: there is a 3-6 second delay when I try
  to get a token. If I have 2 backends, the delay is 6-12 seconds. It
  is very easy to test.

  Test connection using

  for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done

  Block one memcache backend (simulating a power outage of the node)
  using

  iptables -I INPUT -p tcp --dport 11211 -j DROP

  Test the speed using

  for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done

  I also straced the keystone process with

  strace -tt -s 512 -o /root/log1 -f -p PID

  and got

  26872 connect(9, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("10.108.2.3")}, 16) = -1 EINPROGRESS (Operation now in progress)

  though this IP is down
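
EINPROGRESS is what a non-blocking connect() returns while the TCP handshake is still pending. With a DROP rule the SYN is silently discarded, so the client keeps waiting until its own timeout expires. A minimal sketch of a bounded reachability check, using only the Python standard library, might look like the following. The helper name `is_reachable` and the 0.5 second default are assumptions for illustration; they are not part of keystone, dogpile.cache, or python-memcached.

```python
import socket


def is_reachable(host, port, timeout=0.5):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection applies the timeout to each connect() attempt,
        # so a silently dropped SYN costs at most `timeout` seconds per
        # resolved address instead of the operating system default.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, `is_reachable("10.108.2.3", 11211)` would give up after roughly half a second against the blocked node from the trace above, rather than stalling the request for several seconds.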

  Also I checked the code

  https://github.com/openstack/keystone/blob/master/keystone/common/kvs/core.py#L210-L237
  https://github.com/openstack/keystone/blob/master/keystone/common/kvs/core.py#L285-L289
  https://github.com/openstack/keystone/blob/master/keystone/common/kvs/backends/memcached.py#L96

  and was not able to find any details on how keystone handles a
  backend when it is down.

  There should be logic that temporarily blocks a backend when it is
  not accessible. After a timeout period, the backend should be probed
  (without blocking get/set operations on the working backends) and,
  if the connection succeeds, it should be returned to operation. Here
  is an example of how it could be implemented:

  http://dogpilecache.readthedocs.org/en/latest/usage.html#changing-backend-behavior
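
The blocking logic requested above can be sketched as a small wrapper around a cache backend. This is a simplified illustration under stated assumptions, not keystone's or dogpile.cache's actual implementation: the class name `FailoverBackend`, the `retry_after` parameter, and the injected `clock` are all hypothetical, and the wrapped `backend` is assumed to be any object whose `get(key)` and `set(key, value)` raise an exception when the server is unreachable. In this sketch the first call after the window expires acts as the probe and can still block; probing without blocking callers would need a background thread or similar.

```python
import time


class FailoverBackend:
    """Wrap a cache backend and skip it for a while after it fails.

    Hypothetical helper: `backend` is any object with get(key) and
    set(key, value) methods that raise when the server is unreachable.
    """

    def __init__(self, backend, retry_after=30.0, clock=time.monotonic):
        self.backend = backend
        self.retry_after = retry_after
        self.clock = clock
        # The backend is skipped while clock() < dead_until.
        self.dead_until = float("-inf")

    def _available(self):
        return self.clock() >= self.dead_until

    def _mark_dead(self):
        self.dead_until = self.clock() + self.retry_after

    def get(self, key, default=None):
        if not self._available():
            # Report a miss without touching the network.
            return default
        try:
            return self.backend.get(key)
        except Exception:
            self._mark_dead()
            return default

    def set(self, key, value):
        if not self._available():
            return False
        try:
            self.backend.set(key, value)
            return True
        except Exception:
            self._mark_dead()
            return False
```

With this shape, only the first request after a failure pays the connection delay; later requests inside the `retry_after` window return immediately. Whether reporting a miss is acceptable depends on what is stored: for a token store, a miss would mean the token cannot be validated, so a real fix would also have to decide where reads and writes go while a backend is blocked.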

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1332058/+subscriptions

References

[Bug 1332058] [NEW] keystone behavior when one memcache backend is down
From: Sergii Golovatiuk, 2014-06-19