yahoo-eng-team team mailing list archive
Message #20585
[Bug 1332058] Re: keystone behavior when one memcache backend is down
Fix proposed to branch: master
Review: https://review.openstack.org/119452
** Changed in: keystone
Status: Invalid => In Progress
** Changed in: keystone
Assignee: (unassigned) => Yuriy Taraday (yorik-sar)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1332058
Title:
keystone behavior when one memcache backend is down
Status in OpenStack Identity (Keystone):
In Progress
Status in Mirantis OpenStack:
Fix Committed
Bug description:
Hi,
Our implementation uses dogpile.cache.memcached as a backend for
tokens. Recently, I found interesting behavior when one of the
memcache backends went down: there is a 3-6 second delay when I try to
get a token. If I have 2 backends, the delay is 6-12 seconds. It's
very easy to test.
Test connection using
for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done
Block one memcache backend using
iptables -I INPUT -p tcp --dport 11211 -j DROP
(this simulates a power outage of the node)
Test the speed using
for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done
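For anyone who prefers to collect the timings programmatically, the shell loop above could be approximated in Python roughly as follows. This is only a sketch: the helper name time_command is hypothetical, and it measures wall-clock time of the whole subprocess rather than parsing the output of the shell's time builtin.

```python
import subprocess
import time


def time_command(cmd, runs=20):
    """Run `cmd` `runs` times and return the wall-clock duration of each run.

    `cmd` is an argument list as accepted by subprocess.run. Output of the
    command is discarded; only the elapsed time is recorded.
    """
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # check=False: a failing command is still timed, not raised.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.monotonic() - start)
    return durations
```

Calling time_command(["keystone", "token-get"]) before and after adding the iptables rule should give two lists of durations that can be compared directly.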
I also straced the keystone process with
strace -tt -s 512 -o /root/log1 -f -p PID
and got
26872 connect(9, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("10.108.2.3")}, 16) = -1 EINPROGRESS (Operation now in progress)
even though this IP is down
Also I checked the code
https://github.com/openstack/keystone/blob/master/keystone/common/kvs/core.py#L210-L237
https://github.com/openstack/keystone/blob/master/keystone/common/kvs/core.py#L285-L289
https://github.com/openstack/keystone/blob/master/keystone/common/kvs/backends/memcached.py#L96
and was not able to find any details on how keystone handles a
backend when it's down
There should be logic that temporarily blocks a backend when it is not
accessible. After a timeout period, the backend should be probed
(without blocking get/set operations on the remaining backends), and if
the connection succeeds it should be put back into operation. Here is
an example of how it could be implemented:
http://dogpilecache.readthedocs.org/en/latest/usage.html#changing-backend-behavior
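To make the proposal concrete, here is a minimal sketch of the "temporarily block, then retry" idea in plain Python. It is not keystone or dogpile.cache code: the names BackendDown and TemporarilyBlockingBackend are hypothetical, the wrapped backend is assumed to expose get(key) and set(key, value) and to raise BackendDown when it cannot connect, and the probe is simplified to "let the first request after the timeout through" instead of a separate non-blocking probe.

```python
import time


class BackendDown(Exception):
    """Hypothetical error raised by a backend that cannot be reached."""


class TemporarilyBlockingBackend:
    """Wrap a backend and skip it for a while after a connection failure.

    While the backend is blocked, get() reports a cache miss (None) and
    set() is a no-op, so callers do not wait for the connection timeout
    on every request.
    """

    def __init__(self, backend, retry_after=30.0, clock=time.monotonic):
        self._backend = backend
        self._retry_after = retry_after
        self._clock = clock
        # The backend may be used once clock() reaches this value.
        self._blocked_until = 0.0

    def _call(self, method, *args):
        if self._clock() < self._blocked_until:
            return None  # still blocked: behave like a cache miss
        try:
            return method(*args)
        except BackendDown:
            # Block the backend; the next attempt happens after retry_after.
            self._blocked_until = self._clock() + self._retry_after
            return None

    def get(self, key):
        return self._call(self._backend.get, key)

    def set(self, key, value):
        self._call(self._backend.set, key, value)
```

With this shape only the first request after a failure, and one request per retry_after interval, pays the connection timeout. A real implementation would more likely hook into the dogpile.cache proxy mechanism linked above and probe in the background.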
To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1332058/+subscriptions