graphite-dev team mailing list archive
Message #06482
[Question #631136]: I/O utilization goes up when having same data on backends
New question #631136 on Graphite:
https://answers.launchpad.net/graphite/+question/631136
I have the following setup:
Physical load balancer (round robin) -> 2 graphite-web frontends with the same configuration, CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"], pointing to two Graphite backend servers. Both backend servers hold the same data, for redundancy.
The problem I see is that I/O utilization gets very high with this configuration. I also see the following lines in exceptions.log:
Failed to join remote_fetch thread 10.57.72.33:80 within 6s
Failed to join remote_fetch thread 10.57.72.34:80 within 6s
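(For context: if I read the 0.9.x settings correctly, the 6s in those log lines matches the default remote-fetch timeout, which can be overridden in local_settings.py. A sketch, with 30 as an arbitrary example value, not a recommendation:)

```python
# local_settings.py (graphite-web 0.9.x)
# Seconds to wait for a remote CLUSTER_SERVERS fetch before giving up.
# Default is 6, which appears to be where the "within 6s" message comes from.
REMOTE_STORE_FETCH_TIMEOUT = 30  # example value only
```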
If I remove one server from CLUSTER_SERVERS, everything works well.
I am running graphite-web version 0.9.15.
Here is the complete conf for my frontend:
SECRET_KEY = '?=exBKb/9J~m4B3re@P2Waa,`"H_e"x~'
TIME_ZONE = 'CET'
MEMCACHE_HOSTS = ['10.57.72.31:11211']
DEFAULT_CACHE_DURATION = 600 # Cache images and data for 10 minutes
STORAGE_DIR = '/var/opt/graphite/storage'
LOG_DIR = '/opt/graphite/storage/log/webapp'
CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"]
CARBONLINK_HOSTS = []
Here is the conf for the backends:
SECRET_KEY = '?=exBKb/9J~m4B3re@P2Waa,`"H_e"x~'
TIME_ZONE = 'CET'
WHISPER_DIR = '/var/opt/graphite/storage/whisper'
CARBONLINK_HOSTS = ["127.0.0.1:7102:w1", "127.0.0.1:7103:w2", "127.0.0.1:7104:w3", "127.0.0.1:7105:w4", "127.0.0.1:7106:w5", "127.0.0.1:7107:w6"]
CARBONLINK_QUERY_BULK = True
Does anyone have any idea what could be causing this? It seems like a configuration issue to me.
--
You received this question notification because your team graphite-dev
is an answer contact for Graphite.