graphite-dev team mailing list archive

Re: [Question #285063]: Twisted MemoryError / MetricCache is full

 

Question #285063 on Graphite changed:
https://answers.launchpad.net/graphite/+question/285063

Will gave more information on the question:
Ok, I've made the following adjustments:

=====

/opt/graphite/bin/carbon.conf:
MAX_CACHE_SIZE=10000000
MAX_UPDATES_PER_SECOND=50000

=====

10 million metrics/minute divided by 60 seconds, divided by 8 instances,
is about 21,000 metrics per instance per second, so 50000 should be more
than enough.
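
As a quick check of that arithmetic, here is a minimal Python sketch. It uses only the figures quoted in this message, and it assumes the load is split evenly across the 8 instances and that MAX_UPDATES_PER_SECOND can be read as a per-instance ceiling, as the reasoning above does. The function name is made up for illustration.

```python
# Figures quoted in the message; the even split is an assumption.
METRICS_PER_MINUTE = 10_000_000
INSTANCES = 8
MAX_UPDATES_PER_SECOND = 50_000


def per_instance_rate(metrics_per_minute, instances):
    """Average metrics per second per instance, assuming an even split."""
    return metrics_per_minute / 60 / instances


rate = per_instance_rate(METRICS_PER_MINUTE, INSTANCES)
headroom = MAX_UPDATES_PER_SECOND / rate

# Roughly 20833 metrics/s per instance, about 2.4x below the configured limit.
print(f"{rate:.0f} metrics/s per instance, {headroom:.1f}x headroom")
```

The headroom only holds if no single instance receives much more than its even share.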

=====

/opt/graphite/bin/ccrelay.conf:

cluster lga
    fnv1a_ch
	0.0.0.0:2013=a
	0.0.0.0:2113=b
	0.0.0.0:2213=c
	0.0.0.0:2313=d
	0.0.0.0:2413=e
	0.0.0.0:2513=f
	0.0.0.0:2613=g
	0.0.0.0:2713=h
;

match *
	send to lga
;
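
To illustrate why a hash-based cluster keeps sending a given metric name to the same instance, here is a rough Python sketch of 32-bit FNV-1a hashing. The instance picker is a deliberate simplification: it uses a plain modulo over the eight instance names, which is an assumption for illustration and not how carbon-c-relay's fnv1a_ch places nodes on its consistent-hash ring. The function names and the sample metric name are hypothetical.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193

# Instance names taken from the cluster definition above.
INSTANCES = ["a", "b", "c", "d", "e", "f", "g", "h"]


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def pick_instance(metric: str) -> str:
    """Hypothetical simplification: modulo instead of a real hash ring."""
    return INSTANCES[fnv1a_32(metric.encode("utf-8")) % len(INSTANCES)]


# The same metric name always maps to the same instance.
name = "servers.web01.cpu.user"  # made-up metric name
print(name, "->", pick_instance(name))
```

Because placement depends only on the metric name, a skewed set of names can load one instance more heavily than the others.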

=====

ps out:

root      4996 77.9  7.1 12455596 9446004 ?    Ssl  20:58  26:42
/opt/graphite/bin/relay -f /opt/graphite/bin/ccrelay.conf -l
/opt/graphite/storage/log/ccrelay/ccrelay.log -S 1 -D -P
/var/run/ccrelay.pid -q 150000000 -b 200000

=====

Graphs:

Graphite Stats: https://imgur.com/n28Q2Z5
Carbon-C-Relay Stats: https://imgur.com/ObHAum6

Looks like we could actually pare down the number of threads that
carbon-c-relay runs but it otherwise seems to be handling the load quite
well. However, I have some concerns at this point:

1) Committed points is always lower than metrics received in the
Graphite stats.

2) The carbon-c-relay logfile occasionally shows this for a random
instance:

(ERR) failed to write() to 10.201.12.199:2013: uncomplete write

The cache size on that instance nears MAX_CACHE_SIZE within 15 minutes,
and its RAM usage is significantly higher than on the others. The
message goes away after I kill and restart the process. I'm not sure
what to do here, but these changes have made the caches fill up faster
than usual.
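
For a sense of scale, the sketch below estimates how far the affected instance must be falling behind for its cache to fill in that time. It assumes the cache starts empty after a restart and grows linearly, which the message does not confirm; the function name is made up for illustration.

```python
# Figures quoted in the message; linear growth from empty is an assumption.
MAX_CACHE_SIZE = 10_000_000
FILL_MINUTES = 15


def implied_backlog_rate(max_cache_size, fill_minutes):
    """Points per second received but not yet written, averaged over the fill."""
    return max_cache_size / (fill_minutes * 60)


backlog = implied_backlog_rate(MAX_CACHE_SIZE, FILL_MINUTES)

# Roughly 11111 points/s more arriving than being committed on that instance.
print(f"about {backlog:.0f} points/s accumulating in the cache")
```

Under these assumptions the gap is on the order of half the average per-instance input rate, which would point at either slow writes or an uneven share of traffic on that instance.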
