graphite-dev team mailing list archive
Message #03232
Re: [Question #210952]: Multiple Carbon caches on one host
Question #210952 on Graphite changed:
https://answers.launchpad.net/graphite/+question/210952
Description changed to:
We are upgrading to 0.9.10 and are trying to take advantage of running
multiple carbon caches on that box. The behavior we are seeing is that
one carbon cache seems semi-stable, while the other cache grows until
it tops out. I assumed that one of the caches was using all the disk
I/O and starving the other of the ability to write. There is also a
relay on the box that uses rules to target each cache, so there should
be no contention for the same files. To try to fix the issue, I divided
MAX_UPDATES_PER_SECOND and MAX_CREATES_PER_MINUTE by the number of
caches we are trying to run. They were previously 250 and 60, respectively.
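
For reference, the split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; `split_limits` is a hypothetical helper, not part of carbon, and it assumes the limits should be divided evenly across instances.

```python
def split_limits(updates_per_second, creates_per_minute, cache_count):
    """Divide host-wide carbon limits evenly across cache instances.

    Hypothetical helper for illustration only; carbon itself does not
    provide this. Uses floor division, so any remainder is dropped.
    """
    if cache_count < 1:
        raise ValueError("cache_count must be at least 1")
    return (updates_per_second // cache_count,
            creates_per_minute // cache_count)


# The single-cache values from the post (250 and 60), split across 2 caches.
per_cache_updates, per_cache_creates = split_limits(250, 60, 2)
print("MAX_UPDATES_PER_SECOND =", per_cache_updates)
print("MAX_CREATES_PER_MINUTE =", per_cache_creates)
```

With two caches this gives the 125 and 30 used in the settings in this post.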
As you can see from these graphs, approximately the same volume is
going to the two caches, but for some reason the 'a' cache just seems
to grow.
[IMG]http://i.imgur.com/1c5SA.png[/IMG]
[IMG]http://i.imgur.com/xfa8E.png[/IMG]
Here are the settings we currently have for the caches.
[cache:a]
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004
MAX_UPDATES_PER_SECOND = 125
MAX_CACHE_SIZE = 18000000
MAX_CREATES_PER_MINUTE = 30
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7002
WHISPER_LOCK_WRITES = True
[cache:b]
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_PORT = 2104
MAX_UPDATES_PER_SECOND = 125
MAX_CACHE_SIZE = 18000000
MAX_CREATES_PER_MINUTE = 30
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7102
WHISPER_LOCK_WRITES = True
Edit: I should also mention that this box runs the same hardware and
configuration as the other Graphite boxes in our system. The other
boxes are handling around 325k metrics per minute. However, the metrics
sent to this box are "special" in that they fluctuate: they are
per-vhost metrics, so a metric is sent when a vhost is hit but not
otherwise. This creates a situation where, if a vhost gets one hit in
the middle of the night and not again for 12 hours, that metric is
still put in the cache. So the number of metrics received is around
100-200k, but there COULD be more in the cache. I had handled this in
the past by implementing custom cache code to pop metrics out of the
cache if they were sparse. This worked for keeping the cache stable,
but it doesn't accurately reflect traffic for low-volume vhosts. I hope
I explained that correctly. Thanks for any advice.
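
To make the "pop sparse metrics" idea concrete, here is a minimal Python sketch of that kind of eviction. It is not the custom code mentioned above and does not use carbon's cache API; the class name `SparseAwareCache`, the `max_idle_seconds` threshold, and the rule "sparse means no new datapoint for longer than the threshold" are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class SparseAwareCache:
    """Toy in-memory metric cache that can drop idle ("sparse") metrics.

    Hypothetical stand-in for illustration; not carbon's cache class.
    """

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds
        self.points = defaultdict(list)   # metric name -> [(timestamp, value)]
        self.last_seen = {}               # metric name -> time of last store

    def store(self, metric, timestamp, value, now=None):
        now = time.time() if now is None else now
        self.points[metric].append((timestamp, value))
        self.last_seen[metric] = now

    def evict_sparse(self, now=None):
        """Remove metrics idle for longer than max_idle_seconds.

        Returns the removed datapoints so the caller can decide whether
        to write or discard them. Discarding is what loses the traffic
        of low-volume vhosts.
        """
        now = time.time() if now is None else now
        stale = [m for m, seen in self.last_seen.items()
                 if now - seen > self.max_idle_seconds]
        dropped = {}
        for metric in stale:
            dropped[metric] = self.points.pop(metric)
            del self.last_seen[metric]
        return dropped

    def size(self):
        """Total number of cached datapoints across all metrics."""
        return sum(len(p) for p in self.points.values())


cache = SparseAwareCache(max_idle_seconds=60)
cache.store("vhost.quiet.hits", 0, 1, now=0)      # one hit, then silence
cache.store("vhost.busy.hits", 100, 7, now=100)   # recently active
dropped = cache.evict_sparse(now=100)
```

In this sketch the quiet vhost's single datapoint is returned by `evict_sparse` rather than written, which shows the trade-off: the cache stays small, but low-volume vhosts lose data unless the caller flushes the returned points to disk.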