graphite-dev team mailing list archive
Message #06017
[Question #286655]: Performance issue for 1 of 2 carbon-cache daemons
New question #286655 on Graphite:
https://answers.launchpad.net/graphite/+question/286655
I've built a new Graphite/Grafana server for a project I'm working on at work.
It's a reasonably specced virtual machine with 4 vCPUs, backed by a very fast all-flash array through our corporate vSphere.
However, I'm having a performance problem with my two carbon-cache daemons as per the following:
- I've got a number of high-performance production servers firing metrics directly into port 2004 on my Graphite server within the same VLAN.
- This totals around 285,000 metrics per minute.
I was running 1 carbon-cache (and no relay), but my carbon dashboard on Grafana was indicating that I was hitting:
Cache.Size = 1 Million
Cache.Queue = 260,000
So, I've put a carbon-relay in front and set up two carbon-cache daemons to help with the load, and now I'm seeing this:
http://picpaste.com/carbon_dashboard-BzvdqOJ8.PNG
As you can hopefully see from this picture:
- Carbon Relay (third row) is receiving the ~280K metrics and passing them to Cache A and Cache B in a roughly 50/50 split.
- Carbon Cache B (the new one) is receiving ~140K metrics, committing ~140K metrics, and updating ~140K metrics every minute. It's also using around 45-50% of 1 CPU.
- Carbon Cache A (the original one) is receiving ~135K metrics and committing ~135K metrics, but only updating ~20-25K metrics every minute. It's also using more CPU than Cache B, at around 55-65% CPU, yet it is processing fewer metrics and updating far fewer of them per minute.
As a result, Cache A now has a cache.size of around 400K and a cache.queue of around 130K, approximately half of what they were before.
What on earth is going on? How can Carbon Cache B be processing and storing/updating its ~50% of the metrics instantly with nothing building up in its cache, yet Carbon Cache A is struggling? I'm seeing delays in metrics being rendered, and I can only assume it's because they are stuck in the cache for Carbon Cache A.
I also don't understand why, if Carbon Cache A has a deficit of ~110K per minute between metricsReceived and updateOperations, the cache isn't growing by that amount every minute. As you can see, it's staying constant at around 130K.
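
To make the numbers concrete, here is a rough arithmetic sketch of the Cache A figures quoted above. The values are approximate readings from the dashboard (22,500 is simply the midpoint of the 20-25K range). The `points_per_update` line rests on an assumption not confirmed in this thread: that a single update operation can write several queued datapoints for one metric, so updateOperations would count writes rather than datapoints.

```python
# Approximate per-minute figures for Carbon Cache A, read off the dashboard.
metrics_received = 135_000
committed_points = 135_000
update_operations = 22_500  # midpoint of the reported 20-25K range

# Gap between datapoints received and update operations per minute.
deficit = metrics_received - update_operations

# ASSUMPTION: one update operation may write several queued datapoints.
# Under that assumption, this is the average batch size per update.
points_per_update = committed_points / update_operations

# Net queue growth per minute if every committed point leaves the cache.
net_queue_growth = metrics_received - committed_points

print(deficit)            # 112500
print(points_per_update)  # 6.0
print(net_queue_growth)   # 0
```

If that assumption holds, a queue that stays flat at around 130K would be consistent with committed points matching received points, even though the update count is much lower.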
Here is my carbon.conf:
http://pastebin.com/5CrKNKzu
Would really appreciate anyone's time/advice on this so I can resolve the performance issues with Carbon Cache A.