graphite-dev team mailing list archive

Re: [Question #286655]: Performance issue for 1 of 2 carbon-cache daemons

 

Question #286655 on Graphite changed:
https://answers.launchpad.net/graphite/+question/286655

Description changed to:
I've built a new Graphite/Grafana server for a project I'm working on at
work.

It's a reasonably spec'd virtual machine with 4 vCPUs, hooked up to a
very fast all-flash array via our corporate vSphere.

However, I'm having a performance problem with my two carbon-cache
daemons as per the following:

- I've got a number of high-performance production servers firing metrics directly into port 2004 on my Graphite server within the same VLAN.
- This totals around 285,000 metrics per minute.
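
For context, 2004 is carbon's default pickle receiver port, so the senders are presumably using the pickle protocol. A minimal sketch of how one batch would be framed under that assumption; the helper name `build_pickle_message`, the metric path, and the host name in the comment are hypothetical, for illustration only:

```python
import pickle
import struct
import time


def build_pickle_message(datapoints):
    """Frame a batch of (path, (timestamp, value)) tuples for a pickle receiver."""
    payload = pickle.dumps(datapoints, protocol=2)
    # The payload is preceded by a 4-byte big-endian length header.
    header = struct.pack("!L", len(payload))
    return header + payload


now = int(time.time())
# Hypothetical metric path and value, not taken from the real senders.
batch = [("servers.example01.cpu.load", (now, 0.42))]
message = build_pickle_message(batch)

# Sending would look roughly like this (host name is an assumption):
#   import socket
#   with socket.create_connection(("graphite-host", 2004)) as sock:
#       sock.sendall(message)
```

At 285,000 metrics per minute this works out to roughly 4,750 datapoints per second arriving on that port.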

I was running 1 carbon-cache (and no relay), but my carbon dashboard on Grafana was indicating that I was hitting:
Cache.Size = 1 Million
Cache.Queue = 260,000

So, I've put a carbon-relay in front, and set up two carbon-cache daemons
to help with the load, and now I'm seeing this:

http://picpaste.com/carbon_dashboard-BzvdqOJ8.PNG

As you can hopefully see from this picture:

- Carbon Relay (third row) is receiving the ~280K metrics and passing them to Cache A and Cache B at roughly 50/50
- Carbon Cache B (the new one) is receiving ~140K metrics, and committing ~140K metrics, and updating ~140K metrics every minute. It's also using around 45-50% of 1 CPU. All seems well with this carbon-cache daemon.
- Carbon Cache A (the original one) is receiving ~135K metrics, committing ~135K metrics, but only updating ~20-25K metrics every minute. It's also using more CPU than Cache B, at around 55-65% of 1 CPU, yet it is receiving slightly fewer metrics and updating far fewer of them per minute.

As a result, Cache A now has a cache.size of around 400K and a
cache.queue of around 130K - approx half of what it was before.

What on earth is going on? How can Carbon Cache B be processing and
storing/updating its ~50% of the metrics instantly with nothing building
up in its cache, yet Carbon Cache A is struggling? I'm seeing delays in
metrics being rendered, and I can only assume it's because they are stuck
in the cache for Carbon Cache A.

I also don't understand why, if Carbon Cache A has a deficit of ~110K
between metricsReceived and updateOperations every minute, its cache
isn't growing by the same amount every minute. As you can see, it's
staying constant at around 130K.
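
The arithmetic behind that puzzle can be written out as a quick sketch. It rests on an assumption that is not confirmed here: that updateOperations counts write calls to Whisper files rather than individual datapoints. If so, one update could flush several queued points, and the two counters would not be directly comparable. The function names are hypothetical; the figures are the approximate per-minute values for Cache A quoted above.

```python
def apparent_deficit(received, update_ops):
    """Gap between datapoints received and update operations, per minute."""
    return received - update_ops


def points_per_update(committed, update_ops):
    """Average datapoints written per update, if each update is one write call."""
    return committed / update_ops


received_per_min = 135_000    # Cache A metricsReceived (approximate)
committed_per_min = 135_000   # Cache A committedPoints (approximate)

for update_ops in (20_000, 25_000):  # observed range of updateOperations
    gap = apparent_deficit(received_per_min, update_ops)
    ratio = points_per_update(committed_per_min, update_ops)
    print(f"updateOperations={update_ops}: gap={gap}, points/update={ratio:.2f}")
```

Under that assumption, committed points matching received points would be consistent with a queue that stays flat rather than growing by the gap each minute, though it would not explain why Cache A batches its writes and Cache B does not.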


Here is my carbon.conf:

http://pastebin.com/5CrKNKzu

Would really appreciate anyone's time/advice on this so I can resolve
the performance issues with Carbon Cache A.

-- 
You received this question notification because your team graphite-dev
is an answer contact for Graphite.