graphite-dev team mailing list archive
Message #01422
Re: [Question #170794]: carbon-cache.py at its limit?
Question #170794 on Graphite changed:
https://answers.launchpad.net/graphite/+question/170794
Kevin Blackham posted a new comment:
Comment from the peanut gallery...
According to your iostat output, you are maxing out your disk subsystem,
but you know that. When carbon-cache can't keep up, those metrics start
piling up quickly. The behavior I have seen in the past isn't as extreme
as yours, but the pattern is similar. My data retention defs were 5-minute
resolution for two weeks (the exact long-term defs I don't have anymore,
as I left that job recently), and I was able to commit about 50,000
metrics/minute before I had to federate and distribute the load across
two machines, each with lots of disk.
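For reference, retention defs like the ones I described live in Graphite's storage-schemas.conf. A sketch of roughly what mine looked like (the section name and the long-term archive are illustrative, not my exact config, and the shorthand unit syntax depends on your Graphite version):

```ini
# storage-schemas.conf (illustrative; long-term archive is an assumption)
[default]
pattern = .*
# 5-minute resolution kept for 14 days, then 30-minute for a year
retentions = 5m:14d,30m:1y
```

Older Graphite versions only accept the seconds:points form, e.g. `retentions = 300:4032,1800:17520` for the same schema.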
I've noticed that as the cache size grows, carbon-cache spends more and
more time sorting the data, and I've seen it consume a lot of CPU and
memory doing so. Even with the kind of excessive hardware I've thrown
at it, there were times when it got into a state where it would never
catch up and I'd have to restart it, dumping two or three hours of
cache.
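One knob that can keep that never-catches-up spiral from eating all your memory is the cache cap in carbon.conf: once the cache hits the limit, carbon drops new datapoints instead of growing without bound. A sketch with illustrative values (check the defaults for your Graphite version):

```ini
# carbon.conf [cache] section (values are examples, not recommendations)
[cache]
# Cap the number of datapoints held in memory; beyond this,
# incoming points are dropped rather than cached.
MAX_CACHE_SIZE = 2000000
# Throttle how many new whisper files get created per minute,
# so a flood of brand-new metrics can't swamp the disk.
MAX_CREATES_PER_MINUTE = 50
```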
I would conclude you simply need faster disks, fewer metrics per minute,
or different retention periods to reduce the size of the .wsp files and
therefore the I/O required for updates. What kind of disk subsystem do
you have? How many metrics per minute are you pushing into it? Do you
have write-back caching enabled on your RAID controller (with a battery
backup)? Is something like auto-verify enabled that drags down I/O
performance weekly?
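On the retention point: whisper's on-disk cost is easy to estimate, since each archived datapoint is a fixed 12 bytes (a 4-byte timestamp plus an 8-byte double) on top of a small per-file header. A rough sketch, with the header sizes taken from the whisper file format as I understand it, so double-check against your version:

```python
# Estimate the size of a Whisper (.wsp) file from its retention defs.
# Format assumptions: 16-byte file metadata header, 12 bytes of
# archive info per archive, 12 bytes per stored datapoint.
POINT_SIZE = 12
METADATA_SIZE = 16
ARCHIVE_INFO_SIZE = 12

def wsp_size(archives):
    """archives: list of (seconds_per_point, retention_seconds) tuples."""
    points = sum(retention // spp for spp, retention in archives)
    return METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives) + POINT_SIZE * points

# Example: 1-minute resolution for 2 weeks, then 5-minute for a year.
size = wsp_size([(60, 14 * 24 * 3600), (300, 365 * 24 * 3600)])
```

Multiply that by your metric count and it's clear why coarser retention shrinks both the disk footprint and the per-update I/O.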
--
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.