graphite-dev team mailing list archive
Message #01422
Re: [Question #170794]: carbon-cache.py at its limit?
Question #170794 on Graphite changed:
https://answers.launchpad.net/graphite/+question/170794
Kevin Blackham posted a new comment:
Comment from the peanut gallery...
According to your iostat output, you are maxing out your disk subsystem,
but you know that. When carbon-cache can't keep up, those metrics start
piling up quickly. The behavior I have seen in the past isn't as extreme
as yours, but the pattern is similar. My data retention defs were 5-minute
resolution for two weeks (the exact long-term defs I don't have anymore,
as I left that job recently), and I was able to commit about 50,000
metrics/minute before I had to federate and distribute the load across
two machines, each with lots of disk.
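For reference, retention defs like the ones I described live in Graphite's storage-schemas.conf. A sketch of roughly what mine looked like (the section name and the long-term archive are illustrative, not my exact config, and the shorthand unit syntax depends on your Graphite version):

```ini
# storage-schemas.conf (illustrative; long-term archive is an assumption)
[default]
pattern = .*
# 5-minute resolution kept for 14 days, then 30-minute for a year
retentions = 5m:14d,30m:1y
```

Older Graphite versions only accept the seconds:points form, e.g. `retentions = 300:4032,1800:17520` for the same schema.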
I've noticed that as the cache size grows, carbon-cache spends more and
more time sorting the data, and I've seen it consume a lot of CPU and
memory doing so. Even with the kind of excessive hardware I've thrown
at it, there were times when it got into a state where it would never
catch up and I'd have to restart it, dumping two or three hours of
cache.
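One knob that can keep that never-catches-up spiral from eating all your memory is the cache cap in carbon.conf: once the cache hits the limit, carbon drops new datapoints instead of growing without bound. A sketch with illustrative values (check the defaults for your Graphite version):

```ini
# carbon.conf [cache] section (values are examples, not recommendations)
[cache]
# Cap the number of datapoints held in memory; beyond this,
# incoming points are dropped rather than cached.
MAX_CACHE_SIZE = 2000000
# Throttle how many new whisper files get created per minute,
# so a flood of brand-new metrics can't swamp the disk.
MAX_CREATES_PER_MINUTE = 50
```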
I would conclude you simply need faster disks, fewer metrics per minute,
or different retention periods to reduce the size of the .wsp files and
therefore the I/O required for updates. What kind of disk subsystem do
you have? How many metrics per minute are you pushing into it? Do you
have write-back caching enabled on your RAID controller (with a battery
backup)? Is something like auto-verify enabled that drags down I/O
performance weekly?
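On the retention point: whisper's on-disk cost is easy to estimate, since each archived datapoint is a fixed 12 bytes (a 4-byte timestamp plus an 8-byte double) on top of a small per-file header. A rough sketch, with the header sizes taken from the whisper file format as I understand it, so double-check against your version:

```python
# Estimate the size of a Whisper (.wsp) file from its retention defs.
# Format assumptions: 16-byte file metadata header, 12 bytes of
# archive info per archive, 12 bytes per stored datapoint.
POINT_SIZE = 12
METADATA_SIZE = 16
ARCHIVE_INFO_SIZE = 12

def wsp_size(archives):
    """archives: list of (seconds_per_point, retention_seconds) tuples."""
    points = sum(retention // spp for spp, retention in archives)
    return METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives) + POINT_SIZE * points

# Example: 1-minute resolution for 2 weeks, then 5-minute for a year.
size = wsp_size([(60, 14 * 24 * 3600), (300, 365 * 24 * 3600)])
```

Multiply that by your metric count and it's clear why coarser retention shrinks both the disk footprint and the per-update I/O.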
--
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.