graphite-dev team mailing list archive

Re: [Question #198435]: Scaling to very large number of slowly updating metrics

Question #198435 on Graphite changed:
https://answers.launchpad.net/graphite/+question/198435

    Status: Answered => Solved

Thomas V confirmed that the question is solved:
Thanks Michael, that gives me a good idea. Our metrics are already well
structured, and given what you describe, the key number is the count of
objects we need to keep an eye on. That count drives the maximum number
of metrics at any single level of the hierarchy and currently stands at
5,000. It is also the number that will grow the fastest. I will
probably add another intermediate level right now to cut down on the
maximum number of files per directory in the future.
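
Concretely, something along these lines (the paths here are made up for
illustration, but ours have the same shape):

    # today: one flat level, up to ~5,000 whisper files in a directory
    objects.<object_id>.latency

    # with an intermediate bucket level, e.g. object_id mod 100:
    objects.<object_id mod 100>.<object_id>.latency
    # each directory then stays around 50 files as the object count grows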

Also, we probably won't have to render graphs with very large numbers of
metrics, or even aggregate them via carbon, since our system already
produces aggregate metrics. It is different from a server farm where
each server reports directly into Graphite.

However, we do plan to monitor these metrics (using rawData=true, likely
from Nagios) and send alerts when any of them spike. We hope to include
a hyperlink in the alert pointing to the graph showing the spiking
metric and a small number of related aggregate metrics to provide
context.
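
The link would simply be a render URL with multiple target parameters,
along these lines (the host and metric names are made up):

    http://graphite.example.com/render?target=objects.42.latency
        &target=objects.aggregate.latency&from=-2h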

This means Nagios would have to poll millions of metrics via
rawData=true on a regular basis, say every hour. What are the
bottlenecks in Graphite when handling this part of our load? A minimal
version of the check I have in mind is sketched below.
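
For reference, here is roughly the shape of the check (a minimal
sketch; the Graphite host, metric name, and threshold are placeholders,
and I'm assuming the raw format is "target,start,end,step|v1,v2,..."):

    #!/usr/bin/env python
    # Minimal Nagios-style check: fetch the last hour of one metric via
    # Graphite's render API with rawData=true and alert if the most
    # recent value exceeds a threshold. Host, metric, and threshold
    # below are placeholders, not our real values.
    import sys
    import urllib

    GRAPHITE = "http://graphite.example.com"

    def check_metric(target, threshold, window="-1h"):
        url = "%s/render?target=%s&from=%s&rawData=true" % (
            GRAPHITE, urllib.quote(target), window)
        body = urllib.urlopen(url).read().strip()
        # raw format (as I understand it): "target,start,end,step|v1,v2,..."
        header, values = body.split("|", 1)
        points = [float(v) for v in values.split(",") if v != "None"]
        if not points:
            print "UNKNOWN - no data for %s" % target
            return 3  # Nagios UNKNOWN
        latest = points[-1]
        if latest > threshold:
            print "CRITICAL - %s = %.2f (threshold %.2f)" % (
                target, latest, threshold)
            return 2  # Nagios CRITICAL
        print "OK - %s = %.2f" % (target, latest)
        return 0  # Nagios OK

    if __name__ == "__main__":
        sys.exit(check_metric("objects.42.latency", 500.0))

In practice we would batch many targets per request (wildcards or
repeated target parameters) rather than make one HTTP call per metric.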

Since our metrics are highly suitable for sharding, I know we can
always scale horizontally, all the way out to Nagios if needed. I'm
just trying to figure out how much hardware we may need :)

I'll report back as we gain more experience. We have a prototype
Graphite installation running that will start receiving production
metrics shortly.

Thanks,
Thomas

-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.