
graphite-dev team mailing list archive

Re: [Question #110431]: Carbon Docs out of sync...

Question #110431 on Graphite changed:
https://answers.launchpad.net/graphite/+question/110431

    Status: Open => Answered

chrismd proposed the following answer:
You're quite right, Sam, the docs are very much lacking. I'd be glad to
answer any specific questions you have about how carbon works, though.
There is some good information in this questions forum as well.

Carbon used to be 3 daemons, but it was rewritten as a single daemon
using the Twisted framework. This unified carbon daemon is carbon-
cache.py. Its job is to listen for datapoints from clients and to write
them to disk as fast as possible using the whisper database library.
When the local disk cannot keep up with the incoming data, carbon
accrues a cache of datapoints in memory. The larger this cache gets, the
more efficient carbon becomes: more datapoints get compacted into each
write() system call, and each write() call typically costs the same
regardless of size since it always fits in a single page. Essentially
the cache grows until carbon is efficient enough (the rate at which it
writes datapoints to disk becomes high enough) to keep up with the rate
of data you send it.

This cache can be quite large; one of my systems sustains a cache of
approximately 2 million datapoints, equivalent to about 7 minutes worth
of data for about 300,000 metrics. Whenever a substantial external I/O
operation occurs on the system (like a backup), the kernel's I/O cache
gets polluted with data other than carbon's and carbon's throughput can
slow down dramatically. When this happens, carbon's cache simply grows
until the write rate is again high enough to bring the system back into
equilibrium. Just a few weeks ago my system had such an issue: the cache
grew to over 10 million datapoints (a 30-minute backlog of data) and
then recovered without any noticeable impact.
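
As an aside, getting datapoints into carbon-cache is simple: clients
just write lines of "metric-path value timestamp" to its plaintext
listener (TCP port 2003 by default). A minimal sketch in Python (the
hostname and metric name below are only examples):

    import socket
    import time

    CARBON_HOST = 'graphite.example.com'  # example hostname, adjust for your setup
    CARBON_PORT = 2003                    # carbon-cache's default plaintext listener port

    def send_datapoint(metric, value, timestamp=None):
        # One datapoint per line: "<metric path> <value> <unix timestamp>\n"
        timestamp = timestamp or int(time.time())
        line = '%s %s %d\n' % (metric, value, timestamp)
        sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
        try:
            sock.sendall(line.encode('ascii'))
        finally:
            sock.close()

    send_datapoint('servers.web01.loadavg.01', 1.37)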

Another important aspect of carbon's functionality is that when the
graphite webapp gets a request to render a graph, it fetches datapoints
off the local disk using the whisper library *and* it queries carbon-
cache over a TCP socket to retrieve any cached datapoints. It then
combines the results to give a graph with real-time data even when there
is a perpetual backlog of several minutes of data in the cache.
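
Roughly speaking, the webapp's fetch path looks like the sketch below
(whisper.fetch is the real whisper API, but query_carbon_cache is only a
stand-in for the webapp's actual cache query, which goes over a TCP
socket to carbon-cache):

    import whisper  # the database library carbon writes with

    def query_carbon_cache(metric_path):
        # Hypothetical stand-in for the webapp's real cache query to carbon-cache;
        # imagine it returns the still-cached datapoints as [(timestamp, value), ...]
        return []

    def fetch_series(metric_path, wsp_file, from_time, until_time):
        # 1) Datapoints already written to disk by carbon-cache.
        (start, end, step), values = whisper.fetch(wsp_file, from_time, until_time)
        values = list(values)

        # 2) Datapoints still sitting in carbon-cache's in-memory cache.
        for timestamp, value in query_carbon_cache(metric_path):
            if start <= timestamp < end:
                # Overlay the cached value on the corresponding slot so the
                # graph is real-time even with a multi-minute write backlog.
                values[(timestamp - start) // step] = value

        return (start, end, step), values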

The carbon-relay.py program is only useful when you've got a cluster of
graphite servers, and even then it is optional. Its job is to listen
for datapoints the same way carbon-cache does, but instead of writing
them to disk it simply relays them on to one or more carbon-cache
processes in your cluster based on a set of rules you give it. It's
basically an application-level load balancer. It can also be used to
duplicate datapoints to more than one carbon-cache server for redundant
storage of the data.
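
In spirit, the relaying works something like this sketch (the rule
table, hosts, and ports are made up; the real carbon-relay reads its
rules from a config file and keeps persistent connections rather than
reconnecting for every datapoint):

    import re
    import socket

    # Hypothetical rule table: (regex matched against the metric path, destinations).
    # Listing several destinations for one rule duplicates the data for redundancy.
    RULES = [
        (re.compile(r'^collectd\.'), [('10.0.0.1', 2003), ('10.0.0.2', 2003)]),
        (re.compile(r'.'),           [('10.0.0.1', 2003)]),  # catch-all default
    ]

    def relay_line(line):
        # A datapoint line looks like "<metric path> <value> <timestamp>\n"
        metric = line.split()[0]
        for pattern, destinations in RULES:
            if pattern.match(metric):
                for host, port in destinations:
                    # Forward the line unchanged to a downstream carbon-cache.
                    sock = socket.create_connection((host, port))
                    try:
                        sock.sendall(line.encode('ascii'))
                    finally:
                        sock.close()
                return  # stop at the first matching rule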

That's the 2-minute tour of carbon; let me know if you have any other
questions. I'd be glad to help.

-Chris

-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.