
graphite-dev team mailing list archive

Re: Extending the webapp to gather data from remote hosts


Kraig, adding the ability to "federate" graphite has been on my todo
list for a while, and I am hoping to get time to implement it some
time this summer. If you want to take a crack at it before then,
though, that is great. Just to clarify, by "federate" I mean having N
separate graphite installations on separate servers, with separate
storage, that share data in such a way that any webapp can serve data
contained on any of the other servers. To the user it will all appear
to be one big graphite server. Each server will have its own carbon
daemons running, receiving and storing data, as well as a webapp
providing access to that data. You should not need to modify any of
the carbon daemons to achieve this; it only requires a couple of
modifications to the webapp, which currently provides all of the APIs
to the graphite data.
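To sketch what I mean (hypothetical names throughout, not an existing API): each webapp can report which metric paths it stores locally, and a front-end webapp merges those listings so the tree the user browses looks like one installation. Something like:

```python
# Illustrative sketch of the federated-lookup idea: merge per-host metric
# listings into one namespace, recording which host(s) can serve each path.
# The host names and this helper are examples, not part of graphite today.

def merge_metric_trees(host_listings):
    """Map each metric path to the list of hosts that can serve it."""
    merged = {}
    for host, paths in host_listings.items():
        for path in paths:
            merged.setdefault(path, []).append(host)
    return merged

listings = {
    "graphite-1.example.com": ["servers.web1.cpu", "servers.web1.mem"],
    "graphite-2.example.com": ["servers.web2.cpu"],
}
tree = merge_metric_trees(listings)
# tree["servers.web2.cpu"] == ["graphite-2.example.com"]
```

The webapp would then fetch the actual data points for a remote path over the same HTTP interfaces it already exposes.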

As for your I/O problems, are you using RAID? Graphite's use of
whisper is designed for very fast read performance at the cost of poor
write performance (because the data is scattered all over the disk,
organized for efficient reading). This will cause your disk to seek a
lot, and that is usually the first I/O wall you will hit. RAID seems
to be the best remedy: the more disks you have, the less seeking is
necessary, and the write load is shared across them. If you already have a
RAID setup then the next most likely cause of your bottleneck is
actually memory. How much RAM does your graphite server have? The
reason memory is important is because every write() call (20K+ times
per minute in your case) puts data into a block in the kernel's I/O
cache. The kernel continually writes whatever is in its I/O
write-cache to disk as fast as it can. So think of water (your
incoming metrics) filling up a tank (the I/O cache) which is being
drained (the kernel physically writing to disk). The size of the tank
(your RAM) is important because carbon has an optimization where it
writes larger chunks of data at a time as the cache grows. The more
RAM you have, the larger the cache can get, which increases the rate
at which carbon-persister can write data points to disk (by packing
more related points into a single write operation).
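A toy model of that behavior (an illustration of the idea, not carbon's actual code): incoming points pile up per metric, and the writer drains the longest queue in one batch, so a fuller cache means more points per write and fewer seeks.

```python
# Toy model of the cache-batching optimization described above: points
# accumulate per metric, and the writer pops the fullest metric's queue
# all at once, turning many small writes into one larger one.
from collections import defaultdict

class MetricCache:
    def __init__(self):
        self.queues = defaultdict(list)

    def store(self, metric, point):
        """Buffer one (timestamp, value) point for a metric."""
        self.queues[metric].append(point)

    def drain_largest(self):
        """Pop every buffered point for the fullest metric: one batched write."""
        metric = max(self.queues, key=lambda m: len(self.queues[m]))
        return metric, self.queues.pop(metric)

cache = MetricCache()
for ts in range(5):
    cache.store("servers.web1.cpu", (ts, 0.5))
cache.store("servers.web1.mem", (0, 1024))

metric, points = cache.drain_largest()
# metric == "servers.web1.cpu"; all five buffered points go out in one batch
```

The bigger the tank gets, the longer these queues grow, and the more efficient each physical write becomes.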

At Orbitz, I had 2 servers (Sun T2000s with 12G of RAM each) sharing
a large RAID5 array (VxFS filesystem w/clustering) and the system
could sustain a volume of 180,000 metrics per minute. Even if the
cache grows very large, say containing all of the past 30 minutes
worth of data, your graphs will still be real-time because the webapp
queries carbon-cache for whatever data has not yet been written to
disk and recombines it with whatever was stored on disk.
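The recombination step is simple in principle (this helper is illustrative, not the webapp's actual code): points already on disk are merged with the not-yet-written points still held by carbon-cache, with the cached values winning for any overlapping timestamp.

```python
# Sketch of how cached and on-disk data can be recombined so graphs stay
# real-time even when the cache holds many minutes of unwritten points.
# Illustrative helper, not the webapp's actual implementation.

def merge_series(disk_points, cached_points):
    """Merge (timestamp, value) pairs, preferring the cached values."""
    merged = dict(disk_points)
    merged.update(cached_points)   # cache overrides any stale disk data
    return sorted(merged.items())

disk = [(60, 1.0), (120, 2.0)]
cached = [(120, 2.5), (180, 3.0)]  # 120 was updated in cache, 180 is new
series = merge_series(disk, cached)
# series == [(60, 1.0), (120, 2.5), (180, 3.0)]
```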

If you are interested in working on the webapp federation feature let
me know, I'd be happy to help you out.

-Chris

On Fri, May 22, 2009 at 2:16 PM, Kraig Amador <kamador@xxxxxxxxxxxxx> wrote:
> I'm currently collecting data on 20k+ points per minute on two machines and
> have hit an IO wall. I'd like to move the data out of whisper but in order
> to make use of this the webapp is going to need to be able to discover data
> trees over a remote connection.
>
> It seems like the best way to do this would be to extend carbon-cache to
> also describe the sources it contains.  I wanted to reach out before I
> started implementing a protocol to get this information from carbon-cache.
> Ideally I will replace carbon-cache with a java app that will handle the
> storage and data serving.
>
> In question 59255 (https://answers.launchpad.net/graphite/+question/59255) I
> see that some thought has already gone into this. Does anyone have any
> suggestions?
>


