graphite-dev team mailing list archive

Re: [Question #170295]: how to handle high-volume data?

Question #170295 on Graphite changed:
https://answers.launchpad.net/graphite/+question/170295

    Status: Open => Answered

chrismd proposed the following answer:
There is an easy fix for #1. In the [cache] section of carbon.conf there
is a setting called MAX_UPDATES_PER_SECOND, which I think is set too high
by default and is what is causing your I/O wait issue. I'll make sure to
lower it in the next release. This setting rate-limits the write
operations performed by carbon. That may seem counter-intuitive, since
you'd think faster would be better, but unthrottled writing produces an
excessive number of non-sequential I/O requests, which slows everything
down: your disks constantly seek to write a single datapoint to every
wsp file. Try a value of 500 (the unit is writes per second) and see how
that goes. Note that this does not mean carbon will lag behind; it just
means that it will rely more on its caching and bulk-writing behavior.
Ideally you want the write rate high enough that you don't build up a
huge cache eating all your memory, but low enough that the disks aren't
going nuts.
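
To make the caching trade-off concrete, here is a toy model of the idea (not carbon's actual code; the function name simulate and the one-point-per-metric-per-second workload are assumptions made up for illustration). Each "write" flushes every cached datapoint for one metric, so capping the number of writes per second means each write carries more points rather than data being dropped.

```python
from collections import defaultdict, deque


def simulate(num_metrics, max_updates_per_second, seconds):
    """Toy model: every metric gets one datapoint per second, and each
    write operation flushes all cached datapoints for a single metric."""
    cache = defaultdict(list)   # metric name -> datapoints not yet written
    pending = deque()           # metrics with cached data, oldest first
    writes = 0
    points_written = 0
    for t in range(seconds):
        # Ingest one datapoint for every metric.
        for m in range(num_metrics):
            name = "metric.%d" % m
            if not cache[name]:
                pending.append(name)
            cache[name].append((float(t), 1.0))
        # Perform at most max_updates_per_second writes this second.
        for _ in range(min(max_updates_per_second, len(pending))):
            name = pending.popleft()
            points_written += len(cache[name])
            cache[name] = []
            writes += 1
    still_cached = sum(len(points) for points in cache.values())
    return writes, points_written, still_cached


# 1000 metrics for 10 seconds, with and without a 500 writes/sec limit.
print(simulate(1000, 1000, 10))  # (10000, 10000, 0)
print(simulate(1000, 500, 10))   # (5000, 9500, 500)
```

In this model the limited run does half as many writes yet has written all but the most recent 500 datapoints, which sit in the cache waiting for the next flush.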

#2 is definitely doable.

For #3, that looks fine. Using a single persistent connection instead of
spawning many short-lived connections is definitely the way to go. If
you find that carbon-cache is CPU-bound, you can probably reduce CPU
load by switching to the pickle protocol. For that, just use port 2004
and do the following:

# Assume data = [(metric, datapoint), (metric, datapoint), ...]
#   where metric is a string metric name
#   and datapoint is a (timestamp, value) tuple, both floats.
# Assume my_socket is a socket already connected to port 2004.
import struct
import cPickle
serialized_data = cPickle.dumps(data, protocol=-1)
# 4-byte big-endian length header, followed by the pickled payload
header = struct.pack("!L", len(serialized_data))
my_socket.sendall(header + serialized_data)

Note that the performance gain here depends on sending reasonably large
lists of datapoints rather than a bunch of small lists sent separately.
I'd suggest aiming for 500 or so datapoints per message.
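
One way to get messages of roughly that size is to split the data into batches and frame each batch the same way as above. This is only a sketch: the helper names chunked, frame and send_batches are hypothetical, it uses the standard pickle module instead of cPickle so that it also runs on Python 3, and pinning the pickle protocol to 2 is my assumption (chosen because both Python 2 and Python 3 can read it), not something carbon requires.

```python
import pickle
import struct


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def frame(batch):
    """Pickle one batch and prefix it with a 4-byte big-endian length."""
    # Assumption: protocol 2, readable by both Python 2 and Python 3.
    payload = pickle.dumps(batch, protocol=2)
    return struct.pack("!L", len(payload)) + payload


def send_batches(sock, data, batch_size=500):
    """Send data over an already-connected socket, one message per batch.

    data is [(metric, (timestamp, value)), ...] as described above.
    Returns the number of messages sent.
    """
    sent = 0
    for batch in chunked(data, batch_size):
        sock.sendall(frame(batch))
        sent += 1
    return sent
```

With 1200 datapoints and the default batch size this sends three messages (500, 500 and 200 datapoints) over the one persistent connection.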
