graphite-dev team mailing list archive

Re: [Question #178969]: Tuning Graphite for 3M points/minute with a single backend machine (a story)

Question #178969 on Graphite changed:
https://answers.launchpad.net/graphite/+question/178969

chrismd proposed the following answer:
Sorry for the delayed response; I have been on hiatus for the past two
weeks and just read your write-up.

First off, thanks for being so thorough and detailed. Second off, you
officially win the trophy for Biggest Graphite System Ever (it would
make for a good t-shirt, I think); 3M metrics/min on one machine is
very impressive. Third off, I think there are some ways we can better
utilize your vast system resources, so I'm psyched to see how far you
might be able to push this system if you were interested in doing a
benchmark once we've optimized everything :).

So, down to the details. Your observation that rapid small writes can
hamper performance is quite correct, and that is exactly the motivation
behind the MAX_UPDATES_PER_SECOND setting in carbon.conf. Its default
value (1,000) is too high; I could have sworn I already fixed this by
lowering it to 500, but I'm looking at trunk and it's still 1,000.
Sorry about that, I've just committed it at 500 now. Either way, when
you've got N carbon-caches you need to divide the total rate you're
after by N. A system with as many disks as yours can probably handle
1,000 updates/sec handily, but 10,000/sec would certainly be excessive.
This approach should result in a constant rate of write operations,
where the number of datapoints written to disk is proportional to both
that rate and the cache size. It is also a good way to limit how hard
the backend works the disks (in terms of seeks; the writes themselves
are negligibly small), leaving a relatively fixed amount of
disk-utilization headroom for frontend requests or other processes on
the system. If there's contention, the cache simply grows, and as long
as it doesn't max out there is generally no visible impact.
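
To make the arithmetic concrete (the instance count here is made up):
if you were running 8 writer instances and wanted roughly 1,000
updates/sec for the whole box, each instance's [cache] settings would
get an equal slice of that budget:

  # illustrative values only: ~1,000 updates/sec total across 8 instances
  MAX_UPDATES_PER_SECOND = 125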

Some good news is you might be able to do away with all your relays.
The next release, 0.9.10, is going to be based on this branch:
https://code.launchpad.net/~chrismd/+junk/graphite-megacarbon. The
'megacarbon' part refers to the fact that all of carbon's functionality
has been unified in a single carbon-daemon.py with a configurable
processing pipeline, so any instance can aggregate, relay, rename,
cache & write, or any combination thereof. I'll suggest some ideas in a
moment for how you could leverage this.
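
Just to illustrate the idea (this is conceptual only; see the branch
for the actual option names and config layout), each carbon-daemon
instance simply declares which processing steps it performs:

  # conceptual sketch, not the branch's literal syntax
  [relay-only]
  PIPELINE = relay

  [writer-a]
  PIPELINE = aggregate, rewrite, write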

The carbon-relay daemon (or equivalently, a carbon-daemon instance that
just relays) has been somewhat obsoleted by the new carbon-client.py
script, which is basically a client-side relay. It has all of the same
functionality as the standard relay (it uses the same carbon
libraries); the only difference is that it reads metrics from its
stdin, so it's suited for client-side use. If you don't want to deploy
Twisted & carbon to all 1,000 client machines that's understandable,
and in that case you'd still want a relaying daemon: it centralizes
configuration & maintenance burden as well as processing load (you win
some, you lose some).
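
For example, on a client box you could pipe whatever your collector
emits (in the usual plaintext 'metric value timestamp' format) straight
into it. The flag and hosts below are only approximate, so check
carbon-client.py's --help for the real ones:

  # plaintext datapoints on stdin, relayed on to the listed destinations
  # (flag name, collector command, and hosts are illustrative)
  my-collector | python carbon-client.py --destinations=graphite01:2004:a,graphite01:2104:b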

If you use carbon-clients, you can still separate the 5 metrics that
need to get aggregated by using a second carbon-client: one for
aggregated metrics that connects to the aggregator(s), and one for
non-aggregated metrics that connects directly to the carbon-caches /
carbon-daemons that write. The aggregator daemons can forward the 5
original metrics on to the writers. Technically the two separate
carbon-clients (which is basically the same idea as your top/middle
relay split) would be unnecessary if the relay code could mix the use
of relay rules and consistent hashing, but currently that isn't
supported. Come to think of it, it wouldn't be that hard to implement:
just add an option to the relay-rule configurations to use consistent
hashing across the rule's own destinations rather than a global
destination list. That's a really good idea actually; thanks for
pointing that out :) (Bug #899543). Implementing it would remove the
need for your mid-tier relays without needing to change anything else.
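
To make that concrete, a relay-rules.conf under that scheme might look
roughly like the sketch below. The pattern and hosts are placeholders,
and hashing within a single rule's destinations is exactly the part
Bug #899543 would add (today a relay has to pick rules or consistent
hashing globally):

  # placeholder pattern and hosts -- adapt to your 5 aggregated metrics
  [aggregated]
  pattern = ^metrics\.to\.aggregate\.
  destinations = aggregator01:2023, aggregator02:2023

  # everything else would consistent-hash across the writer instances
  [default]
  default = true
  destinations = cache01:2004:a, cache01:2004:b, cache02:2004:a, cache02:2004:b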

If you want to try out the new carbon-daemon.py, feel free; it is
already working and tested on the megacarbon branch. There are just a
few webapp changes in that branch that are mid-flight before I merge it
all to trunk, so don't use the webapp code from that branch yet.

Given the constraints of carbon 0.9.9, though, you solved the problem
quite well. The only tweak I can think of that involves no Graphite
code changes would be to have your clients send the 5 aggregated
metrics directly to the aggregator and all their other metrics to the
haproxy; that would eliminate the need for the mid-tier relays.

Thanks again for the excellent write-up.
