
graphite-dev team mailing list archive

[Question #212209]: Scaling graphite on AWS

 

New question #212209 on Graphite:
https://answers.launchpad.net/graphite/+question/212209

I have a rapidly growing, evolving AWS deployment. My largest graphite cluster is currently one carbon-relay in front of six carbon-cache nodes, using consistent hashing, with memcached on each cache node. There are 450 EC2 instances sending data to the carbon-relay via Joe Miller's collectd-graphite plugin. Each cache node shows between 35k and 50k metricsReceived per minute (according to the carbon/agents data in graphite itself), for a cluster-wide total of around 240k metrics per minute.
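For context, the relay side is set up roughly like this (a minimal sketch of the relevant carbon.conf section, assuming the 0.9.10-style DESTINATIONS syntax; the cache hostnames are placeholders, not my real ones):

    [relay]
    LINE_RECEIVER_INTERFACE = 0.0.0.0
    LINE_RECEIVER_PORT = 2003
    RELAY_METHOD = consistent-hashing
    REPLICATION_FACTOR = 1
    DESTINATIONS = cache01:2004:a, cache02:2004:a, cache03:2004:a, cache04:2004:a, cache05:2004:a, cache06:2004:a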

It's clear from that data that the cluster is I/O bound, which is no surprise since I/O on AWS instances is notoriously poor (unless you go with the pricey SSD instance type). Each data volume is a RAID0 of the two ephemeral disks on an m1.large. It's becoming painful to rebalance the data files when adding new cache nodes, and three more m1.large instances would cost about the same as one SSD instance. FWIW, each cache node is doing around 600-700 IOPS.
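To put that IOPS figure in context, here's the back-of-the-envelope arithmetic I'm working from (a rough Python sketch of my own reasoning, not anything carbon itself does; it assumes datapoints spread roughly evenly across the six cache nodes and that each metric update hits its whisper file as a separate write op):

    # Rough per-node write load from the observed ingest rate.
    total_metrics_per_min = 240000   # cluster-wide metricsReceived
    cache_nodes = 6

    per_node_per_min = total_metrics_per_min / float(cache_nodes)  # ~40000
    per_node_per_sec = per_node_per_min / 60.0                     # ~667

    print(per_node_per_sec)  # ~667 whisper updates/sec, in line with the
                             # 600-700 IOPS each node is actually doing

In other words, even before any read traffic from graphite-web, the raw datapoint write rate alone roughly accounts for the IOPS I'm seeing per node.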

What is the best way to scale this cluster? Should I bite the bullet and fork out the cash for an SSD instance, or is there something else I can do that I haven't thought of?

-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.