graphite-dev team mailing list archive

Re: [Question #197604]: process for scaling horizontally

Question #197604 on Graphite changed:
https://answers.launchpad.net/graphite/+question/197604

    Status: Open => Answered

Michael Leinartas proposed the following answer:
Generally the first step is to make sure you're fully utilizing your
first host. A single Python process can effectively use only one CPU
core, so it's common for a carbon-cache instance to bottleneck at 100%
of one core. In that case you can configure a second (or more)
carbon-cache instance on the same host and distribute load across them
via carbon-relay in consistent-hashing mode. This spreads the metrics
roughly evenly while ensuring that each metric is handled by exactly one
instance (otherwise the same files would be updated by multiple
processes).
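
To make the "spread evenly, but never share a metric" property concrete,
here is a toy consistent-hash ring. It is only a sketch of the general
technique, not carbon's actual implementation; the HashRing class, the
replica count, and the destination strings are all made up for
illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative only, not carbon's code)."""

    def __init__(self, nodes, replicas=100):
        # Place each node at many pseudo-random positions on the ring so
        # that load spreads roughly evenly between nodes.
        self._ring = sorted(
            (self._position(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        # Map a string to an integer position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def get_node(self, metric):
        # The first ring entry at or after the metric's position owns it;
        # wrap around to the start if we run off the end.
        idx = bisect.bisect_left(self._positions, self._position(metric))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical destinations: two carbon-cache instances on one host.
instances = ["127.0.0.1:2004:a", "127.0.0.1:2104:b"]
ring = HashRing(instances)

metrics = [f"servers.web{n:03d}.cpu.user" for n in range(1000)]
counts = {node: 0 for node in instances}
for metric in metrics:
    counts[ring.get_node(metric)] += 1

print(counts)  # each instance should own a substantial share
```

The same metric name always lands on the same instance, which is what
keeps two processes from writing to the same file.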

To go to multiple boxes you do the same thing, but with the
carbon-cache instances on separate hosts. Each host also needs an
instance of the webapp running to serve its locally stored metrics.
The difficult part of this setup is dealing with existing metrics:
after cutting over to a carbon-relay in consistent-hashing mode that
feeds two hosts, roughly half of your metrics will start going to the
new host while their historical data is still on the first. If that's
a problem, you might find it simpler to use the relay-rules method and
shard the data yourself based on metric names.
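
As a rough illustration of sharding by metric name, the sketch below
routes each metric to the first rule whose pattern matches and falls
back to a default otherwise. It is not carbon-relay's code or config
syntax; the rule names, patterns, and host names are hypothetical.

```python
import re

# Hypothetical rules: (rule name, pattern, destination). Order matters,
# because the first matching rule wins.
RULES = [
    ("servers", re.compile(r"^servers\."), "host-a:2004"),
    ("apps", re.compile(r"^apps\."), "host-b:2004"),
]

# Anything that matches no rule stays on the original host, where its
# historical data already lives.
DEFAULT_DESTINATION = "host-a:2004"


def route(metric):
    """Return the destination for a metric name."""
    for _name, pattern, destination in RULES:
        if pattern.search(metric):
            return destination
    return DEFAULT_DESTINATION


for name in ("servers.web01.load", "apps.checkout.latency", "misc.foo"):
    print(name, "->", route(name))
```

With rules like these you decide which name prefixes move to the new
host, so existing metrics can stay where their history already is.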

On the webapp side, each webapp has the other webapps listed in its
CLUSTER_SERVERS setting. While browsing or rendering metrics, each
server is queried for the data and the first server found to have the
metric is used. This makes it important to keep the metrics completely
separated: the same metric must not live on more than one machine.
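
A sketch of what this might look like: CLUSTER_SERVERS is the setting
named above, but the addresses are hypothetical. The find_duplicates
helper is not part of Graphite. It assumes you have already collected
the metric names stored on each host by some other means.

```python
# local_settings.py on host A (hypothetical addresses): list the
# webapps running on the *other* hosts in the cluster.
CLUSTER_SERVERS = ["10.0.0.2:80"]


def find_duplicates(metrics_by_host):
    """Return {metric: [hosts]} for metrics stored on more than one host."""
    owners = {}
    for host, metrics in metrics_by_host.items():
        for metric in metrics:
            owners.setdefault(metric, set()).add(host)
    return {m: sorted(hosts) for m, hosts in owners.items() if len(hosts) > 1}


# Hypothetical inventory of metric names per host.
inventory = {
    "10.0.0.1": {"servers.web01.load", "apps.checkout.latency"},
    "10.0.0.2": {"apps.checkout.latency", "apps.search.latency"},
}

for metric, hosts in find_duplicates(inventory).items():
    print(f"{metric} is stored on more than one host: {', '.join(hosts)}")
```

Any metric reported here should be consolidated onto a single host.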

Unfortunately, this stuff isn't very well documented outside of the config files:
https://github.com/graphite-project/graphite-web/blob/master/webapp/graphite/local_settings.py.example#L161-194
https://github.com/graphite-project/carbon/blob/master/conf/carbon.conf.example#L173-199
