graphite-dev team mailing list archive

Re: [Question #187874]: send queue full / cpu @ 100%

 

Question #187874 on Graphite changed:
https://answers.launchpad.net/graphite/+question/187874

    Status: Open => Answered

Michael Leinartas proposed the following answer:
The aggregator is more likely CPU bound, since for both rewrites and
aggregation it runs multiple regex matches on every incoming metric.
As a start, you'll want to run multiple carbon-aggregators. If you're
only using carbon-aggregator for rewrites (i.e. your
aggregation-rules.conf is empty), you can run multiple
carbon-aggregators pointed at the same carbon-cache and spread load
across them using carbon-relay or haproxy.

If you *are* using aggregation, the problem is more complex: each aggregator will send the same metric name and timestamp, and one will overwrite the other. For this, you have a few options:
* Place carbon-relay in front of the aggregators in relay-rules mode and shard the data by metric path
   You'll need to ensure that none of your aggregation rules combine metrics across shards
* Send the output of multiple aggregators to another aggregator with rules that aggregate the already-aggregated values (the 'from' and 'to' regexes will be identical) - note that this requires trunk, as 0.9.9 has a bug when the 'from' and 'to' rules are identical
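
To make the sharding constraint in the first option concrete, here is a
minimal Python sketch. It is not carbon-relay's code: the shard patterns,
the destination names, and the helper functions are all hypothetical and
only illustrate first-match routing by metric path, plus a check that the
inputs of one aggregation rule all land on the same aggregator.

```python
import re

# Hypothetical shard rules: (regex on the metric path, aggregator destination).
# These patterns and host names are illustrative assumptions only.
SHARD_RULES = [
    (re.compile(r"^prod\.web\."), "aggregator-a:2023"),
    (re.compile(r"^prod\.db\."), "aggregator-b:2023"),
]
DEFAULT_DESTINATION = "aggregator-c:2023"


def route(metric_path):
    """Return the destination of the first rule whose pattern matches."""
    for pattern, destination in SHARD_RULES:
        if pattern.search(metric_path):
            return destination
    return DEFAULT_DESTINATION


def rule_crosses_shards(input_metrics):
    """True if the inputs of one aggregation rule land on more than one shard."""
    return len({route(metric) for metric in input_metrics}) > 1
```

If rule_crosses_shards() returns True for the metrics a rule combines,
each aggregator would only see part of the input and emit its own partial
value under the same name, which is the overwrite problem described above.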

If you're using 0.9.9, I'd suggest applying this patch, since you're
queuing on the client side and the queue draining behavior is somewhat
broken in 0.9.9:
http://bazaar.launchpad.net/~graphite-dev/graphite/main/revision/671

Also consider that you're on EC2 - the behavior you're seeing might be
triggered by a slowdown of your instance (other customers, etc). The
above patch and an increase in your MAX_QUEUE_SIZE may allow you to
ride out a slowdown.
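
As a rough illustration of why a larger queue helps, here is a toy Python
model of a bounded send queue. It is an assumption-laden sketch, not
carbon's actual queueing code: the simulate() function, the per-tick
rates, and the stall pattern are all made up for the example.

```python
def simulate(arrivals_per_tick, drain_per_tick, max_queue_size):
    """Return how many datapoints are dropped because the queue was full.

    drain_per_tick lists how many datapoints the destination accepts on
    each tick; 0 models a stalled destination.
    """
    queued = 0
    dropped = 0
    for drain in drain_per_tick:
        queued += arrivals_per_tick
        if queued > max_queue_size:
            # Anything beyond the limit is discarded.
            dropped += queued - max_queue_size
            queued = max_queue_size
        queued -= min(queued, drain)
    return dropped


# The destination keeps up for 3 ticks, stalls for 5, then recovers.
drain = [100] * 3 + [0] * 5 + [150] * 10

small = simulate(100, drain, max_queue_size=300)
large = simulate(100, drain, max_queue_size=1000)
```

In this model the smaller queue overflows during the stall while the
larger one absorbs the backlog and drains it once the destination
recovers. The right value for a real deployment depends on your metric
rate and on how long the slowdowns last.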
