
graphite-dev team mailing list archive

[Question #172630]: Can't seem to be able to tune carbon for AMQP


New question #172630 on Graphite:
https://answers.launchpad.net/graphite/+question/172630

Hey all,

We've been trying out different ways of getting data from collectd into Graphite, and it looks like we've hit a wall with feeding carbon from AMQP. No matter what I try, I can't get carbon to consume metrics nearly as fast as we produce them. I've tried various combinations of small and large caches with small and large writes_per_second and creates_per_minute values, but whatever we do, the message queue fills up much faster than carbon can drain it.
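To show how the cache size and write-rate settings interact, here is a rough steady-state model. It assumes carbon-cache behaves the way we understand the 0.9.x series to behave: incoming datapoints are queued per metric, the writer flushes all queued points for one metric in a single whisper update, and the carbon.conf setting we believe is named MAX_UPDATES_PER_SECOND caps how many such updates happen per second. The helper name and the example numbers are hypothetical, not measurements from our setup.

```python
def steady_state_batching(num_metrics: int, interval_s: float,
                          max_updates_per_sec: float) -> dict:
    """Rough steady-state model of a carbon-cache writer (hypothetical helper).

    Assumes every metric receives one datapoint per interval and that each
    write flushes all cached points for a single metric at once.
    """
    incoming = num_metrics / interval_s  # datapoints arriving per second
    # If more points arrive than updates are allowed, the throttle is the limit.
    writer_bound = incoming > max_updates_per_sec
    # When throttled, each update must carry several points to keep up;
    # otherwise roughly one point is written per update.
    points_per_update = max(1.0, incoming / max_updates_per_sec)
    return {
        "incoming_points_per_sec": incoming,
        "points_per_update": points_per_update,
        "writer_bound": writer_bound,
    }


if __name__ == "__main__":
    # Hypothetical: 100k metrics every 10 s against a cap of 1000 updates/s.
    print(steady_state_batching(100_000, 10, 1000))
```

Under this model, raising the update cap trades larger batches for more IOPS, while lowering it grows the cache; if disk IO wait stays low either way, that would suggest the limit is somewhere other than the whisper writes.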

Right now our message queue backlog grows by about 34,920 messages per minute (582 per second).
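The backlog figure translates directly into how much faster carbon would need to consume. A minimal sketch, assuming the growth rate is steady: carbon must absorb at least the per-second growth just to stop the queue from growing, plus extra capacity to clear whatever is already queued. The function name and the drain-target parameters are hypothetical.

```python
from typing import Optional


def required_extra_consumption(backlog_growth_per_min: float,
                               existing_backlog: float = 0.0,
                               drain_within_s: Optional[float] = None) -> float:
    """Extra messages/s carbon must consume beyond its current rate (hypothetical helper)."""
    growth_per_s = backlog_growth_per_min / 60.0  # per-minute growth -> per-second
    extra = growth_per_s  # enough to stop the backlog from growing
    if existing_backlog > 0 and drain_within_s:
        # Additional rate needed to clear the messages already queued in time.
        extra += existing_backlog / drain_within_s
    return extra


print(required_extra_consumption(34920))  # 582.0
```

For example, with a hypothetical 60,000 messages already queued and a 10-minute drain target, `required_extra_consumption(34920, 60000, 600)` asks for 682 extra messages per second.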

The carbon host has four 2 TB FC LUNs (it's 3PAR, so a lot of 15k FC drives back it, and the files are spread across the entire drive cluster), striped on the host with LVM for a total of ~8 TB. LVM is set up to spread the load across all four LUNs, and regardless of which cache settings we use, we see only about 0.7% IO wait on the carbon host, so I don't think disk is the issue. From what I've read, the sweet spot should be around 50% IO wait.

I've also tried writing to local disks (15k SAS in RAID1) and got the same results.

We're using ext4 with the deadline scheduler on RHEL6. The carbon host itself is a Dell M610 with 8 cores (Xeon(R) CPU X5570 @ 2.93GHz) and 48GB of RAM. With carbon stopped, IO utilization is 0%.
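One way to double-check the IO wait figure on a Linux host like RHEL6, independently of tools such as iostat, is to sample the aggregate `cpu` line of /proc/stat twice and compute the iowait share of the elapsed CPU time. This is a minimal sketch; the helper names are hypothetical, and it assumes the standard column order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' row of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Keep user nice system idle iowait irq softirq steal. Guest columns are
    # already included in user/nice, so they are skipped to avoid double counting.
    return [int(x) for x in fields[1:9]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    return 0.0 if total == 0 else 100.0 * deltas[4] / total  # index 4 = iowait


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which is the aggregate 'cpu' row."""
    with open(path) as f:
        return f.readline()
```

In practice you would call `read_cpu_line()`, sleep a few seconds with `time.sleep`, call it again while carbon is consuming, and pass both lines to `iowait_percent`.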

Our RabbitMQ cluster is two hosts with the same specs, minus the FC LUNs.

Here is what we use to get collectd data into the message queue: https://github.com/poblahblahblah/collectd-http-carbon

Here is our storage-schemas.conf file: https://gist.github.com/e88bc325926940d300d6

Here is our carbon.conf file: https://gist.github.com/be63c1beae01b067600d

Here is a bonnie++ run: https://gist.github.com/001eee920613aa30b42a
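For reference, this is the message shape we believe carbon's AMQP listener expects, sketched as a pure formatting helper. It assumes that by default the metric name travels as the routing key and the body is "value timestamp", and that with the carbon.conf option AMQP_METRIC_NAME_IN_BODY enabled the body becomes "metric value timestamp". The helper name and the sample metric are hypothetical; check the listener code of your carbon version before relying on this.

```python
import time
from typing import Optional, Tuple


def build_carbon_amqp_message(metric: str, value: float,
                              timestamp: Optional[int] = None,
                              name_in_body: bool = False) -> Tuple[str, str]:
    """Return (routing_key, body) for one datapoint (hypothetical helper)."""
    if timestamp is None:
        timestamp = int(time.time())  # default to "now" in epoch seconds
    if name_in_body:
        # Assumed AMQP_METRIC_NAME_IN_BODY = True: the body carries the name too.
        return metric, f"{metric} {value} {timestamp}"
    # Assumed default: the metric name is the routing key, body is "value timestamp".
    return metric, f"{value} {timestamp}"
```

The resulting pair could then be handed to an AMQP client, for example pika's `channel.basic_publish(exchange=..., routing_key=routing_key, body=body)`, with the exchange matching whatever AMQP_EXCHANGE is set to in carbon.conf.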

If we kill the unicorn process that sends the processed metrics to RabbitMQ, carbon eventually catches up and the graphs update as expected. Are we doing too many updates, or do we need some intelligent way to split the data across multiple servers, with different metric patterns per carbon-cache process?
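Splitting by pattern could look like the following sketch: an ordered list of regex rules where the first match picks the carbon-cache destination and unmatched metrics fall through to a default. This mirrors the idea behind carbon-relay's rules mode, but it is not carbon's implementation; the rule patterns and destination names are hypothetical.

```python
import re

# Hypothetical rules, evaluated top to bottom; the first match wins.
RULES = [
    (re.compile(r"^collectd\.web\d+\."), "carbon-cache-a"),
    (re.compile(r"^collectd\.db\d+\."), "carbon-cache-b"),
]
DEFAULT_DESTINATION = "carbon-cache-c"


def route_metric(metric: str, rules=RULES, default: str = DEFAULT_DESTINATION) -> str:
    """Pick a carbon-cache destination for a metric name (hypothetical helper)."""
    for pattern, destination in rules:
        if pattern.search(metric):
            return destination
    return default  # nothing matched, so use the catch-all destination
```

Pattern rules make placement predictable but can leave caches unevenly loaded; carbon-relay also offers a consistent-hashing mode, which spreads metrics more evenly without hand-maintained patterns.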

Let me know if any other data would help. I'm sure I'm just doing something dumb with carbon.


-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.