
Re: [Question #214631]: Hardware specs for carbons

Question #214631 on Graphite changed:
https://answers.launchpad.net/graphite/+question/214631

Yee-Ting Li posted a new comment:
Hi, just to provide some comparisons: I currently use a Dell R610 (similar
spec to your machines), but with 4x1TB local disks behind RAID 5. The
performance is rather pitiful, mainly due to the sub-par RAID card (a
MegaRAID SAS 1078).

Looking at atop, I get around 200 writes/sec with about 50 reads/sec at the
same time, far lower than the 500 you mention.

Currently I push around half a million datapoints into carbon per minute.
Each datapoint is unique (i.e. half a million different metrics are sent
per minute). As you can imagine this is rather aggressive, but it is
stable. I achieve this by running 16 carbon-cache instances and doing
client-side consistent hashing across them, so each instance handles
around 32,000 metrics per minute on average. (Using carbon-relay to do the
hashing pegs its CPU too much and hence limits the number of metrics it
can receive.)

I have set each instance's MAX_CACHE_SIZE to around 10,000,000; however,
some instances only hover around 1,000,000 on average (you can see this
under carbon.agents.<agent>.cache.size). In theory I would prefer this to
be much lower (i.e. less than the machine's 24GB of memory divided by the
number of instances) if it weren't for my observations below.

I have also set MAX_UPDATES_PER_SECOND to ~5 for each instance (i.e.
5 * 16 * 60 =~ 5,000 updates per minute, or ~80 writes/sec in total),
which is lower than the max writes/sec seen in atop. This is mainly so I
do not starve the kernel's flush process writing dirty pages from the page
cache to disk.
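For concreteness, one instance's section in carbon.conf ends up looking
roughly like the sketch below. The instance name and ports are assumptions
based on the stock carbon.conf layout, and I'm assuming the flow-control
toggle I mention later is the USE_FLOW_CONTROL setting; adjust to your own
layout:

    [cache:a]
    LINE_RECEIVER_PORT = 2013
    PICKLE_RECEIVER_PORT = 2014
    CACHE_QUERY_PORT = 7012
    MAX_CACHE_SIZE = 10000000
    MAX_UPDATES_PER_SECOND = 5
    USE_FLOW_CONTROL = True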

The real issue I have is that the consistent hashing isn't very balanced,
so I end up with a couple of instances taking on ~50,000 (unique) metrics
per minute. For some reason this causes those specific instances to do far
fewer updates per minute than the others, which causes their caches to
grow and eventually fill up to the 10,000,000 limit. I've noticed that
instances appear to have issues once they go over ~35,000 queues per
minute. I deal with this by buffering on the client side, using
FLOW_CONTROL as the signal that tells the affected client it should start
buffering, but I'm not sure this is completely wise.
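To illustrate the buffering idea (purely a sketch; the queue bound, timeout
and batching are assumptions): when flow control kicks in, carbon simply
stops reading from the socket, so the client sees sends block or time out
and can park datapoints in a local queue until the instance drains its
cache.

    # Sketch only: buffer datapoints client-side when a carbon-cache
    # instance stops accepting data (its flow control has kicked in).
    # Queue bound and timeout are illustrative assumptions.
    import socket
    from collections import deque

    PENDING = deque(maxlen=500000)  # drop-oldest local buffer for one instance

    def try_send(host, port, lines, timeout=2.0):
        """Attempt one batched send; return False if the instance is not draining."""
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall("".join(lines).encode())
            return True
        except (socket.timeout, OSError):
            return False

    def send_or_buffer(host, port, line):
        """Queue the datapoint and try to flush the queue; keep buffering on failure."""
        PENDING.append(line)
        batch = list(PENDING)
        if try_send(host, port, batch):
            PENDING.clear()

A real client would want one buffer per instance and some backoff rather
than retrying on every datapoint, but the shape is the same.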

My advice is to run as many carbon-cache instances as you can to keep your
queue sizes low.

Also, if you configure the consistent hashing correctly, the webapp should
hit the right instance's cache, so you /should/ see the data immediately
after it has been fed into carbon-cache.
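The webapp side of that is the CARBONLINK_HOSTS setting in graphite-web's
local_settings.py: one "host:cache_query_port:instance" entry per
carbon-cache instance, so the webapp can query the same ring. The ports
and instance names below are assumptions matching the hypothetical
carbon.conf snippet above:

    # local_settings.py (graphite-web) -- hypothetical ports/instance names
    CARBONLINK_HOSTS = [
        "127.0.0.1:7012:a",
        "127.0.0.1:7013:b",
        # one "host:cache_query_port:instance" entry per carbon-cache instance
    ]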
