graphite-dev team mailing list archive

Re: [Question #177524]: can graphite scale to Ks of nodes at frequencies of seconds

 

Question #177524 on Graphite changed:
https://answers.launchpad.net/graphite/+question/177524

Mark Seger posted a new comment:
Thanks for the timely response.  I'm definitely happy to hear it has a
distributed model, which we all know is really the only way to scale.
It sounds well worth a closer look when I get some more time.

As for the number of datapoints vs pixels, this is the EXACT problem I
have with rrd.  As you change the time scale of the display, the
graphs themselves can actually change and I think that's a huge no-no.

The bottom line is that by default, collectl generates 8640 data points/day
at a 10 second monitoring interval and I will typically look at a day's
worth of data to see if there are any spikes.  RRD, and it sounds like
graphite too, cannot fit all those points on a graph and so averages
them, destroying the spikes.  For that reason my tool of choice has
always been gnuplot, which is fast and plots everything you tell it to.
If a few points fall in the same interval, gnuplot simply plots all of
them and you can clearly see the spikes.
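
To make the averaging problem concrete, here is a minimal sketch in Python.
It is not how rrd or graphite actually consolidate data; the 800-pixel plot
width, the flat 5% baseline and the single synthetic spike are all
assumptions made up for illustration.  It squeezes one day of 10 second
samples into one value per pixel, once by averaging and once by keeping
the maximum of each bucket.

```python
SECONDS_PER_DAY = 24 * 60 * 60
INTERVAL = 10   # collectl's default monitoring interval, in seconds
PIXELS = 800    # assumed plot width, not taken from any real tool


def make_samples():
    """One day of flat 5% load with a single 10-second spike to 100%."""
    n = SECONDS_PER_DAY // INTERVAL  # 8640 samples per day
    samples = [5.0] * n
    samples[4321] = 100.0  # hypothetical spike somewhere mid-day
    return samples


def bucketize(samples, buckets):
    """Split samples into contiguous groups, one group per pixel."""
    n = len(samples)
    return [samples[i * n // buckets:(i + 1) * n // buckets]
            for i in range(buckets)]


def consolidate(samples, buckets, func):
    """Reduce each bucket to a single value using func."""
    return [func(b) for b in bucketize(samples, buckets)]


samples = make_samples()
averaged = consolidate(samples, PIXELS, lambda b: sum(b) / len(b))
peaks = consolidate(samples, PIXELS, max)

print("samples per day:   ", len(samples))
print("highest averaged:  ", max(averaged))
print("highest bucket max:", max(peaks))
```

With roughly 10 or 11 samples per pixel, the averaged series never gets
anywhere near 100, while the per-bucket maximum still shows the spike at
full height.  Plotting every point the way gnuplot does, or drawing a
min/max range per pixel, keeps the same information.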

I don't know if graphite has the ability to display multiple points at
the same time as it would clearly break the 'prettiness' of the plots,
but I don't care about pretty, I care about accuracy and I'm afraid this
would be a show stopper for me, at least for using graphite as a
diagnostic tool.  Has anyone considered alternate plotting styles for
when this is a concern?  Does graphite actually build the plots itself
or does it rely on some other tool?  If the former, in theory it
shouldn't be that difficult to support an alternate style.

I realize most people probably don't get into that level of analysis of
the data, but that has always been the #1 consideration for me -
accurate data by which you can then do detailed analysis of most
system/cluster-wide problems.  I've always wondered how many people think
their systems are running just fine when in fact there are all kinds of
horrible things going on that they don't know about because they're
either monitoring at a granularity of a minute or more and/or are using
rrd and not seeing any real problems.

re <1 second intervals:  not really a big deal as 1 second is usually
more than sufficient for looking at data in real-time.

-mark
