graphite-dev team mailing list archive

Re: [Question #180787]: Ceres - aggregation and compression methods

Question #180787 on Graphite changed:
https://answers.launchpad.net/graphite/+question/180787

Description changed to:
Once the Ceres database is available, one key element that
differentiates the most cutting-edge professional storage mechanisms is
the aggregation and compression method.

Ceres opens the possibility of implementing more efficient algorithms
for storing time-series data, such as fan interpolators or straight-line
interpolative methods that store only the relevant data points in a
series. These can achieve compression ratios of 10x versus raw data
while staying accurate to within a defined maximum deviation. This is
particularly interesting for industrial process data, or for trends that
typical aggregation methods like min/max/average render too coarse.

Average aggregation has a high compression ratio, but for more accuracy
administrators typically also add min and max aggregation, which is
still by no means an accurate representation of the time series. You
thus end up with 3 data points + 1 time value in RRDTool, or 3 data
points + 3 time values in Whisper, for an approximately 20x compression
ratio with high accuracy loss. By contrast, a little upfront processing
can make 1 data point + 1 time value much more accurate AND still
achieve 20-35x compression versus the raw data. All data points between
two stored values can be interpolated as a straight line between those
points, so both the time and data values in between are eliminated in
favor of interpolation.
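
On the read side, reconstruction is just that straight line between the
two bracketing stored points. Again a hypothetical sketch, not existing
Graphite code:

    from bisect import bisect

    def value_at(stored, t):
        """Reconstruct the series value at time t from the compressed
        archive by interpolating between the bracketing stored points."""
        times = [p[0] for p in stored]
        i = bisect(times, t)
        if i == 0:
            return stored[0][1]      # before the first stored point
        if i == len(stored):
            return stored[-1][1]     # after the last stored point
        (t0, v0), (t1, v1) = stored[i - 1], stored[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)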

Google for "Data Compression for Process Historians" by Peter A. James
for a comparison of various algorithms.

This would be a great implementation mini-project for anyone with a
little statistical background and a bit of Python knowledge!

-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.