graphite-dev team mailing list archive

[Question #180787]: Ceres - aggregation and compression methods


New question #180787 on Graphite:
https://answers.launchpad.net/graphite/+question/180787

Once the Ceres database is available, one key element that differentiates the most cutting-edge professional storage mechanisms is their aggregation and compression method.

Ceres opens up the possibility of implementing more efficient algorithms for storing time-series data, such as fan interpolators or straight-line interpolative methods that store only the relevant data points in a series. These can achieve compression ratios of 10x versus the raw data while staying accurate to within a defined maximum deviation. This is particularly interesting for industrial process data, or for trends that typical aggregation methods like min/max/average render too coarse.

Sure, average aggregation has a very high compression ratio, but for more accuracy administrators typically add min and max aggregation. They thus end up with 3 data points + 1 time value in RRDtool, or 3 data points + 3 time values in Whisper, for an approximate 20x compression ratio with high accuracy loss. Whereas a little upfront processing can make 1 data point + 1 time value much more accurate AND achieve 20-35x compression versus the raw data. All data points between two stored values can be interpolated as a straight line between those two points, so both the time and the data values of the intermediate points are eliminated in favor of interpolation.
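
To make the idea concrete, here is a minimal greedy sketch in Python, not a proposal for Ceres's actual API. The function name compress(), the max_dev parameter, and the (timestamp, value) tuples are all assumptions for illustration, and timestamps are assumed to be strictly increasing; a production swinging-door compressor would track slope envelopes instead of rechecking every skipped point:

def compress(points, max_dev):
    """Store a point only when the straight line from the last stored
    point to the current point can no longer reproduce every skipped
    point to within max_dev. points is a list of (timestamp, value)
    tuples with strictly increasing timestamps."""
    if len(points) < 3:
        return list(points)
    stored = [points[0]]
    candidates = []  # points currently covered by interpolation
    for point in points[1:]:
        t0, v0 = stored[-1]
        t1, v1 = point
        # Check every skipped point against the line (t0,v0)->(t1,v1).
        ok = True
        for t, v in candidates:
            interp = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
            if abs(interp - v) > max_dev:
                ok = False
                break
        if ok:
            candidates.append(point)
        else:
            # The line broke: store the last point it still covered
            # and start a new segment from there.
            stored.append(candidates[-1])
            candidates = [point]
    stored.append(points[-1])  # always keep the final point
    return stored

Feeding this a linear ramp keeps only the two endpoints, while a step change forces extra stored points; the max_dev knob is the defined maximum deviation mentioned above.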

Google for "Data Compression for Process Historians" by Peter A. James for a comparison of various algorithms.

This would be a great implementation mini-project for anyone with a little statistical background and a bit of Python knowledge!

-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.