
graphite-dev team mailing list archive

Re: Cassandra backend


Excerpts from Chris Davis's message of Fri Mar 26 12:09:40 -0500 2010:
> Hi Elliot, currently whisper behaves like RRD does in that you must specify
> the size/retention of the file upon creation. This means it drops old data
> the same way RRD does. One thing you can do is use the whisper-resize script
> to extend your whisper files but obviously this only postpones the issue.
> With Cassandra I believe there is no mechanism that would automatically
> cause old data to be deleted/overwritten but I could be wrong.
> 
> In any case I am actually working on a project for work to address some
> issues with whisper like this and extend its functionality. Essentially,
> instead of having one wsp file for each metric that covers a relative time
> range (i.e. the past 90 days) there will be many files for each metric, each
> of which covers an absolute time range. Also the storage format itself will
> be simplified. This design change has many implications, most of which are
> very good. First off it means a big reduction in disk space usage (upwards
> of 30%). Second it means that you will be able to browse/query metrics such
> that only metrics that have data in a given time span will be returned,
> effectively hiding old inactive metrics from your view unless you are
> explicitly looking for old data. A great example of why this is useful is
> that we have several old servers that have been decommissioned so we don't
> want their metrics cluttering up our hierarchy but we also want to retain
> their old data for analysis/trending.
> 
> The third major implication is that no data will be dropped automatically by
> whisper anymore; instead, expiration will be managed by a separate cleanup
> process with configurable logic. Finally, my personal favorite, database
> files will become portable. This means you'll be able to move around
> database files freely, even to other graphite servers. Of course you can
> already have whisper files distributed across multiple servers but currently
> each file corresponds to all of the datapoints for a single metric. With
> this new design each file will correspond to some contiguous block of
> datapoints for a metric, not all of the datapoints. So a single metric's
> datapoints could be split across multiple servers (think: recent data on the
> live systems, older data on separate archiving systems, all transparent to
> the user).
> 
> At this point we have a design drawn up but are still in the process of
> testing a few of the concepts to make sure there are no performance
> regressions. The project timeline allots about 6 weeks before we expect this
> in production. If all goes well I think this will be a great improvement for
> Graphite. Once development starts in earnest I'll send the branch info to
> the mailing list in case anyone is adventurous and wants to mess with it.

Hi Chris,

The new and improved whisper sounds a lot like the backend I developed for my
SNMP polling system.  I've been using it for 2+ years to collect quite a bit
of data.  I wonder if there is any synergy to be found between the two
formats.

You can see it at:

http://code.google.com/p/tsdb/

TSDB defines a simple binary format that stores metrics using the timestamp as
an index for the data point.  Each variable is broken up into chunks of a
given size (most of mine are sized to hold one day's worth of data).  Looking
up data by timestamp is _very_ fast since it's just a matter of determining
which chunk to look in, e.g. 20100331 (today), and then computing the offset
into the file from the unix timestamp.  The main problem is doing that many
seeks at write time.  I solved this by putting the chunks that are being
written on a fast disk (SSD, but a RAM disk would work too) and then migrating
them to slower disk once the writing is done.  TSDB implements a simple union
filesystem to support these multiple levels of storage.
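
To make the chunk addressing concrete, here's a rough sketch of the idea.
This is illustrative only, not actual TSDB code; the daily chunk naming
follows what I described above, but the record layout and the helper names
are made up:

    import os
    import struct
    import time

    # Hypothetical fixed-size record: 4-byte flags plus a double value.
    RECORD_FMT = ">Ld"
    RECORD_SIZE = struct.calcsize(RECORD_FMT)
    SECONDS_PER_DAY = 86400

    def chunk_location(timestamp, step=30):
        """Map a unix timestamp to (chunk_name, byte_offset), assuming
        one chunk file per UTC day named YYYYMMDD."""
        chunk_start = timestamp - (timestamp % SECONDS_PER_DAY)
        chunk_name = time.strftime("%Y%m%d", time.gmtime(chunk_start))
        slot = (timestamp - chunk_start) // step
        return chunk_name, slot * RECORD_SIZE

    def find_chunk(chunk_name, search_dirs):
        """Union-filesystem-style lookup: check the fast tier (SSD or
        RAM disk) first, then fall back to the slower archive tiers."""
        for d in search_dirs:
            path = os.path.join(d, chunk_name)
            if os.path.exists(path):
                return path
        raise IOError("no such chunk: %s" % chunk_name)

    def read_point(search_dirs, timestamp, step=30):
        """Seek straight to the record for `timestamp`; no scanning."""
        chunk_name, offset = chunk_location(timestamp, step)
        with open(find_chunk(chunk_name, search_dirs), "rb") as f:
            f.seek(offset)
            return struct.unpack(RECORD_FMT, f.read(RECORD_SIZE))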

TSDB also has code to store aggregated versions of the data, so that you can
use, say, 30-second data for the raw input and then build 15-minute averages
or whatever you want.  The aggregation code works but probably needs some
attention to both simplify it and make it more general.
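
The rollup itself is simple enough to sketch; again, this is just an
illustration of the concept, not the actual aggregation code:

    from collections import defaultdict

    def aggregate_averages(points, bucket=900):
        """Roll raw (timestamp, value) samples up into bucket-second
        averages; each output timestamp is the start of its bucket."""
        sums = defaultdict(float)
        counts = defaultdict(int)
        for ts, value in points:
            slot = ts - (ts % bucket)
            sums[slot] += value
            counts[slot] += 1
        return [(slot, sums[slot] / counts[slot]) for slot in sorted(sums)]

    # e.g. 30-second raw input rolled up to 15-minute (900 s) averages:
    raw = [(t, float(t % 7)) for t in range(0, 3600, 30)]
    print(aggregate_averages(raw)[:2])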

Sadly, there isn't much in the way of docs other than the docstrings in the
Python code.  But the docstrings are pretty good, I think.

Incidentally, TSDB was written to be sort of RRD-like, but without the RR
(round-robin) part.  I've also tried to make it much less magical than RRD.

The SNMP polling system that I developed TSDB for, ESxSNMP, is also on
Google Code:

http://code.google.com/p/esxsnmp/

Let me know if you have any questions,

Jon


