graphite-dev team mailing list archive

Re: Cassandra backend

 

Hi Elliot, currently whisper behaves like RRD in that you must specify the
size/retention of the file upon creation. This means it drops old data the
same way RRD does. One thing you can do is use the whisper-resize script to
extend your whisper files, but obviously this only postpones the issue.
With Cassandra, I believe there is no mechanism that would automatically
cause old data to be deleted or overwritten, but I could be wrong.
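
To make the retention behavior concrete, here is a minimal sketch of a
fixed-size round-robin archive. It is an illustration only, not whisper's
actual on-disk format; the class name and its parameters are hypothetical.

```python
class RoundRobinArchive:
    """Hypothetical fixed-size archive; NOT whisper's real file format."""

    def __init__(self, seconds_per_point, num_points):
        self.step = seconds_per_point
        self.size = num_points
        # One (timestamp, value) pair per slot; the slot count never grows.
        self.slots = [None] * num_points

    def _index(self, aligned_timestamp):
        # Timestamps wrap around the fixed number of slots.
        return (aligned_timestamp // self.step) % self.size

    def write(self, timestamp, value):
        aligned = timestamp - (timestamp % self.step)
        # Writing past the retention window overwrites the oldest point.
        self.slots[self._index(aligned)] = (aligned, value)

    def read(self, timestamp):
        aligned = timestamp - (timestamp % self.step)
        slot = self.slots[self._index(aligned)]
        # Only return the value if the slot still holds this exact interval.
        if slot is not None and slot[0] == aligned:
            return slot[1]
        return None


archive = RoundRobinArchive(seconds_per_point=60, num_points=3)
for ts, value in [(0, 1.0), (60, 2.0), (120, 3.0), (180, 4.0)]:
    archive.write(ts, value)
```

With three slots, the write at timestamp 180 lands in the slot that held
timestamp 0, so the oldest point is silently lost. Making the file bigger
only moves the point at which that happens.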

In any case, I am working on a project for work to address issues like this
in whisper and to extend its functionality. Essentially, instead of having
one wsp file for each metric that covers a relative time range (i.e. the
past 90 days), there will be many files for each metric, each of which
covers an absolute time range. The storage format itself will also be
simplified. This design change has many implications, most of which are
very good. First, it means a big reduction in disk space usage (upwards of
30%). Second, it means that you will be able to browse/query metrics such
that only metrics that have data in a given time span will be returned,
effectively hiding old inactive metrics from your view unless you are
explicitly looking for old data. A great example of why this is useful: we
have several old servers that have been decommissioned, so we don't want
their metrics cluttering up our hierarchy, but we also want to retain their
old data for analysis/trending.
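
As a rough sketch of the browsing idea, assume each file is described by a
record holding the metric name and the absolute time range it covers. The
Slice record and both function names are hypothetical and only illustrate
the overlap test; they are not part of any existing whisper code.

```python
from collections import namedtuple

# Hypothetical description of one database file: it covers [start, end).
Slice = namedtuple("Slice", ["metric", "start", "end"])


def overlaps(slice_, query_start, query_end):
    """True if the slice's absolute range intersects the queried span."""
    return slice_.start < query_end and query_start < slice_.end


def metrics_with_data(slices, query_start, query_end):
    """Return the sorted names of metrics that have data in the span."""
    return sorted(
        {s.metric for s in slices if overlaps(s, query_start, query_end)}
    )


slices = [
    Slice("servers.old01.cpu", 0, 1000),     # decommissioned server
    Slice("servers.web01.cpu", 0, 1000),
    Slice("servers.web01.cpu", 1000, 2000),  # still reporting
]

recent = metrics_with_data(slices, 1500, 1800)
historical = metrics_with_data(slices, 0, 500)
```

Here the recent query returns only servers.web01.cpu, while the historical
query returns both metrics, so the decommissioned server stays out of the
way until someone asks for its time range.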

The third major implication is that whisper will no longer drop any data
automatically; instead, expiration will be managed by a separate cleanup
process with configurable logic. Finally, my personal favorite: database
files will become portable. This means you'll be able to move database
files around freely, even to other graphite servers. Of course, you can
already have whisper files distributed across multiple servers, but
currently each file holds all of the datapoints for a single metric. With
this new design, each file will hold some contiguous block of datapoints
for a metric, not all of them. So a single metric's datapoints could be
split across multiple servers (think: recent data on the live systems,
older data on separate archiving systems, all transparent to the user).
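
One way the separate cleanup process could look is sketched below. The
design for it is not settled, so everything here is an assumption: the
Slice record, the policy function, and the 90-day figure are all
hypothetical.

```python
from collections import namedtuple

# Hypothetical description of one database file: it covers [start, end).
Slice = namedtuple("Slice", ["metric", "start", "end"])

DAY = 86400  # seconds


def max_age_for(metric):
    """Hypothetical policy: seconds to keep data, or None to keep forever."""
    if metric.startswith("servers.old"):
        return None  # keep decommissioned servers' history indefinitely
    return 90 * DAY


def select_expired(slices, now, policy=max_age_for):
    """Return the slices that the policy says are old enough to remove."""
    expired = []
    for s in slices:
        max_age = policy(s.metric)
        # A slice expires only once its newest datapoint is past the limit.
        if max_age is not None and s.end < now - max_age:
            expired.append(s)
    return expired


slices = [
    Slice("servers.web01.cpu", 0, DAY),
    Slice("servers.old01.cpu", 0, DAY),
    Slice("servers.web01.cpu", 100 * DAY, 101 * DAY),
]
expired = select_expired(slices, now=120 * DAY)
```

Because the decision lives outside the storage layer, swapping in a
different policy function changes retention without touching any file.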

At this point we have a design drawn up but are still testing a few of the
concepts to make sure there are no performance regressions. The project
timeline allots about 6 weeks before we expect this in production. If all
goes well, I think this will be a great improvement for Graphite. Once
development starts in earnest, I'll send the branch info to the mailing
list in case anyone is adventurous and wants to mess with it.

As for Cassandra, I still think this is worth exploring. I raised some
questions about it in my original message that I still think are important
to answer. The current whisper work is needed to address some near-term
needs, but long-term I think it'd be great if we could get an external
product like Cassandra to do some of the heavy lifting for us.

-Chris

On Fri, Mar 26, 2010 at 6:58 AM, Elliot Murphy <elliot@xxxxxxxxxxxxx> wrote:

> Hi Chris, Kraig,
>
> It's really cool to read about this work with Cassandra. At Canonical,
> we are looking to consolidate and replace multiple graphing systems
> that have sprung up over time. We have been running cricket for about
> the last 6 years, and would like to replace it with something more
> modern - one of the big problems we have with cricket right now is
> that it loses data once you pass the time period defined in your RRD
> database. So we've also got a system storing some graphing data in
> postgres. I've been pretty impressed with graphite and how shiny it
> is, but haven't used it in anger yet.
>
> This might be a dumb question, but would using this Cassandra backend
> with Graphite potentially address the problem of not wanting to ever
> lose old data points? Or does the whisper design already get rid of
> that limitation?
>
> On Thu, Mar 18, 2010 at 2:01 AM, Chris Davis <chrismd@xxxxxxxxx> wrote:
> > Wow, thanks for the performance numbers Kraig. Yes I am definitely
> > interested in learning more about Cassandra. I have not had a chance to
> > try it out yet myself but it sounds pretty cool and it is nice to see
> > someone has it running in a working system. I have also been hearing a
> > lot of buzz about Redis lately (http://code.google.com/p/redis/) and
> > some users are looking into using that as a carbon-cache replacement as
> > well. Again I haven't had a chance to mess around with it myself so I
> > don't know all that much about it. I am cc'ing the mailing list so we
> > can open this up to a broader discussion, possibly about Redis as well.
> > I am very curious about how Cassandra actually works and I am also
> > curious about the %iowait numbers from your testing. In particular the
> > %iowait times are nearly zero on the Cassandra servers. Presumably
> > Cassandra is getting everything persisted to disk with consistent
> > performance over time? If it is then I think it would be a good idea to
> > test a scenario where there is both a lot of metrics being written as
> > well as a lot of metrics being retrieved. The reason this is important
> > is because there are two I/O access patterns heavily used by Graphite
> > that are in complete conflict with one another.
>
>
> --
> Elliot Murphy | https://launchpad.net/~statik/
>
