graphite-dev team mailing list archive

Re: [Question #193502]: Importing historical summed data gives weird aggregated result

 

Question #193502 on Graphite changed:
https://answers.launchpad.net/graphite/+question/193502

    Status: Open => Answered

Michael Leinartas proposed the following answer:
So you're finding out that the particulars of how aggregation works in
the whisper database are a bit wonky.

I'm looking primarily at the first example right now. To start, you are
inspecting the retentions the correct way, but your times are a little
off. Whisper returns data from the highest-precision archive (retention
definition) that can satisfy the entire period requested. Requesting 3
seconds of data to verify the 1s:3s archive is correct, but Whisper does
everything relative to the current time, so you want to use
$(($(date +%s) - 3)) as the start time to get the first archive. In that
case you should get 3 points returned, all with a value of 1. You can
also update several points at once with whisper-update.py so that it
happens more quickly (before the current second rolls over). Finally,
I've noticed that aggregation behaves unexpectedly when there aren't
enough points in the first archive to satisfy the 2nd archive (you found
a weird edge case). The minimum retention you should use in this case is
1s:5s. Here's a slightly modified script:

Script 1 modified
=======================================
#!/bin/bash

rm -f test.wsp
whisper-create.py --xFilesFactor=0 --aggregationMethod=sum test.wsp 1s:5s 5s:20s
# Capture the current time once so all five points are relative to the same second.
CREATED=$(date +%s)
echo "Created: $CREATED"
whisper-update.py test.wsp $CREATED:1 $((CREATED-1)):1 $((CREATED-2)):1 $((CREATED-3)):1 $((CREATED-4)):1
echo
echo Using 1s resolution:
whisper-fetch.py --from=$(($(date +%s)-5)) test.wsp
echo

echo Using 5s resolution:
whisper-fetch.py --from=$(($(date +%s)-30)) test.wsp

Output from modified script 1:
=======================================
Created: test.wsp (148 bytes)
Created: 1334621059
[('1334621059', '1'), ('1334621058', '1'), ('1334621057', '1'), ('1334621056', '1'), ('1334621055', '1')]

Using 1s resolution:
1334621055	1.000000
1334621056	1.000000
1334621057	1.000000
1334621058	1.000000
1334621059	1.000000

Using 5s resolution:
1334621040	None
1334621045	None
1334621050	None
1334621055	5.000000

This should look like what you expect. The 5 points in the first archive are
aggregated into the 1334621055 bucket as a sum. Running it multiple
times will show that sometimes those 5 points will end up in a single
bucket and sometimes they'll be split between two (depending on what
second it's run on).
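
To see why the split depends on the second the script runs on, here is a
minimal sketch of the bucket arithmetic. It assumes lower-resolution
buckets start at timestamp - (timestamp % step), which matches the output
above; bucket_start is a hypothetical helper written for this example and
is not part of whisper.

```shell
#!/bin/bash
# bucket_start TIMESTAMP STEP
# Prints the start of the STEP-second bucket that contains TIMESTAMP,
# assuming buckets are aligned as timestamp - (timestamp % step).
bucket_start() {
    local ts=$1 step=$2
    echo $(( ts - ts % step ))
}

# Five consecutive seconds that begin on a 5s boundary share one bucket.
for ts in 1334621055 1334621056 1334621057 1334621058 1334621059; do
    echo "$ts -> $(bucket_start "$ts" 5)"
done

echo

# The same run shifted by two seconds straddles two buckets.
for ts in 1334621057 1334621058 1334621059 1334621060 1334621061; do
    echo "$ts -> $(bucket_start "$ts" 5)"
done
```

In the first loop every point maps to 1334621055, so the sum is 5. In the
second loop three points map to 1334621055 and two map to 1334621060.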


The 2nd script isn't doing what you expect because whisper-resize.py is
'dumb'. It iterates through the archives in reverse order (from lowest
resolution and longest retention to highest resolution and shortest
retention), pulls the data out of each, and writes it to a new archive.
It's best suited to simple resizes, for example extending a whisper file
to cover a longer period at the lowest resolution.

Aggregation happens at storage time. Once a point is stored in the
highest-resolution archive, each lower archive in turn reads all of the
relevant points from the archive above it, aggregates them, and stores
the result. When you store points that are older than the first
archive's retention (through a resize or explicit storage), this
propagation doesn't happen. Instead, each write lands in the same bucket
of the lower archive and overwrites the previous value.


What you'll need to do is pre-aggregate your historical data before
back-loading it. Generally you'll want to get the data flowing to carbon
first and worry about back-loading later. If you wait a week for live
data to load up, you then only need to pre-aggregate for your
lowest-precision archive (the 1d:730).
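
As a rough sketch of what pre-aggregating could look like, the function
below sums raw points into fixed-size buckets before anything is written
to whisper. It assumes the historical data is available as plain
"timestamp value" lines and that sum is the aggregation you want;
pre_aggregate and the input format are hypothetical, chosen only for this
example.

```shell
#!/bin/bash
# pre_aggregate STEP
# Reads "timestamp value" lines on stdin, sums the values that fall into
# the same STEP-second bucket, and prints one "bucket:sum" line per
# bucket, oldest first.
pre_aggregate() {
    awk -v step="$1" '
        NF >= 2 {
            # Align each timestamp to the start of its bucket.
            bucket = $1 - ($1 % step)
            sum[bucket] += $2
        }
        END {
            for (b in sum) printf "%s:%.6f\n", b, sum[b]
        }
    ' | sort -t: -k1,1n
}

# Example: five 1-second points collapsed into a single 5-second bucket.
printf '%s\n' \
    '1334621055 1' '1334621056 1' '1334621057 1' \
    '1334621058 1' '1334621059 1' | pre_aggregate 5
```

Each output line is in the timestamp:value form that whisper-update.py
takes in the script above, so a step of 86400 would give one summed
point per day for a daily archive. Note that plain modulo arithmetic
aligns daily buckets to UTC midnight.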

Hope this helps

-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.