graphite-dev team mailing list archive

Thread
Date
[Question #205123]: Support RDB as storage

To: graphite-dev@xxxxxxxxxxxxxxxxxxx
From: Nikolay Baluk <question205123@xxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 06 Aug 2012 07:41:20 -0000
Reply-to: question205123@xxxxxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx
New question #205123 on Graphite:
https://answers.launchpad.net/graphite/+question/205123

Hi! I am a Google Summer of Code 2012 student and as part of my project I write a code for the ability to use this great project Graphite to display graphs basis on data from MySQL database tables. It seems to me that we have some successes and maybe this functional can be useful to someone of Graphite's users. I guess many people have tables in their databases with data of some events with time, like downloads of file, user registration, etc and it would be cool if they can graph their data just provide db connection options, specify needed table and field with time.
So this is how it works.
In render/datalib.py in fetchData function I put this code before it trying to fetch data with Whisper:

if pathExpr.startswith('rdb.'):
    try:
      results = rdb.fetch(pathExpr, timestamp(startTime),
                          timestamp(endTime))

that ultimately returns the result in exactly the same format as if it fetch data from Whisper with
dbResults = dbFile.fetch(timestamp(startTime), timestamp(endTime)).

So if 'target' URL's parameter starts with 'rdb.' next action -- is try to get data from RDB.
This simple change allows you to maintain backward compatibility with all follow things like Apply function to graph.
The most interesting, of course, is inside the new created file rdb.py. For getting data from DB we are using SqlSoup (built-in to SQLAlchemy), so you can use an any database which is supported by the SQLAlchemy ORM.
Function 'fetch' in the rdb.py returns timeInfo and valueList with 60 sec step like a 'fetch' in Whisper.

All what user should do is provide a simple config in conf/storage-schemas-rdb.conf. 
For example:
You want to see a statistic of file's downloads from some location.
Table `Downloads`:
--------------------------------------------------------------
| id | mirror_id | file_id | time                               |
--------------------------------------------------------------
| 1 | 1                |  1         | 2012-07-06 00:00:00  |
| 2 ..........                                                                   |
| 3 ..........                                                                   |
--------------------------------------------------------------
Table `Mirrors`:
-------------------
| id | location |
-------------------
| 1 | US            |
| 2 ..........         |
| 3 ..........         |
-------------------
You want to see a statistic of file with id = 1 and download's location = US. For first condition SQL query should have only `where` clause but for second it should have SQL join. You can do this!

URL should consist:
- for first condition: target=rdb.profile1.file.1
- for second: target=rdb.profile1.location.US
Final URL is http://graphite.localhost/render/?target=rdb.profile1.file.1|location.UA
This is how filters looks like.
profile1 - name of DB config section
file - name of config section for the specific config of this data-type
1 - value
| - `AND`  conditional statement
location - name of config section for the specific config of this data-type
UA - value

And finally configuration sections:
[profile1]
sqlsoup_url = mysql://test:test@localhost:3306/test
table_name = Downloads
time_column_name = time

[profile1.file]
where_column = file_id

[profile1.location]
where_table = Mirror
where_column = location

join_table = Mirror
join_external_column = id
join_internal_column = mirror_id

The SQL-code for getting this data will looks like this:
SELECT count(%s) AS point_value, max(`Downloads`.time) AS time 
FROM `Downloads` INNER JOIN `Mirror` ON `Downloads`.mirror = `Mirror`.id 
WHERE `Downloads`.time >= %s AND `Downloads`.time <= %s AND `Downloads`.file = %s AND `Mirror`.location = %s GROUP BY year(`Downloads`.time), %s * HOUR(`Downloads`.time) + MINUTE(`Downloads`.time) ORDER BY time

That's it.

So, may I contribute this code? If not, tell please what I'm doing wrong. Thank you!

-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.