← Back to team overview

launchpad-dev team mailing list archive

Questions about Tuolumne

 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've been poking around Tuolumne to figure out why it is a bit slow to
render graphs. I have some ideas, but I wanted to check some assumptions.

1) Why does Datum have both keys "metric_id" and "datafile_id". From
what I can tell, Metric already has "datafile_id", which makes the Datum
entry redundant.

2) Looking at the data, on the 'process_data' side, it queries for an
exact match on (metric_id, datafile_id, timestamp). Looking at the
Views, it queries for all (metric_id, timestamp) that fall in a given range.

3) ATM we have an index on "timestamp", "metric_id" and (datafile_id,
metric_id, timestamp).

The last index ensures that inserts are fast.

But lookups have to do a Bitmap_Index_Scan combining the timestamp index
with the metric_id index. And 'timestamp' isn't very selective. If you
filter over 1 month time, you get about 22M entries. (timestamp,
metric_id) filters down to under 6000 entries (depending on the logging
rate, but logging every minute is only 43k entries).

4) So I was thinking to just add a new index (metric_id, timestamp).
However, if datafile_id, is redundant, then we can drop the column, and
get rid of the index. I haven't looked everywhere, but I also get the
feeling we can drop the individual indexes on (metric_id) and (timestamp).

Which should decrease a good portion of the 40GB the table takes up
today, as well as create lookups that only read the 6k rows they need,
rather than filtering through 22M rows.


Thoughts?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk13J5kACgkQJdeBCYSNAAPkrACeN0UOitjBcipJVRGPY5UrHJMu
ObkAnikQQlScAy22zTgo1J6lKqWT3t5E
=yzVy
-----END PGP SIGNATURE-----