graphite-dev team mailing list archive

Re: [Question #223956]: Graphite-Web Refactoring Help Request

To: graphite-dev@xxxxxxxxxxxxxxxxxxx
From: Dieter P <question223956@xxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 17 Mar 2013 15:16:07 -0000
Reply-to: question223956@xxxxxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Question #223956 on Graphite changed:
https://answers.launchpad.net/graphite/+question/223956

Dieter P posted a new comment:
re: "Nodes/Leaves should ID a timeseries with a generic ID field and provide a display_name field for GUI use"
1) In graphite, only leaves in the tree point to timeseries (i.e. if you have a metric name "foo.bar.baz", then "foo.bar" is nothing). I think this is sensible. Do you want to change this? (If so, what would "foo.bar" be?)
2) What are the reasons why the display name would be different from the metric name? (Assuming the metric name would still be all nodes in the tree "as.we.are.used.to", which would translate into the TSUID.) I don't see a need for this, especially in graphite, where you interact so closely with the metric names (when building graphs and dashboards) that hiding them seems disadvantageous. (Probably read the tagging section below first.)

=== tagging ===
I was actually going to start a separate discussion, but since you bring it up here...
First, you should know about:
* https://github.com/Dieterbe/graph-explorer/tree/master/structured_metrics : a library that converts the graphite metric list into a tag space of metrics
* https://github.com/Dieterbe/graph-explorer: a graphite dashboard that takes this tag space and provides a query language so you can filter metrics and group them into graphs by tag(s)
I will refer to these as 'G-E'

to take the last example from http://www.euphoriaaudio.com/opentsdb/http-api-meta.html
that metric can be written as:
{
"name": "tsd.http.latency_50pct",
"display_name": "HTTP Latency 50pct",
"tags": {"host": "hobbes-64bit", "type": "all"}
}
1) as you can see, I brought down the markup for tags substantially. I find the syntax demonstrated in your opentsdb RFC quite overengineered, which is also evident because so many fields are just empty.
2) one thing that I learned with G-E is that the more information you can capture in tags, the better (because it's structured data, clearly defined, and so more usable; "name" has no clear meaning for metrics and so can only be used for text filtering). Luckily, there's no need for a "name" attribute if you add additional tags such as protocol=http, what=seconds, type=latency_50pct. This gives more power for filtering, aggregating and grouping metrics when composing graphs. As a rule of thumb, I would say never have a 'name' attribute; always aim to structure data in more specific tags.
(the canonical opentsdb example metric "mysql.bytes_sent schema=foo host=db2" becomes "service=mysql what=bytes type=sent schema=foo host=db2")

Furthermore:

1) I would argue that the display of metric names should not be configured at the metric level as in your examples, but can easily be generated.
* in a composer interface you can just list all the tags in a predefined order (in G-E it's just "%what %type %target_type <other tags alphabetically sorted> %server %plugin")
* this becomes more apparent when viewing a graph: say you are plotting these two metrics on one graph:
{'what': 'bytes', 'service': 'mysql', 'server':'host1', 'type': 'sent'}
{'what': 'bytes', 'service': 'mysql', 'server':'host1', 'type': 'received'}

for this graph, the tags 'what', 'service' and 'server' are constant,
and the 'type' tag is variable. So the graph title can be computed to
be "host1 mysql bytes" and the entries on the legend would be 'sent' and
'received'. This is what G-E does; it's trivial to implement and
creates a non-redundant display of information independent of how you
group metrics into graphs.
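
To illustrate the title/legend computation described above, here is a minimal sketch in Python. It is not G-E's actual code: the function name title_and_legend and the default tag order are assumptions made for this example only.

```python
def title_and_legend(metrics, order=("server", "service", "what", "type")):
    """Split tags into constant ones (graph title) and variable ones (legend).

    `metrics` is a list of tag dicts. `order` is a hypothetical preferred
    tag order; tags not listed in it are appended alphabetically.
    """
    all_keys = sorted({k for m in metrics for k in m})
    keys = [k for k in order if k in all_keys]
    keys += [k for k in all_keys if k not in order]

    # A tag is constant when every metric carries the same value for it.
    constant = [k for k in keys if len({m.get(k) for m in metrics}) == 1]
    variable = [k for k in keys if k not in constant]

    title = " ".join(str(metrics[0][k]) for k in constant)
    legend = [" ".join(str(m.get(k, "")) for k in variable) for m in metrics]
    return title, legend


metrics = [
    {'what': 'bytes', 'service': 'mysql', 'server': 'host1', 'type': 'sent'},
    {'what': 'bytes', 'service': 'mysql', 'server': 'host1', 'type': 'received'},
]
title, legend = title_and_legend(metrics)
# title is "host1 mysql bytes", legend is ['sent', 'received']
```

Because the split is recomputed per graph, the same metrics grouped differently (say, one graph per 'type' across several servers) would get a different title and legend without any per-metric display configuration.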

2) nowadays, many metrics in graphite have names that are just too
unclear. often they don't specify the unit of measurement (seconds? ms?
bits? bytes? elements in a queue? an amount of errors?), the prefixes used
(M, G, etc) or how the value should be interpreted (is this a number per
second? per flushinterval (like statsd counts)? etc). G-E solves this
by making the 'what' and 'target_type' tags mandatory and clearly
defined.

3) the current tree based organisational paradigm is a bit too simple. There's basically no way to organise your tree to support all ways of later querying it (so that you can later do "give me all metrics related to service mysql" or "all metrics that are an amount of errors"), which causes people to spend too much time trying to do so.
(see also the amounts of statsd issues/PR's related to suffixes, prefixes and namespacing). a tag based system makes this moot.
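
As a rough illustration of why tags make these queries easy, here is a small Python sketch. The match helper and the sample metrics are made up for this example; this is not G-E's query language.

```python
def match(metrics, **wanted):
    """Return the metrics whose tags contain every wanted key/value pair."""
    return [
        m for m in metrics
        if all(m.get(k) == v for k, v in wanted.items())
    ]


# Hypothetical sample data; tag names follow the style used in this mail.
metrics = [
    {"service": "mysql", "what": "bytes", "type": "sent", "server": "db2"},
    {"service": "mysql", "what": "errors", "type": "connect", "server": "db2"},
    {"service": "apache", "what": "errors", "type": "http_5xx", "server": "web1"},
]

# "give me all metrics related to service mysql"
mysql_metrics = match(metrics, service="mysql")

# "all metrics that are an amount of errors"
error_metrics = match(metrics, what="errors")
```

Both queries cut across the data in different directions, which a single tree layout like "service.server.what" or "what.service.server" cannot serve equally well.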

This is why I'm in favor of deprecating the tree based method entirely and moving towards a completely tag-based database and query method.
This can actually be implemented more easily than one would think. Actual metrics would still be stored based on a key/filename/id; this would either be a hash of all tag key/value pairs, sorted, or whatever the 'name' tag says (and if you specify 'foo.bar.baz', that's the name tag; this gives instant backwards compatibility). For all incoming metrics, just store all tags in a database along with the id of the metric, so it's easy to query for metrics; because the id is computed by hashing, no lookups are needed when storing time series data. This also has the benefit of being compatible with different (existing or not) backends such as ceres, whisper, etc; they don't have to implement the tagging.
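
A minimal sketch of that id scheme in Python, under stated assumptions: SHA-1 as the hash, "key=value" pairs joined by spaces as the canonical form, and an in-memory dict standing in for the tag database are all choices made for this example, not part of any existing graphite code.

```python
import hashlib

# In-memory stand-in for the tag database (hypothetical): metric id -> tags.
tag_index = {}


def metric_id(tags):
    """Derive the storage key for a metric from its tags alone.

    If a 'name' tag is present (e.g. a legacy 'foo.bar.baz'), use it as-is
    for backwards compatibility. Otherwise hash the sorted key/value pairs.
    """
    if "name" in tags:
        return tags["name"]
    canonical = " ".join(f"{k}={tags[k]}" for k in sorted(tags))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def ingest(tags):
    """Record the tags for querying and return the id used for storage."""
    mid = metric_id(tags)
    tag_index[mid] = dict(tags)
    return mid


legacy_id = ingest({"name": "foo.bar.baz"})
tagged_id = ingest({"service": "mysql", "what": "bytes", "type": "sent"})
```

Since the id depends only on the tags, the writer can compute it without asking the tag database, and the storage backend only ever sees an opaque key.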

=== events ===
quote: "B) We’re also adding annotation support to track/mark events. Same thing, Graphite could store notes in the DB or get the info from OpenTSDB".
I think there's no benefit of deep integration between an event/change management system with a metrics database/management system, because events/changes are inherently very different things than metrics. They have different requirements wrt ingestion, storage, management, GUI's, etc. (I believe deep integration leads to feature creep, scope dilution, and harder integration with other software i.e. monolithic software)
They do go alongside on graphs, which is AFAICT the only place where metrics and events meet. That's why I think it's sensible to have a separate change/event management system, and have a timeseries graphing widget where they can be rendered together (as directed by dashboard software)
From this conviction, I've written:
* https://github.com/Dieterbe/anthracite change/event management system (inspired by graphite's philosophy)
* https://github.com/Dieterbe/timeserieswidget to render graphs and events/changes in a "rich" way (with annotation text etc). As you would expect, it supports and targets graphite and anthracite.

Btw, are you going to monitorama? I will, as will a bunch of other
graphite devs.

--
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.