graphite-dev team mailing list archive
Message #05907
[Question #276589]: Missing metrics periodically in Graphite
New question #276589 on Graphite:
https://answers.launchpad.net/graphite/+question/276589
I am running a setup of 3 servers behind a single load balancer. Servers A, B, and C each run a carbon-relay with 2 carbon-caches (one per CPU, as I have read in other documentation). I am seeing an issue where the same set of metrics periodically goes missing and is only written later.
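For context, the per-server layout roughly follows the standard carbon.conf pattern below. The ports and instance names here are illustrative placeholders, not my exact values:

```ini
# Two cache instances, one per CPU (ports are illustrative).
[cache:a]
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_PORT = 2104
CACHE_QUERY_PORT = 7102

[cache:b]
LINE_RECEIVER_PORT = 2203
PICKLE_RECEIVER_PORT = 2204
CACHE_QUERY_PORT = 7202

# The relay fans incoming metrics out to both local cache instances.
[relay]
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2104:a, 127.0.0.1:2204:b
```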
example:
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 guest.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:31 idle.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 iowait.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 irq.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:31 nice.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 softirq.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 steal.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:31 system.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 user.wsp
You can see that the idle, nice, and system CPU metrics are all behind by 3 minutes. These metrics are delivered every 60 seconds, and my storage-schemas config matches that.
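My storage-schemas.conf entry for these metrics looks roughly like the following; the pattern and retention length are illustrative, only the 60-second step matches what I described above:

```ini
[cpu]
pattern = \.cpu\.
retentions = 60s:30d
```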
This happens only on server A; servers B and C both have the metrics, and I am running the same configs on all 3 boxes. One really interesting thing I have seen is that the cache-b logs show a lot of queries while the cache-a logs show none. Also, cache-a never showed a queue increase, whereas cache-b shows its queue growing to 800. I have been seeing fullQueueDrops but don't understand why.
On the disk side I am running SSDs and seeing the following from iostat. I can provide more info if needed.
-sh-4.2$ iostat -d 1
Linux 3.10.0-229.14.1.el7.x86_64 (ip-10-110-1-18) 12/03/2015 _x86_64_ (2 CPU)
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 486.49 5.27 2171.49 4232842 1744289310
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 1.00 8.00 0.00 8 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 2150.00 0.00 8600.00 0 8600
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 3051.00 0.00 12232.00 0 12232
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 2934.00 0.00 12984.00 0 12984
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 1056.00 0.00 4228.00 0 4228
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.00 0.00 0.00
I am currently doing about 40k metrics per 60 seconds. I'm really confused about why the missing metrics are so consistent: I thought that if this were a queue or caching issue, it would affect random metrics. Any help or direction would really be appreciated.
Thanks.
--
You received this question notification because your team graphite-dev
is an answer contact for Graphite.