nagios-charmers team mailing list archive
Message #00595
[Bug 1842039] Re: nrpe checks are partially ignored by nagios
In comment #1 the nagios context was not set to the same value
everywhere. After fixing the nagios context:
$ juju run --unit nrpe-kubernetes-worker-gpu/2 -- ls -l /var/lib/nagios/export
total 20
-rw-r--r-- 1 root root 275 Aug 30 11:05 host__juju-k8s-kubernetes-worker-gpu-6.cfg
-rw-r--r-- 1 root root 496 Aug 30 08:40 service__juju-k8s-kubernetes-worker-gpu-6_check_docker.cfg
-rw-r--r-- 1 root root 497 Aug 30 11:05 service__juju-k8s-kubernetes-worker-gpu-6_check_flannel.cfg
-rw-r--r-- 1 root root 528 Aug 30 08:40 service__juju-k8s-kubernetes-worker-gpu-6_check_snap.kube-proxy.daemon.cfg
-rw-r--r-- 1 root root 522 Aug 30 08:40 service__juju-k8s-kubernetes-worker-gpu-6_check_snap.kubelet.daemon.cfg
$ juju run --unit nrpe-kubernetes-worker-gpu/2 -- cat /var/lib/nagios/export/*
#---------------------------------------------------
# This file is Juju managed
#--------------------------------------------------
define host {
address 10.2.4.131
host_name juju-k8s-kubernetes-worker-gpu-6
use server
hostgroups machines,
}
#---------------------------------------------------
# This file is Juju managed
#---------------------------------------------------
define service {
use active-service
host_name juju-k8s-kubernetes-worker-gpu-6
service_description juju-k8s-kubernetes-worker-gpu-6[docker] process check {kubernetes-worker-gpu/6}
check_command check_nrpe!check_docker
servicegroups juju-k8s
}
#---------------------------------------------------
# This file is Juju managed
#---------------------------------------------------
define service {
use active-service
host_name juju-k8s-kubernetes-worker-gpu-6
service_description juju-k8s-kubernetes-worker-gpu-6[flannel] process check {juju-k8s:flannel-gpu/3}
check_command check_nrpe!check_flannel
servicegroups juju-k8s
}
#---------------------------------------------------
# This file is Juju managed
#---------------------------------------------------
define service {
use active-service
host_name juju-k8s-kubernetes-worker-gpu-6
service_description juju-k8s-kubernetes-worker-gpu-6[snap.kube-proxy.daemon] process check {kubernetes-worker-gpu/6}
check_command check_nrpe!check_snap.kube-proxy.daemon
servicegroups juju-k8s
}
#---------------------------------------------------
# This file is Juju managed
#---------------------------------------------------
define service {
use active-service
host_name juju-k8s-kubernetes-worker-gpu-6
service_description juju-k8s-kubernetes-worker-gpu-6[snap.kubelet.daemon] process check {kubernetes-worker-gpu/6}
check_command check_nrpe!check_snap.kubelet.daemon
servicegroups juju-k8s
}
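To pull the check names out of exported files like the ones above without reading them by eye, a small parser can be sketched as follows. This is only an illustration, not part of either charm: the function name `parse_nagios_objects` and the `SAMPLE` string are hypothetical, and the parser assumes the simple one-directive-per-line layout shown in this export (no inline `;` comments, no line continuations).

```python
import re


def parse_nagios_objects(text):
    """Return a list of (object_type, {directive: value}) tuples.

    Assumes the simple layout seen in the exported files: a
    'define <type> {' line, one directive per line, and a closing '}'.
    """
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "Juju managed" banner
        match = re.match(r"define\s+(\w+)\s*\{", line)
        if match:
            current = (match.group(1), {})
            continue
        if line == "}":
            if current is not None:
                objects.append(current)
            current = None
            continue
        if current is not None:
            # The first token is the directive name, the rest is its value.
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1] if len(parts) > 1 else ""
    return objects


# Hypothetical sample, copied from two of the definitions shown above.
SAMPLE = """\
define host {
address 10.2.4.131
host_name juju-k8s-kubernetes-worker-gpu-6
use server
hostgroups machines,
}
define service {
use active-service
host_name juju-k8s-kubernetes-worker-gpu-6
service_description juju-k8s-kubernetes-worker-gpu-6[snap.kubelet.daemon] process check {kubernetes-worker-gpu/6}
check_command check_nrpe!check_snap.kubelet.daemon
servicegroups juju-k8s
}
"""

objects = parse_nagios_objects(SAMPLE)
service_commands = [
    attrs["check_command"] for kind, attrs in objects if kind == "service"
]
print(service_commands)
```

Run against the full output of the `cat` command, the same list comprehension would give one `check_command` entry per exported service file.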
--
You received this bug notification because you are a member of Nagios
Charm developers, which is subscribed to Nagios Charm.
https://bugs.launchpad.net/bugs/1842039
Title:
nrpe checks are partially ignored by nagios
Status in Nagios Charm:
New
Status in NRPE Charm:
New
Bug description:
In a kubernetes model, we have some kubernetes-worker charms
(kubernetes-worker-gpu) with a relation to an nrpe charm
(nrpe-kubernetes-worker-gpu). The nrpe charm has a relation to a nagios
charm (nagios-server-k8s).
If we list the nrpe checks, we see that several are defined:
$ juju run-action --wait nrpe-kubernetes-worker/8 list-nrpe-checks
unit-nrpe-kubernetes-worker-8:
id: 37d06ff4-6861-496f-8a2e-cd3464991782
results:
checks:
check-arp-cache: /usr/local/lib/nagios/plugins/check_arp_cache.py -w 60 -c 80
check-conntrack: /usr/local/lib/nagios/plugins/check_conntrack.sh -w 80 -c 90
check-disk-root: '/usr/lib/nagios/plugins/check_disk -u GB -w 25% -c 20% -K
5% -p / '
check-docker: /usr/local/lib/nagios/plugins/check_systemd.py docker
check-flannel: /usr/local/lib/nagios/plugins/check_systemd.py flannel
check-load: /usr/lib/nagios/plugins/check_load -w 192,96,48 -c 384,192,96
check-mem: /usr/local/lib/nagios/plugins/check_mem.pl -C -h -u -w 85 -c 90
check-snap:
kube-proxy:
daemon: /usr/local/lib/nagios/plugins/check_systemd.py snap.kube-proxy.daemon
kubelet:
daemon: /usr/local/lib/nagios/plugins/check_systemd.py snap.kubelet.daemon
timestamp: Fri Aug 30 11:09:43 CEST 2019
status: completed
timing:
completed: 2019-08-30 11:09:43 +0200 CEST
enqueued: 2019-08-30 11:09:38 +0200 CEST
started: 2019-08-30 11:09:42 +0200 CEST
unit: nrpe-kubernetes-worker/8
But we don't find all of them in nagios (check-snap checks are
missing):
$ juju run --unit nagios-server-k8s/6 -- grep -r "service.*kubernetes-worker-gpu" /etc/nagios3/conf.d/
/etc/nagios3/conf.d/charm.cfg: service_description juju-k8s-kubernetes-worker-gpu-6-check_arp_cache
/etc/nagios3/conf.d/charm.cfg: service_description juju-k8s-kubernetes-worker-gpu-6-check_mem
/etc/nagios3/conf.d/charm.cfg: service_description juju-k8s-kubernetes-worker-gpu-6-check_conntrack
/etc/nagios3/conf.d/charm.cfg: service_description juju-k8s-kubernetes-worker-gpu-6-check_load
/etc/nagios3/conf.d/charm.cfg: service_description juju-k8s-kubernetes-worker-gpu-6-docker
/etc/nagios3/conf.d/charm.cfg: service_description juju-k8s-kubernetes-worker-gpu-6-check_disk_root
/etc/nagios3/conf.d/charm.cfg: service_description juju-k8s-kubernetes-worker-gpu-6-flannel
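The comparison between what nrpe exports and what nagios has configured can be sketched in a few lines of Python. This is an illustration only: `missing_checks` is a hypothetical helper, and its matching rule (a service description ends with `-<check name>`, with or without the `check_` prefix) is an assumption inferred from the grep output above, not taken from the charm code.

```python
def missing_checks(exported_commands, nagios_descriptions):
    """Return the exported check names with no matching nagios service.

    Assumed matching rule (inferred from the output in this report):
    a description matches if it ends with '-<name>', where <name> is
    the check name with or without its 'check_' prefix.
    """
    missing = []
    for command in exported_commands:
        if command.startswith("check_"):
            short = command[len("check_"):]
        else:
            short = command
        suffixes = ("-" + command, "-" + short)
        if not any(desc.endswith(suffixes) for desc in nagios_descriptions):
            missing.append(command)
    return missing


# Check names taken from the exported service files in this report.
exported = [
    "check_docker",
    "check_flannel",
    "check_snap.kube-proxy.daemon",
    "check_snap.kubelet.daemon",
]

# Service descriptions taken from the grep output above.
configured = [
    "juju-k8s-kubernetes-worker-gpu-6-check_arp_cache",
    "juju-k8s-kubernetes-worker-gpu-6-check_mem",
    "juju-k8s-kubernetes-worker-gpu-6-check_conntrack",
    "juju-k8s-kubernetes-worker-gpu-6-check_load",
    "juju-k8s-kubernetes-worker-gpu-6-docker",
    "juju-k8s-kubernetes-worker-gpu-6-check_disk_root",
    "juju-k8s-kubernetes-worker-gpu-6-flannel",
]

result = missing_checks(exported, configured)
print(result)
```

With the data from this report, the two snap daemon checks are the ones reported as missing, which matches the observation in the bug description.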
To manage notifications about this bug go to:
https://bugs.launchpad.net/nagios-charm/+bug/1842039/+subscriptions