aims team mailing list archive

[Bug 1827159] Re: check_all_disks includes squashfs /snap/* which are 100%

 

This bug was fixed in the package monitoring-plugins - 2.2-6ubuntu1

---------------
monitoring-plugins (2.2-6ubuntu1) focal; urgency=medium

  * d/p/exclude-tmpfs-squashfs-tracefs.patch: Ignore artificial filesystems
    that trigger false-positive DISK CRITICAL checks due to reporting as at
    100% capacity.
    (LP: #1827159)

 -- Bryce Harrington <bryce@xxxxxxxxxxxxx>  Thu, 31 Oct 2019 00:21:55 +0000

** Changed in: monitoring-plugins (Ubuntu)
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of AIMS,
which is subscribed to a duplicate bug report (1516451).
https://bugs.launchpad.net/bugs/1827159

Title:
  check_all_disks includes squashfs /snap/* which are 100%

Status in Nagios Charm:
  Fix Released
Status in monitoring-plugins package in Ubuntu:
  Fix Released

Bug description:
  [Impact]
  False positive reports are generated in monitoring tools when artificial filesystems are mounted, since they show 100% disk utilization, and thus add unnecessary (but dire sounding) "DISK CRITICAL" noise.

  [Test Case]
  $ lxc launch ubuntu-daily:19.10/amd64 lp1827159
  $ lxc exec lp1827159 bash
  # apt-get -y update
  # apt-get install monitoring-plugins
  # snap install gnome-calculator
  [...]
  # /usr/lib/nagios/plugins/check_disk -w 10 -c 10
  DISK CRITICAL - free space: / 1903 MB (1% inode=78%); /dev 0 MB (100% inode=99%); /dev/full 16018 MB (100% inode=99%); /dev/null 16018 MB (100% inode=99%); /dev/random 16018 MB (100% inode=99%); /dev/tty 16018 MB (100% inode=99%); /dev/urandom 16018 MB (100% inode=99%); /dev/zero 16018 MB (100% inode=99%); /dev/fuse 16018 MB (100% inode=99%); /dev/net/tun 16018 MB (100% inode=99%); /dev/lxd 0 MB (100% inode=99%); /dev/.lxd-mounts 0 MB (100% inode=99%); /dev/shm 16041 MB (100% inode=99%); /run 3208 MB (99% inode=99%); /run/lock 5 MB (100% inode=99%); /sys/fs/cgroup 16041 MB (100% inode=99%); /snap 1903 MB (1% inode=78%); /run/snapd/ns 3208 MB (99% inode=99%);| /=111171MB;119160;119160;0;119170 /dev=0MB;-10;-10;0;0 /dev/full=0MB;16008;16008;0;16018 /dev/null=0MB;16008;16008;0;16018 /dev/random=0MB;16008;16008;0;16018 /dev/tty=0MB;16008;16008;0;16018 /dev/urandom=0MB;16008;16008;0;16018 /dev/zero=0MB;16008;16008;0;16018 /dev/fuse=0MB;16008;16008;0;16018 /dev/net/tun=0MB;16008;16008;0;16018 /dev/lxd=0MB;-10;-10;0;0 /dev/.lxd-mounts=0MB;-10;-10;0;0 /dev/shm=0MB;16031;16031;0;16041 /run=0MB;3198;3198;0;3208 /run/lock=0MB;-5;-5;0;5 /sys/fs/cgroup=0MB;16031;16031;0;16041 /snap=111171MB;119160;119160;0;119170 /run/snapd/ns=0MB;3198;3198;0;3208

  # /usr/lib/nagios/plugins/check_disk -w 10 -c 10 -e -X squashfs
  DISK CRITICAL - free space: /dev 0 MB (100% inode=99%); /dev/lxd 0 MB (100% inode=99%); /dev/.lxd-mounts 0 MB (100% inode=99%); /run/lock 5 MB (100% inode=99%);| /=111392MB;119160;119160;0;119170 /dev=0MB;-10;-10;0;0 /dev/full=0MB;16008;16008;0;16018 /dev/null=0MB;16008;16008;0;16018 /dev/random=0MB;16008;16008;0;16018 /dev/tty=0MB;16008;16008;0;16018 /dev/urandom=0MB;16008;16008;0;16018 /dev/zero=0MB;16008;16008;0;16018 /dev/fuse=0MB;16008;16008;0;16018 /dev/net/tun=0MB;16008;16008;0;16018 /dev/lxd=0MB;-10;-10;0;0 /dev/.lxd-mounts=0MB;-10;-10;0;0 /dev/shm=0MB;16031;16031;0;16041 /run=0MB;3198;3198;0;3208 /run/lock=0MB;-5;-5;0;5 /sys/fs/cgroup=0MB;16031;16031;0;16041 /snap=111392MB;119160;119160;0;119170 /run/snapd/ns=0MB;3198;3198;0;3208

  # /usr/lib/nagios/plugins/check_disk -w 10 -c 10 -e -X tmpfs
  DISK OK| /=111171MB;119160;119160;0;119170 /dev/full=0MB;16008;16008;0;16018 /dev/null=0MB;16008;16008;0;16018 /dev/random=0MB;16008;16008;0;16018 /dev/tty=0MB;16008;16008;0;16018 /dev/urandom=0MB;16008;16008;0;16018 /dev/zero=0MB;16008;16008;0;16018 /dev/fuse=0MB;16008;16008;0;16018 /dev/net/tun=0MB;16008;16008;0;16018 /snap=111171MB;119160;119160;0;119170

  [Regression Potential]
  As this alters the logic of how out-of-space checks are handled, the relevant issues to watch for are filesystem checks that report improperly.  These tools underlie a few different front-ends, so regression bugs may get filed in a few different places; however, they will tend to display error messages involving check_disk, nagios, and either tmpfs or tracefs.

  Note that there are likely other synthetic filesystems beyond tmpfs
  and tracefs (e.g. udev, usbfs, devtmpfs, fuse.*, ...) which might also
  cause similar false positives; these should be handled as separate
  bugs, although they can likely be fixed the same way.
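
  To see which other filesystem types a given host actually mounts,
  the kernel's mount table can be summarized.  The following is only a
  sketch: the function name list_fs_types is made up for illustration,
  and it assumes a Linux-style table (such as /proc/mounts) with the
  filesystem type in the third field.

```shell
#!/bin/sh
# Hypothetical helper (not part of monitoring-plugins): print each
# distinct filesystem type found in a mounts table, one per line.
# Assumes the /proc/mounts layout: device, mount point, type, options...
list_fs_types() {
    mounts_file="${1:-/proc/mounts}"
    # Field 3 is the filesystem type; sort -u removes duplicates.
    awk '{ print $3 }' "$mounts_file" | sort -u
}
```

  Running "list_fs_types" with no argument reads /proc/mounts; any
  synthetic types it prints that are not yet excluded are candidates
  for the separate bugs mentioned above.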

  [Fix]
  monitoring-plugins is modified to exclude the unwanted filesystems by default, in check_disk.c (see patch).

  [Discussion]
  There have been several bug reports filed about false positives with different synthetic filesystems (see Dupes), including tracefs, squashfs, and tmpfs.  The commonly discussed workaround is to exclude these when running the tools (e.g. using the '-X <fs>' parameter for check_all_disks).  Since wrappers are typically used to run the underlying tools, a string of '-X <fs>' parameters can be added there, one per filesystem type.
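
  As an illustration of that workaround, a wrapper might build the -X
  parameters from a list of filesystem types.  This is a sketch only:
  the function names exclude_args and check_disk_excluding are
  hypothetical, and the plugin path, thresholds, and -e flag are simply
  copied from the test case above.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of filesystem types into a string
# of "-X <type>" pairs, e.g. "-X squashfs -X tmpfs".
exclude_args() {
    args=""
    for fstype in "$@"; do
        # Prefix a separating space only when args is already non-empty.
        args="${args:+$args }-X $fstype"
    done
    printf '%s\n' "$args"
}

# Hypothetical wrapper: run check_disk with the given types excluded.
# The path and the -w/-c/-e values mirror the test case; adjust locally.
check_disk_excluding() {
    # Word splitting of the helper's output is intentional here.
    # shellcheck disable=SC2046
    /usr/lib/nagios/plugins/check_disk -w 10 -c 10 -e $(exclude_args "$@")
}
```

  For example, "check_disk_excluding squashfs tmpfs tracefs" would
  invoke check_disk with "-X squashfs -X tmpfs -X tracefs".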

  However, a cleaner solution is possible.  monitoring-plugins'
  check_disk.c maintains an internal exclusion list, fs_exclude_list,
  which already excludes iso9660, and can be modified to add other
  filesystems to exclude by default.

  In other words, check_disk.c is modified as follows:

    np_add_name(&fs_exclude_list, "iso9660");
    np_add_name(&fs_exclude_list, "squashfs");
    np_add_name(&fs_exclude_list, "tmpfs");
    np_add_name(&fs_exclude_list, "tracefs");

  This code is added prior to the command line parsing logic, and as
  such simply sets the default behavior.  It does not preclude adding
  or removing further filesystems via the -X and -N parameters.  Indeed,
  anyone who wants tmpfs checked can add it back manually via
  "-N tmpfs".


  [Original Report]
  When using nagios to monitor the Nagios host itself, if the host is not a container, the template for checking the disk space on the Nagios host does not exclude any snap filesystems.  This means we get a Critical report if any snap is installed.

  This can be changed by adding to the check_all_disks command a '-X
  squashfs', but that command is defined in the nagios plugins package.

  (Or, perhaps '-X tmpfs'? -- bryce)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nagios-charm/+bug/1827159/+subscriptions