Easiest Nagios Extensions



Over the last 10 years, two scripts have let me extend Nagios more easily than almost any other monitoring configuration. They have let me build monitors into the applications I have written as well as into existing applications. A few times they have helped me solve complex monitoring problems where providers had supplied ineffective documentation.

This article assumes that you have an understanding of Nagios.

Quite Simply Easy Monitoring

  • check_http_status.sh – lets me write code behind any URL that returns one of the predefined strings (STATE_OK, STATE_WARNING, STATE_CRITICAL) along with a message that helps to identify the error.
  • check_http_content.sh – lets me search a web page for a string. If the string is not found, the check returns an error.

Simple, right? Does it already exist? Maybe. Have I recreated something? Maybe again. But I have been using it for the last 10 years and it has stood the test of time. It is simple, easy to call, and uses existing infrastructure. Web programmers can make as many hooks as needed.

One time I was working for a telco, and the ISDN connections had a tendency to drop out at the most inconvenient times. We were always on the back foot, reacting to the problem only when our customers reported it to us. How to fix it? This was old equipment and there was little documentation for the SNMP traps. BUT there was a web page that showed a red light icon whenever the ISDN lines had a problem. check_http_content.sh allowed me to search that page for the green icon (the monitor is listed below). Within half an hour I had solved all of our ISDN monitoring issues without having to sift through endless Google searches trying to find the correct SNMP trap.

The other script that has been incredibly useful (check_http_status.sh) allows me to write hooks in all of the web apps. This means that all of the complex monitoring can be part of the web application itself (DevOps?).
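A hook does not have to be PHP. As a minimal sketch, assuming a web server that can run CGI shell scripts, a hook could wrap any command and print a line that check_http_status.sh understands. The function name hook_response and the importd process are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Sketch of a CGI-style hook. hook_response is a hypothetical helper,
# not part of the original scripts.

# Run the given command and print one line that check_http_status.sh can
# parse: STATE_OK if the command exits 0, STATE_CRITICAL otherwise.
hook_response() {
    local label="$1"
    shift
    # CGI header, then a blank line, then the body.
    printf 'Content-Type: text/plain\r\n\r\n'
    if "$@" >/dev/null 2>&1; then
        printf '%s is healthy - STATE_OK\n' "$label"
    else
        printf '%s failed its check - STATE_CRITICAL\n' "$label"
    fi
}

# Example (importd is a made-up process name):
# hook_response "import daemon" pgrep -x importd
```

Because the command to run is passed in as arguments, the same wrapper can front a process check, a file test, or a database client call.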

Pros and Cons

The downside of this is that the monitoring server adds load to your web server. This can be controlled by the check interval configuration in Nagios. It is a small price to pay for such an easy way to monitor your systems. Anything can be monitored: processes, database sizes, event frequency, cash flow, service tickets. Anything that you can write a program for can be monitored in Nagios.
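For numeric metrics such as database sizes or ticket counts, the hook only has to turn a number into one of the state strings. A minimal sketch, assuming a metric where larger is worse; the function name state_for_value and the thresholds are made up for the example.

```shell
#!/bin/bash
# state_for_value VALUE WARN CRIT
# Prints the state string for a metric where larger values are worse.
# Hypothetical helper, not part of the original scripts.
state_for_value() {
    local value="$1"
    local warn="$2"
    local crit="$3"
    if [ "$value" -ge "$crit" ]; then
        echo "STATE_CRITICAL"
    elif [ "$value" -ge "$warn" ]; then
        echo "STATE_WARNING"
    else
        echo "STATE_OK"
    fi
}

# Example: open service tickets, warn at 20, critical at 50 (made-up numbers).
open_tickets=27
echo "open_tickets=$open_tickets - $(state_for_value "$open_tickets" 20 50)"
```

Critical is tested first so that a value above both thresholds is never reported as a warning.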

You have to consider security when you write these scripts. It was not a problem for me as I was on a private network. You can control access by whitelisting your monitoring server’s IP, or you can add authentication to your scripts when you call curl.
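As a rough sketch of the authentication option, the curl call can be wrapped so that it sends HTTP basic auth credentials when they are configured. The function name fetch_url and the variables MONITOR_USER and MONITOR_PASS are assumptions for this example. The hook itself would still need to be protected on the web server side.

```shell
#!/bin/bash
# Sketch only: fetch_url, MONITOR_USER and MONITOR_PASS are hypothetical
# names, not part of the original scripts.
fetch_url() {
    local url="$1"
    if [ -n "${MONITOR_USER:-}" ]; then
        # --user sends HTTP basic auth credentials. Prefer an https:// URL
        # so they do not travel in clear text.
        curl --silent --fail --user "${MONITOR_USER}:${MONITOR_PASS:-}" "$url"
    else
        curl --silent --fail "$url"
    fi
}
```

Keeping the credentials in environment variables or a root-readable file, rather than in the service definition, keeps them out of the check_command string.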

If you need some assistance implementing this for your DevOps team, please contact us.

nagiosCheckDatabase.php – Example web hook for checking that a database is reachable. In this case it is an Oracle database.

<?php
//nagiosCheckDatabase.php
require_once ( dirname ( __FILE__ ) . '/config.php' );

$dbName = Request::get ( 'HOST' );

$tab = new DBTable ( $dbName, 'SELECT SYSDATE FROM DUAL', null, DB::FETCH_NUM );

if ( ! $tab->ok() ) {
    echo "Unable to query Database STATE_CRITICAL";
}
else {
    echo "SYSDATE=" . $tab->getValue() . " - STATE_OK";
}

myservers.cfg – Example Service Configuration for Nagios for check_http_status and check_http_content

define service {
  use                   generic-service
  host_name             sydney-mpcsyd
  service_description   Job Results
  check_command         check_http_status!http://192.168.3.200:8080/LiveStats/nagiosCheckJobResults.php?HOST=mpcsyd
  normal_check_interval 60
  retry_check_interval  15
  max_check_attempts    3
}
define service{
  use                   generic-service
  host_name             sydney-rev-au-pocmp3
  service_description   ISDN OCMP3
  check_command         check_http_content!http://192.168.3.130:4242/this.BMPFFaultMgr?GetMapAction=HTML&LEVEL=TOP_LEVEL&TYPE=1&NAME=Root&DATE=0&LEV_NUM=0&LEV_NAME0=N0&LEV_NAME1=N1&LEV_NAME2=N2&LEV_NAME3=N3&LEV_TYPE0=T0&LEV_TYPE1=T1&LEV_TYPE2=T2&LEV_TYPE3=T3!greenISDNIcon.gif
}

commands.cfg – This is the Nagios configuration that connects the services to the scripts

define command {
  command_name check_http_status
  command_line /etc/nagios/scripts/check_http_status.sh '$ARG1$'
}
define command {
  command_name check_http_content
  command_line /etc/nagios/scripts/check_http_content.sh '$ARG1$' '$ARG2$'
}
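Nagios splits the check_command value on `!`: the first field selects the command and the remaining fields become $ARG1$, $ARG2$ and so on. The single quotes around the macros in command_line matter because the URLs contain characters such as & and ? that the shell would otherwise interpret. The sketch below only imitates that splitting so the mapping is easy to see. show_args is a hypothetical helper and the URL is shortened for the example.

```shell
#!/bin/bash
# Sketch of how a check_command value maps onto $ARG1$, $ARG2$, ...
# show_args is a hypothetical helper; Nagios does this splitting internally.
show_args() {
    local old_ifs="$IFS"
    set -f                 # no filename expansion: URLs may contain ? and *
    IFS='!'
    set -- $1              # split the single string on !
    IFS="$old_ifs"
    set +f
    printf 'command_name=%s\n' "$1"
    shift
    local n=1
    for arg in "$@"; do
        printf 'ARG%s=%s\n' "$n" "$arg"
        n=$((n + 1))
    done
}

show_args 'check_http_content!http://192.168.3.130:4242/status!greenISDNIcon.gif'
```

One consequence of this scheme is that a URL or search string passed this way cannot itself contain a literal `!` without escaping.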

check_http_status.sh

#!/bin/bash
# check_http_status.sh URL
# Fetches URL and maps the STATE_* string in the response to a Nagios exit code.

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

URL=$1

# -f makes curl exit non-zero on HTTP error responses (4xx/5xx).
# The URL is quoted so that characters such as ? and & are passed through untouched.
RESP=$(curl --silent --connect-timeout 300 --retry 3 -f "$URL")
RES=$?

if [ "$RES" != "0" ]
then
    echo "Unable to connect to $URL ($RES)"
    exit $STATE_WARNING
else
    echo "$URL: $RESP"
    if echo "$RESP" | grep -q STATE_OK
    then
        exit $STATE_OK
    elif echo "$RESP" | grep -q STATE_WARNING
    then
        exit $STATE_WARNING
    elif echo "$RESP" | grep -q STATE_CRITICAL
    then
        exit $STATE_CRITICAL
    else
        # No recognised state string in the response.
        exit $STATE_WARNING
    fi
fi

check_http_content.sh

#!/bin/bash
# check_http_content.sh URL STRING
# Fetches URL and reports CRITICAL if STRING does not appear in the response.

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

URL=$1
PROCESS=$2   # the string to search for

# -f makes curl exit non-zero on HTTP error responses (4xx/5xx).
RESP=$(curl --silent -f "$URL")
RES=$?

if [ "$RES" != "0" ]
then
    echo "Unable to connect to $URL ($RES)"
    exit $STATE_WARNING
else
    # -F matches the string literally, so a "." in a file name is not a wildcard.
    if echo "$RESP" | grep -qF -- "$PROCESS"
    then
        echo "String ($PROCESS) exists in URL: $URL"
        exit $STATE_OK
    else
        echo "Could not find: String ($PROCESS) in URL: $URL"
        exit $STATE_CRITICAL
    fi
fi