
Nagios Monitoring Tool

Version 1.8 has the notes from setting things up "the old way", i.e. with 1.x running on oates. What follows is the new way: Nagios 3 on nagios.cscf.

CSCF Research group test installation

The machine watcher202.cscf.uwaterloo.ca currently runs the Nagios monitoring system. The CNAME nagios.cscf.uwaterloo.ca also points at it.

You can look at it here - it'll ask for UWDir authentication.

Installation notes

watcher202 runs FreeBSD, currently 8.1. nagios was installed from ports, something like the procedure here. Note that FreeBSD is not currently xhiered, but it does obey some xhier rules regarding who can log in from where.

Config files

Config files are stored in /usr/local/etc/nagios. One should, of course, RTFM before trying to configure much.

Hosts and services

All hosts and services are defined under the hosts directory. This directory contains separate sub-directories for the different CSCF groups (research, infrastructure, etc.) and these, in turn, include separate sub-directories for the different machine groups. A group directory contains the group configuration file, and a separate configuration file for each of the machines in the group. Each of the latter contains a host object and, optionally, several service objects.

Example: The database research group directory structure:

/usr/local/etc/nagios
  |- hosts
    |- research
      |- DB
        |- db.cfg           (group configuration file)
        |- nimbus.cs.cfg    (host+services configuration file)
        |- softbase.cs.cfg  (host+services configuration file)

Note: The previous, simple structure with a single hosts.cfg file is now deprecated.
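
As a rough illustration of the layout above, the path to a host's configuration file can be built from its CSCF group, its machine group, and its host name. The function name host_cfg_path and the NAGIOS_ETC variable are made up for this sketch; only the directory layout comes from this page.

```shell
# Base directory as given on this page; override it for testing.
NAGIOS_ETC="${NAGIOS_ETC:-/usr/local/etc/nagios}"

# host_cfg_path CSCF_GROUP MACHINE_GROUP HOST   (hypothetical helper)
# Print the path of the per-host configuration file.
host_cfg_path() {
    printf '%s/hosts/%s/%s/%s.cfg\n' "$NAGIOS_ETC" "$1" "$2" "$3"
}

host_cfg_path research DB nimbus.cs
```

With the default base directory this prints the nimbus.cs.cfg path shown in the tree above.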

Every host object inherits from the host template defined in misc.cfg. Similarly, every service object inherits from the service template in the same file.

Other major files for CSCF editing purposes are:

  1. contacts.cfg
    • list of people who can be contacted plus how and when to contact them, as well as groups
  2. misc.cfg
    • command and template definitions

Accessing nagios

  1. become root on cscf.cs
  2. ssh nagios.cscf
  3. set LOGNAME=youruserid for RCS purposes
  4. set USER=youruserid for RCS purposes
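
Steps 3 and 4 might look like the following in a Bourne-style shell (sh, bash). This is only a sketch: the user ID jdoe is a placeholder, and it assumes the variables need to be exported so that the RCS commands see them.

```shell
# Placeholder user ID (an assumption for this sketch); use your own.
youruserid="jdoe"

# Set both variables so that RCS check-ins made as root are
# attributed to you rather than to root.
LOGNAME="$youruserid"
USER="$youruserid"
export LOGNAME USER

echo "RCS check-ins will be attributed to: $LOGNAME"
```

If root's shell is csh or tcsh, the equivalent is setenv LOGNAME youruserid and setenv USER youruserid.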

Adding a new service on a previously defined host

Note: The previous, simple structure with a single services.cfg file is now deprecated.

See: NagiosMonitoring#Config_files

Here's an example of adding a new service to monitor, assuming that nagios already does something with the machine. (We'll talk about adding a new machine later.)

  1. ssh to nagios.cscf
  2. sudo to root. If you don't have sudo access, talk to DawnKeenan or DaveGawley.
  3. Set your LOGNAME and USER variables for RCS purposes, then change to the directory where config files are stored (/usr/local/etc/nagios)
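  4. Check out and edit the configuration file for the host under the hosts directory (see Config files above). Older instructions refer to a single services.cfg, which is now deprecated.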
  5. As a minimum, for most Unix hosts you will want to do an ssh test:
define service{           
 use generic-service     
 host_name softbase.math     
 service_description SSH
 is_volatile 0
 contact_groups cscf-rsg
 check_command check_ssh
}
  6. Note that you must use the host_name, not the alias
  7. The scripts for the services in check_command can be found in: /usr/local/libexec/nagios
  8. Test the config file like this: /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg - it should report zero errors. If it reports any, fix them all.
  9. Once that's satisfied, check your changes in and restart nagios: /usr/local/etc/rc.d/nagios restart.

You'll want to stick around long enough to make sure your changes don't cause problems.
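
For repeated additions, a small helper can print a service object in the same shape as the SSH example above, ready to paste into the host's configuration file. The function name service_stanza and its argument order are invented for this sketch; the directives are the ones used in the example.

```shell
# service_stanza HOST_NAME DESCRIPTION CHECK_COMMAND CONTACT_GROUP
# (hypothetical helper) Print a service object like the SSH example.
service_stanza() {
    cat <<EOF
define service{
 use generic-service
 host_name $1
 service_description $2
 is_volatile 0
 contact_groups $4
 check_command $3
}
EOF
}

service_stanza softbase.math SSH check_ssh cscf-rsg
```

The output still has to be reviewed and checked with nagios -v before restarting; the helper only saves typing.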

Adding a new host

Note: The previous, simple structure with a single hosts.cfg file is now deprecated.

See: NagiosMonitoring#Config_files

  1. ssh to nagios.cscf
  2. sudo to root.
  3. Make sure you can ping the host you wish to monitor from this machine. If you can't, the service checks will automatically fail because the first check that Nagios does is a check-host-alive. There may be a way around this, but we haven't worked it out yet.
  4. Set your LOGNAME and USER variables for RCS purposes, then change to the directory where config files are stored (/usr/local/etc/nagios).
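  5. Create and edit a configuration file for the host in the appropriate group directory under hosts (see Config files above). Older instructions refer to a single hosts.cfg, which is now deprecated.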
    • add an entry similar to the following:
define host{
 use generic-host
 host_name zonker
 alias zonker
 address 129.97.74.66
 contact_groups cscf-rsg
}
    • "use generic-host" makes the host inherit its default configuration options from the generic host template defined in misc.cfg
  6. Test the config file like this: /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg - it should report zero errors. If it reports any, fix them all.
  7. Once that's satisfied, check your changes in and restart nagios: /usr/local/etc/rc.d/nagios restart.

Adding a new host group

Note: The previous, simple structure with a single hosts.cfg file is now deprecated.

See: NagiosMonitoring#Config_files

  1. ssh to nagios.cscf
  2. sudo to root.
  3. Set your LOGNAME and USER variables for RCS purposes, then change to the directory where config files are stored (/usr/local/etc/nagios).
  4. Check out and edit the file hosts.cfg.
    • add an entry similar to the following:
# CS core servers
define hostgroup{
 hostgroup_name cscore
 alias CS core servers
 contact_groups cscf-csi
 members fe02.math,hopper.math,barbarus.cs,cpu102.cs,cpu104.cs,cpu106.cs,cpu108.cs,cpu110.cs,cpu112.cs,cpu114.cs
}
  5. Note that every member must already be defined as a host, and that you must use its host_name, not its alias
  6. Test the config file like this: /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg - it should report zero errors. If it reports any, fix them all.
  7. Once that's satisfied, check your changes in and restart nagios: /usr/local/etc/rc.d/nagios restart.

Adding a contact

  1. ssh to nagios.cscf
  2. sudo to root.
  3. Set your LOGNAME and USER variables for RCS purposes, then change to the directory where config files are stored (/usr/local/etc/nagios).
  4. Check out and edit the file contacts.cfg.
    • add an entry similar to the following:
define contact{
 contact_name lfolland
 alias Lawrence Folland
 service_notification_period 24x7
 host_notification_period 24x7
 service_notification_options w,u,c,r
 host_notification_options d,u,r
 service_notification_commands notify-by-email
 host_notification_commands host-notify-by-email
 email lfolland@cs.uwaterloo.ca
}
  5. Test the config file like this: /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg - it should report zero errors. If it reports any, fix them all.
  6. Once that's satisfied, check your changes in and restart nagios: /usr/local/etc/rc.d/nagios restart.

Adding a contact group

  1. First make sure that all of the contacts that you will list have been added individually (see above)
  2. ssh to nagios.cscf
  3. sudo to root.
  4. Set your LOGNAME and USER variables for RCS purposes, then change to the directory where config files are stored (/usr/local/etc/nagios).
  5. Check out and edit the file contacts.cfg.
    • add an entry similar to the following:
define contactgroup{
 contactgroup_name cscf-rsg
 alias CSCF Research Group
 members mpatters,lfolland,magore,trg
}
  6. Test the config file like this: /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg - it should report zero errors. If it reports any, fix them all. In particular, make sure that every member listed has its own contact entry.
  7. Once that's satisfied, check your changes in and restart nagios: /usr/local/etc/rc.d/nagios restart.
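
Before running the full config check, the membership requirement can be tested on its own. The sketch below prints every member of a comma-separated list that has no contact_name line in a given file. The function name missing_contacts is made up, and the sketch assumes each contact is declared on a line of the form "contact_name userid", as in the example above.

```shell
# missing_contacts CONTACTS_FILE MEMBER_LIST   (hypothetical helper)
# Print each member of a comma-separated list that has no
# "contact_name" line in the given file.
missing_contacts() {
    file=$1
    for member in $(printf '%s' "$2" | tr ',' ' '); do
        if ! grep -Eq "^[[:space:]]*contact_name[[:space:]]+${member}[[:space:]]*\$" "$file"; then
            echo "$member"
        fi
    done
}
```

For example, missing_contacts contacts.cfg mpatters,lfolland,magore,trg prints nothing when all four contacts are defined, and otherwise prints the user IDs that still need a contact entry.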

Disabling a host from monitoring

This is useful if you will be shutting down a machine and don't want Nagios sending out lots of email about it being down!

Removing a host from monitoring

  1. in hosts.cfg
    • remove the entry for the machine name
    • remove any references to it in any groups
  2. in services.cfg
    • remove all services being monitored for that machine
  3. Check your Nagios config
  4. Restart Nagios
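
To avoid leaving a stale reference behind, it can help to list every config file that still mentions the host before checking the config. This is a sketch only: the function name host_references is invented, and it matches the host name as a whole word anywhere in a file, so it may also report comments or longer names that contain it.

```shell
# host_references HOST_NAME [CONFIG_DIR]   (hypothetical helper)
# List config files that mention the host, so that none are missed.
# The default directory is the one given on this page.
host_references() {
    grep -rlwF -- "$1" "${2:-/usr/local/etc/nagios}"
}
```

Running host_references zonker after the removal should print nothing; grep then exits with a non-zero status, which here simply means no references remain.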

Checking Apache virtual hosts

The easiest way to do this is to define the virtual host to be the same as the "real" host, but with a different name. For instance:

define host{
 use generic-host
 host_name softbase.math
 alias softbase
 address softbase.math.uwaterloo.ca
 contact_groups cscf-rsg
}

define host{
 use generic-host
 host_name db
 alias db
 address db.uwaterloo.ca
 contact_groups cscf-rsg
}

Then you can monitor the Apache virtual host db.uwaterloo.ca like this:

define service{
 use generic-service
 host_name db
 service_description DBWEB
 is_volatile 0
 contact_groups cscf-rsg
 check_command check_http
}

This likely results in the host being pinged twice, though; there may be a better way to do it.

Checking services for hosts you can't ping

Some sites can't or won't pass ICMP echo requests through their firewalls. One such host is zonker.

define host{
 use generic-host
 host_name zonker
 alias zonker
 address zonker.cs.uwaterloo.ca
 check_command check_none
 contact_groups cscf-rsg
}

Here, the key is check_command. The check_none command is defined in checkcommands.cfg:

define command{
 command_name check_none
 command_line $USER1$/check_dummy 0
}

This will always return "OK", so nagios thinks the machine is always up.
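
The reason "check_dummy 0" reads as "OK" is the plugin exit-code convention: Nagios derives the state from the exit status of the check command. The sketch below spells out that mapping for service checks; the function name plugin_state is made up for illustration, and it lumps every status other than 0, 1, and 2 together as UNKNOWN.

```shell
# plugin_state EXIT_CODE   (hypothetical helper)
# Map a plugin exit status to the service state Nagios assigns it.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 is the conventional UNKNOWN status
    esac
}

plugin_state 0
# prints OK
```

For a host check such as check_none, an exit status of 0 is what makes Nagios treat the host as up.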

Check your config

  1. Test the config file like this: /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg
    • it should report zero errors. If it reports any, fix them all.

Restart Nagios

  1. First check your config (see above)
  2. Check your changes in:
    • ci -u hosts.cfg
    • ci -u services.cfg
    • ci -u contacts.cfg
  3. Restart nagios: /usr/local/etc/rc.d/nagios restart
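
The check-then-restart order can be enforced with a small wrapper, so that a broken config never reaches the running daemon. This is a sketch, not a tested site script: the function name verify_then_restart and the three variables are invented, and it assumes that nagios -v exits with a non-zero status when it finds errors. The RCS check-in in step 2 is left as a manual step because ci prompts for a log message.

```shell
# Paths as given on this page; override them for testing.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/etc/nagios/nagios.cfg}"
NAGIOS_RC="${NAGIOS_RC:-/usr/local/etc/rc.d/nagios}"

# verify_then_restart   (hypothetical helper)
# Run the config check and restart only if it passes.
# Assumption: "nagios -v" exits non-zero when the config has errors.
verify_then_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        "$NAGIOS_RC" restart
    else
        echo "config check failed; not restarting" >&2
        return 1
    fi
}
```

Call verify_then_restart as root after checking your changes in; if the check fails, fix the reported errors and run it again.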

-- MikePatterson - 21 Feb 2005, 09 May 2005 (with help from LawrenceFolland), 21 April 2006

Topic revision: r24 - 2012-04-20 - LawrenceFolland