---+ Nagios Systems Monitoring and Reporting

CSCF uses Nagios to monitor and report on hosts. Our setup is described on this page, including extensions to Nagios to integrate with our inventory system.

%TOC%

---++ CSCF Master ST items

[[https://cs.uwaterloo.ca/cscf/internal/request/UpdateRequest?89921][ST#89921]] and [[https://cs.uwaterloo.ca/cscf/internal/request/UpdateRequest?95825][ST#95825]].

---++ Project Overview

The system that Nagios runs on is [[https://nagios.cscf.uwaterloo.ca][nagios.cscf.uwaterloo.ca]], running in a virtual container on =asgard.cs.uwaterloo.ca=. The system currently has [[http://www.nagiosql.org][NagiosQL]] running for configuration management. [[http://www.nagvis.org/home][NagVis]] should be added in the future. Nagios will notify the DNS Contact for inventory-synchronized hosts.

---++ General Information

The new Nagios monitoring system integrates with inventory to retrieve host information, support group classifications and default contact information. Also integrated with inventory is Nagios's list of services that can be monitored. These services can be added to hosts through inventory (see below) and are updated into Nagios immediately upon saving. For security and logistical reasons inventory has no write capabilities on the Nagios server; because of this nothing propagates either to or from inventory except during updates.

Basic configuration for hosts can (and should) be done through the inventory system; for more complicated setups please use !NagiosQL to perform the configuration. In general, if a configuration item is not one of those synchronized from Inventory (the default contact and the hostname, for example, are synchronized), it should be safe to modify it through !NagiosQL.

The CSCF Help Desk is configured to receive all notifications regardless of how the machine or service was added or who else may or may not be receiving notifications.
The CSCF Help Desk is required to follow up every notification and ensure that the problem is associated with an ST and that the appropriate person is assigned the ST. Please ensure that it is clear in inventory who this person should be for each machine you add.

All configuration is stored in a database and is written out to Nagios' native configuration files on every update. If you modify a config file with !NagiosQL information at the top (located in =/etc/nagios3/conf.d/=) *your changes will be lost the next time the inventory record is saved*.

*The system described herein is designed for monitoring high-availability systems and machines. Any and all problems reported by it will be followed up. Please do not monitor test and workstation machines with this system.* (Lab machines that need to be monitored should be added so that they are only reported on during the hours when the lab is open.)

---+++ Machine Notes

   * [[CFPrivate.Nagios-202]]
   * [[CFPrivate.Nagios-204]]
   * [[CFPrivate.InstallingNagiosWithSyncScript]] - Documentation on installing Nagios

---+++ Accessing Nagios

The home page for the Nagios Monitoring system can be found at https://nagios.cscf.uwaterloo.ca/

The Nagios Monitoring system provides several interfaces for interacting with it. These are as follows:

---++++ Inventory

Located at: https://cs.uwaterloo.ca/cscf/internal/inventory/web/inventory/

Provides a basic interface which should be sufficient for most common applications. Any record with an IP address and host name has a "Services" tab, which provides the capability to add and remove predefined services on hosts. See InventoryUserDocs for more details. This should be used to configure most machines with relatively standard configurations. If you need to visit the Nagios CGI interface, there is a link within the Services tab; you can also reach it via the Nagios front page (see directly below).

---++++ Nagios CGI Interface

Located at: https://nagios.cscf.uwaterloo.ca/nagios3/

This is the traditional Nagios interface. It should be used for viewing the status of the network, monitoring the status of individual machines or services, scheduling downtime, and acknowledging issues/outages. It does not provide the capability to make changes to the overall configuration. (The 'System' → 'Configuration' section is for viewing the configuration only - see below under !NagiosQL Web Administration for updating the configuration.)

The following are some commonly needed Nagios tasks, and how to do them:

---+++++ Displaying all details for a given host

To display all monitoring details for a given host, you can either start from Inventory in the "Services" tab and choose "Nagios Record", or in Nagios browse the list of hosts by choosing the "Hosts" item from the left-hand menu.

---+++++ Displaying all hosts with a monitored service

To display all hosts with a monitored service, choose the "Host Groups" item from the left-hand menu.

---++++ !NagiosQL Web Administration Panel

Located at: https://nagios.cscf.uwaterloo.ca/nagiosql32/

This provides an advanced web-based configuration system that allows for extremely flexible configuration. It should be used to set up machines that have complicated or unusual monitoring requirements, to clean up the database and remove hosts that no longer exist (even if they were added through inventory), and to configure the services that are available for selection through inventory.

---++++ E-Mail

Nagios will occasionally send out email notifications. These notifications contain links to relevant sections of the Nagios CGI interface. It is not necessary to reply to these emails; any response should be made through the Nagios CGI interface, as well as by fixing the problems outlined in the email.

---++++ !MySQL Database and !PHPMyAdmin Located at: https://nagios.cscf.uwaterloo.ca/phpmyadmin/ This interface should not be used. It is provided for the administration of the system itself, not of the hosts being monitored. The underlying database is not particularly straightforward and is not suited for casual modification. If in doubt DO NOT TOUCH. No one will thank you when you fill their inbox up with notifications about non-existent problems, or when Nagios does not warn them of a major outage. If you really need to modify something, be very careful and MAKE A BACKUP FIRST. Also consider if there are other options. Unless you are modifying/updating/enhancing the script that integrates with inventory there should be absolutely no reason to modify the database directly. Use !NagiosQL instead. ---++++ Force Update Located at: https://nagios.cscf.uwaterloo.ca/force-update/ This barely qualifies as an interface, but it may prove quite useful when doing large amounts of administrative work or when feeling impatient. This interface can be accessed using any of the accounts for the CGI (it uses the same authentication system) and simply allows an individual to request an immediate update to Nagios (It actually runs it while the page is loading). It is also useful in the case where something isn't working and Nagios is not getting updated as expected, as this gives the output of the script. ---++++ Inventory Services Tools Located at: https://nagios.cscf.uwaterloo.ca/inv-services-tools/ This is a very minimalistic interface providing the capability to alter the display name of services that are configured through inventory. Since these services are all internally referred to by their display name it is necessary to update it in multiple places. This script allows you to accomplish that easily and without breaking things. 
Changes are immediate in inventory and !NagiosQL and propagate to Nagios at the next sync. This interface also displays the contents of the note field of the host group from !NagiosQL.

---++++ Nagios Restart History

This provides the capability to view the restart logs, which are kept for 288 restarts (it used to be 3 days... which was 288 restarts...).

<!-- This provides the capability to view all of the logs that the script has produced, with the ability to filter based on the level. Please note that if the script is changed to run with maximum logging (-v=11 --flags=0) the log file will almost certainly be too large for PHP to load, and it will fail. -->

Located at: https://nagios.cscf.uwaterloo.ca/naghistory/

There are also log files stored on the server for every API call, in =/home/nagios-script/logs/=

---++ Usage Instructions

---+++ Adding a new host via Inventory

Monitoring for hosts can now be set up through the inventory system. Visit the inventory page for the item you wish to add services to. There you will find a section labelled "Services" immediately under the pre-expanded "General" section. Expanding this section reveals an interface that allows for the addition and removal of services. Use the provided buttons and menu to configure the services that are necessary for that particular host. Note that service changes are saved into inventory's database immediately, though they will not be propagated to Nagios until the next update. Also of note is the set of check boxes under the "Monitoring" column. These check boxes allow individual services to have their monitoring disabled without removing them from the host; the services will, however, be removed from Nagios at the next update.
For the purposes of setting the primary contact of the host it is only necessary to put the user id of the correct person/account into the 'dns_contact' field under the "DNS" section (currently immediately below the "Services" section). Hosts are automatically grouped according to the "Groups" field in the "Support" section, so please ensure that this is set correctly. Each of the services available in inventory are set up to be fairly generic as there is currently no room to configure them per machine. The full list of services as well as documentation on each service is available at the [[https://nagios.cscf.uwaterloo.ca/inv-services-tools/][Inventory Services Tools]] page. <!-- Service documentation moved to https://nagios.cscf.uwaterloo.ca/inv-services-tools/ The services that can be monitored are: <dl> <dt> Alive (ping) <dd> Doesn't actually register a service with Nagios, just the host. This means that if any other service has been added to the machine, this service is entirely redundant. This can also be used to sync the basic machine information with !NagiosQL so that more complex services may be added (through !NagiosQL). <dt> APC Environmental Unit <dd> Adds several services which check the status of a APC environmental unit using SNMP <dt> APC UPS Monitoring <dd> Monitors the health of an APC UPS using SNMP <dd> apc_ups_runtime_remain: Will warn after runtime remaining drops below 15 minutes, will critical if below 10 (See [[https://cs.uwaterloo.ca/cscf/infrastructure/capacity-planning/power/][Expectant Minimum UPS Runtimes]]), measured in timeticks (100 timeticks = 1 second) <dd> apc_ups_battery_check: Will return critical if a UPS fails it's battery self-test and need replacing. <dt> Cups Print Server <dd> Monitors a CUPS server, its queues, and print jobs. Uses the check_cups plugin for Nagios <dt> Directory Services <dd> Monitors LDAP using an anonymous connection. 
<dt> DNS Service
<dd> Attempts to look up =cs.uwaterloo.ca= using the machine as the DNS server. Also performs an SRV lookup of =_ldap._tcp.cs.uwaterloo.ca= using the machine.
<dt> Eaton Powerware UPS Monitoring
<dd> Monitors the health of an Eaton Powerware 91xx UPS.
<dd> eaton_ups_battery_status: Monitors the current status of the internal batteries (will critical when failed (3))
<dt> File Transfer Protocol
<dd> Monitors the availability of an FTP service on the default port of the machine
<dt> GEIST PDU Monitoring
<dd> Monitors the health of a GEIST power distribution unit
<dt> Internet Message Access Protocol (IMAP)
<dd> Monitors an IMAP server using the check_imap plugin
<dt> LDAP - CSCF
<dd> Uses the 'check_ldap' plugin to perform an anonymous query to the server in the CSCF domain
<dt> LDAP - CS-GENERAL
<dd> Uses the 'check_ldap' plugin to perform an anonymous query to the server in the CS-GENERAL domain
<dt> LDAP - CS-TEACHING
<dd> Uses the 'check_ldap' plugin to perform an anonymous query to the server in the CS-TEACHING domain
<dt> Liebert HVAC Monitoring
<dd> Monitors the temperature and humidity as measured by the unit
<dt> !MySQL
<dd> Monitors the !MySQL daemon through the Nagios check_mysql plugin. This connects to mysql on its default port (3306) with a standardized mysql user and password, then runs a status query. The user, password and (for future use) database are stored on =nagios.cscf= in =/etc/mysql/.cnf=. This file must have only read permission for user 'nagios' and no permission for others. Before enabling the !MySQL service check, ensure that the above mysql user and database have been created. See ST#92114. Updated 2014-11-17 See ST#94249.
<dt> Remote Desktop Protocol
<dd> Verifies that there is a service running on TCP port 3389, as there is no simple method to actually verify that it is RDP that is running.
<dt> Room Environment (Old Websensor) <dd> Monitors the temperature, humidity and illumination of the room, as measured by the unit <dt> Samba <dd> Check that a service is running on port 139 (tcp) <dt> Secure Shell <dd> Opens a connection with the !SSH daemon on port 22. It does not actually log in though. <dt> Simple Mail Transfer Protocol <dd> Monitors the health of an SMTP service, uses the check_smtp plugin <dt> System Health (collectd reqd) <dd> Monitors the overall health of the machine through the use of collectd, as such collectd must be installed (see below), currently monitored are system load averages (1, 5 and 15 minute averages), with warnings at 8, 7 and 6 and critical states at 9, 8 and 7.5; current users with warning and critical states at 10 and 15 users, respectively; and free disk space on the root (/) file-system, warning and critical states are respectively at 70 and 90 percent ( on a side note examining the command line command will reveal the numbers are 233% and 900%, however this is used disk space as a percentage of free disk space). <dt> Temperature Monitoring (lmSensors) <dd> Monitors TEMP_1 sensor information, warning at 37C, critical at 45C. Requires lm-sensors to be installed. <dt> Virtual Network Computing <dd> Uses check_tcp to ensure that a service is running on port 5901 <dt> Web Server <dd> Requests the main page from the server and verifies that it get a 200 OK response code <dt> Web Server (SSL) <dd> Requests the main page from the server using HTTPS/SSL over port 443 and verifies that it gets a 200 OK response code </dl> --> ---+++ Removing a host via Inventory To remove a host through inventory it is only necessary to remove or uncheck the monitoring check box for all of its services from the services list. At the next update the host will be disabled in !NagiosQL and removed from the Nagios configuration. 
In !NagiosQL the host is only disabled, not deleted. If, however, the machine is reactivated through !NagiosQL and does not have any services from inventory on it, the sync system will deactivate it again. Note that the host will no longer be kept up to date by the inventory sync system (although updating all hosts from inventory is a planned feature).

Removing a host from Nagios via inventory does not immediately delete the host's history. The history is stored in the event log, so it is deleted as the logs are rotated (the same schedule applies to hosts that still exist). If the host is re-added to Nagios it will be logically re-attached to any old history remaining, as long as it has not been renamed. (This is really only an appearance, as Nagios simply searches the log files for occurrences of the machine's name.) Similarly, services can be removed and re-added without alteration to their event history, as it is stored in the same set of rotated log files.

---+++ Using !NagiosQL

The suggested method for adding hosts to !NagiosQL is to use inventory and add any appropriate services, then use !NagiosQL to add any remaining services. If none of the services in inventory apply to the machine in question, adding the service 'Alive (ping)' will cause the host to be added to the system without any services, at which point services can be configured on the machine (through !NagiosQL) and the machine will be kept up to date with inventory through the sync system.

!NagiosQL is relatively straightforward and there is a fair amount of explanation for the various options within the software itself. If you are looking for information on a specific option, it is suggested that you view the option documentation within the !NagiosQL web interface.

!NagiosQL is laid out in a somewhat logical manner, although the menu structure can be a little confusing at first.
The Supervision section contains items relating to hosts and services; the Alerting section deals with contacts and notifications; the Commands section contains just one entry: Definitions, this is where new monitoring and notification commands are added and old ones modified. The Specialties section is for more advanced configuration of hosts and services (it exposes the ability of having certain hosts or services depend on others, allowing for complex relationships), the tools section deals with daemon and CGI config, as well as getting information from !NagiosQL to Nagios. Finally the Administration section exists for the purpose of maintaining !NagiosQL itself, and has no direct effect on Nagios. Host management in !NagiosQL is done through two configuration panels: Supervision → Hosts and Supervision → Host templates. These correspond directly with their equivalents in Nagios. Hosts can be added or removed as one would expect, services can be attached to them and contacts can be selected. There is also an "Active" check box, when this is checked the host will be written to the Nagios config files, when it is unchecked it will be removed from them; this behavior also applies to most other configuration objects, on a slightly different note, when Inventory "removes" a host, that is it no longer has services listed for it, then the sync script will set the host to inactive rather than delete it. Services are similarly managed through Supervision → Services and Supervision → Service templates. These once again are directly related to the services and servicetemplates in Nagios, and behave as such. It is important to note that the records (of any type) beginning with 'inv-' are specific to the inventory sync system, this pattern is used as an internal identifier. For contacts this means that they were added by the sync system, but for host-groups this means that they propagate to inventory. 
The services listed in inventory are actually host-groups in Nagios: they are the host-groups beginning with 'inv-', and the name seen in inventory is the "description" field. Hosts that have these services assigned to them through the inventory system are added to these host-groups. To allow Nagios to actually monitor what these "services" represent, there is a set of Nagios services that have been assigned to the correct host-groups. This is not necessarily a one-to-one relationship; for instance, the System Health service in inventory is the inv-health host-group, which (as of August 23, 2013) contains 3 services. Similarly, a service could be added to more than one host-group if it were appropriate.

On a related note, the sync system matches the Inventory services with the Nagios host-groups by a textual comparison. Thus, if you change the description of an 'inv-' host-group, inventory will be updated to offer the new option and the old one will be removed; however, existing hosts will not be migrated and will no longer have that service monitored. This is because the sync script has no way of knowing what the service used to be called, or whether it is actually a new service and the missing one was deleted. See '[[#Rename_Inventory_Service][Rename Inventory Service]]' above.

---++ User Management

---+++ Nagios CGI Interface

Nagios now uses CAS for user authentication. In order to administer users for the Nagios CGI Interface it is necessary to have root privileges at a shell on =nagios.cscf.uwaterloo.ca=

---++++ Adding a User

There are two parts to adding a user in the Nagios CGI: the first is to add the user, and the second is to give the user privileges. Both require shell access to =nagios-202.cscf.uwaterloo.ca=

Once logged in, add the user by editing the file =/etc/nagios3/htpasswd.groups= to add the username to the list.
The second part is achieved by editing the file =/etc/nagios3/cgi.cfg=. The following lines contain the relevant configuration values. Line numbers are correct as of April 24<sup>th</sup>, 2014.

<verbatim>
132: authorized_for_system_information=nagiosadmin,cscf-op,cscf-adm
144: authorized_for_configuration_information=nagiosadmin,cscf-op,cscf-adm
157: authorized_for_system_commands=nagiosadmin,cscf-op,cscf-adm
170: authorized_for_all_services=nagiosadmin,cscf-op,cscf-adm
171: authorized_for_all_hosts=nagiosadmin,cscf-op,cscf-adm
184: authorized_for_all_service_commands=nagiosadmin,cscf-op,cscf-adm
185: authorized_for_all_host_commands=nagiosadmin,cscf-op,cscf-adm
</verbatim>

---++++ Changing a User's Password

The user's password can be changed through the normal methods for changing the [[http://watiam.uwaterloo.ca/search/][WatIAM]] password.

---++++ Removing a User

Removing a user is achieved by removing the user from the list in =/etc/nagios3/htpasswd.groups=

The user should also have all privileges revoked. This can be achieved through the following two commands (GNU sed; replace =<username>= with the account name):

<verbatim>
sed -i -E '/^authorized_for_/ s/([=,])<username>(,|$)/\1\2/' /etc/nagios3/cgi.cfg
sed -i -E '/^authorized_for_/ { s/,,+/,/g; s/=,/=/; s/,$// }' /etc/nagios3/cgi.cfg
</verbatim>

The first command removes the username from the =authorized_for_*= lists, but only where it is a complete list entry (so removing a user whose name is part of another name, such as =adm= in =cscf-adm=, does not corrupt the other entry). The second removes any doubled, leading or trailing commas that might be left behind.

---+++ !NagiosQL Web Administration Interface

NOTE: !NagiosQL uses CAS for user authentication. The list of authorized users is managed through the !NagiosQL web interface. Any CAS-authenticated user who is not authorized by !NagiosQL will be presented with the !NagiosQL login page, where cscf-op and cscf-adm credentials will work.

In !NagiosQL all user administration is done through the user administration panel. To access this panel, log into !NagiosQL as an administrator (cscf-op or cscf-adm should work just fine) and, using the menu on the left, navigate to Administration → User admin.

---++++ Adding a User To add a user click add, and fill out the fields in red, the description should either be the real name of the person (for a personal account) or the role that someone using that account would be filling (for a non-personal account). Users are automatically administrators, (well, actually there are just no access restrictions set up, if those were to be set up, then users would have to be added to specific groups to do specific things, that however is a fair amount of work to set up and maintain) if group administration privileges are desired (this will enable the user to add and remove people from groups) then put a check mark in the box labeled "Enable group administration". In order to enable CAS authentication for this user you must check the box "Web server authentication" (if you do not check this box you will have to login with !NagiosQL credentials). Save the user for changes to be applied. Regardless of the setting of "Web server authentication", the user needs to be assigned a password, if the user will be authenticated with CAS ("web server authentication" is checked) then there is no need to set a password that can be remembered, it can be entirely random. Also because of the mechanisms used by !NagiosQL to authenticate users, if someone were to get ahold of a username and password (even if it is supposed to be authenticated through CAS) they can log in with it. Please be sure to use a strong password. To generate a password on a Linux machine =dd bs=4K count=1 if=/dev/urandom | sha256sum= can be used. It may also work on Windows (via CygWin) and on Mac OS X. There should be no practical way to reproduce a password generated in this fashion. ---++++ Changing a User's Password ---+++++ !NagiosQL Authenticated Users For users who are authenticated through !NagiosQL (not CAS) use the procedure below. The cscf-op and cscf-adm accounts are of this type. 
A user can change their own password by visiting Administration → New Password in the web interface, filling out the details and clicking Save. Another user's password can be reset by visiting Administration → User admin, clicking the wrench and screwdriver icon in the rightmost column of the user's row in the table, typing a new password into the appropriate fields and clicking Save. (All other attributes can be changed in the same manner; simply edit the appropriate fields.)

---+++++ For CAS Authenticated Users

These users' passwords can be changed through the standard mechanisms, such as [[https://watiam.uwaterloo.ca/][WatIAM]].

---++++ Removing A User

Visit Administration → User admin and click the garbage can icon on the far right in the row corresponding to the user. This works for both users authenticated by CAS and those authenticated locally.

---+++ !PHPMyAdmin and !MySQL

---++++ Adding a User

Navigate to Home → Privileges, click "Add a new user" and fill in a user name and password. Scroll down to Global privileges and click check all, then select "Create user". Click the "Edit privileges" link in the "Action" (last) column for the user that was just created, scroll down to "Change Login Information / Copy User" and change the host to localhost, then click the "Go" button immediately below that section.

---++++ Changing a User's Password

This must be performed for each entry that the user has (usually two). First click "Edit privileges" in the far right column, scroll down to the third section, "Change Password", and enter a new password, then click "Go" (the one immediately below that section).

---++++ Removing a User

Navigate to Home → Privileges, select the two rows pertaining to the user, scroll down to the section labeled "Remove selected users" and click "Go". The user has now been completely removed from the database server.

---++ What to do if Nagios is broken?

The question remains: what happens when Nagios breaks and Dennis is not here to fix it? Not to worry, below you'll find some documentation which describes how you can try to troubleshoot Nagios problems and find the root cause of a downtime or broken component.

---+++ Overview

It's important to understand the basic flow of Nagios data and how the various systems that make up Nagios interact. The diagram intended for this section is not included here; the idea it portrays is summarized below.

Nagios3 is the base of the whole Nagios system and is the Nagios monitoring system itself. !NagiosQL feeds Nagios3 data by creating the configuration files that Nagios3 uses. !NagiosQL is a web interface that allows easy modification of the Nagios3 configuration files, so that the user does not have to edit the Nagios3 config files by hand. !NagiosQL also provides other features, such as disabling the status of hosts, but for the purpose of this overview its role is to provide Nagios3 with the configuration files it requires to enable monitoring of devices.

The Nagios API sends data to !NagiosQL and modifies the !NagiosQL database based on the API requests it receives. The Nagios API is simply a mechanism to interact with !NagiosQL programmatically. Inventory and the Nagios API work in both directions: Inventory sends data to Nagios through the Nagios API, and the Nagios API in turn gets JSON data from the Inventory system with details on a machine, so that Nagios has the correct data filled in.

---+++ First steps

You may be asking: what is the first thing that you should do when something breaks with Nagios? Your best bet is to take a look at the Nagios3 service.

---++++ Checking the Nagios3 Service

When connected to the nagios system, you should check on the Nagios3 service. Nagios3 restarts all the time, so there is no harm in simply attempting a restart via =sudo service nagios3 restart= (replace =restart= with =start= if the service is not running).

If the Nagios system has problems starting during this process, it will indicate these problems clearly when you restart the service.

---++++ Looking into the Nagios3 log files

You can quickly take a look in the Nagios3 log files to see if there may be anything the restart did not show. The logs are located in =/var/log/nagios3/=. The two files to look at are _nagios.log_ and _livestatus.log_.

---++ Additional Notes

---+++ Linking To The Nagios !CGIs

Linking to the Nagios !CGIs is fairly straightforward, although the !CGIs make use of frames, which complicates things a little. The format to link to a specific !CGI page is as follows:

<verbatim><URL of Nagios Installation>?corewindow=<encoded relative url of desired page></verbatim>

   * The first URL is the one that is normally used to access Nagios, for example =https://nagios-202.cscf.uwaterloo.ca/nagios3/=
   * The second URL can be obtained by visiting the desired page, right-clicking on the main part and choosing "Show only this frame". The URL is now in the address bar; the domain part is not needed. For example, the host overview page is: =/cgi-bin/nagios3/status.cgi?hostgroup=all&style=hostdetail=
   * The next step is to encode this URL. This can be done with any of the URL encoders on the internet; I've had luck with http://meyerweb.com/eric/tools/dencoder/
   * In this example it becomes: =%2Fcgi-bin%2Fnagios3%2Fstatus.cgi%3Fhostgroup%3Dall%26style%3Dhostdetail=
   * Putting it all together gives: =https://nagios-202.cscf.uwaterloo.ca/nagios3/?corewindow=%2Fcgi-bin%2Fnagios3%2Fstatus.cgi%3Fhostgroup%3Dall%26style%3Dhostdetail=

---+++ Managing Notices on the Homepage

The Nagios Monitoring system's homepage (https://nagios.cscf.uwaterloo.ca/)
reads notices from a file and displays them in a notices section at the top of the page. The format for the file is as follows:

<verbatim>![Heading] Body of notice</verbatim>

If the exclamation point is present then the notice will not appear on the page. If it is not present, the part in square brackets becomes the title of the notice and the rest becomes the body. Note that after the first set of square brackets any characters are legal and will simply be sent to the browser; the only exception is a newline character, as this is used as the delimiter between notices. The file is located on nagios.cscf.uwaterloo.ca at =/var/www/private/notices=. It should also be noted that the heading used for the title is a heading 3 (=&lt;h3&gt;=).

---+++ Restarting the Nagios Daemon

If the Nagios daemon is not working for some reason (or has failed to start because of configuration issues which have since been resolved) it can be restarted with =sudo service nagios3 restart= or started with =sudo service nagios3 start=

---+++ API Program Technical Overview

The script is kept in a git repository which is checked out into =/home/nagios-script/nagios-scripts/= on nagios.cscf. Within this directory are the directories =bin= and =etc=, which contain the program and its configuration respectively. Inside another directory, =var-www=, is a file, =api.php=, which is the entry point for the API.

Inside the =etc= directory there are a number of configuration files. As much of the code is shared between several scripts and web frontends, their configuration is also stored here. The main script reads =api.conf=, which references three files: =passwords.conf=, =main.conf=, and =api-statements.conf=. =passwords.conf= contains the database information and other passwords; it should be readable only by owner and group, which should be nagios-script and nagios-data respectively. The file =main.conf= contains the general configuration shared by all of the scripts, and it should contain sane defaults.
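
The ownership and permissions described above for =passwords.conf= can be checked from a shell. The following is only a minimal sketch: the function name =check_private_conf= is hypothetical (it is not part of the script repository), and it assumes GNU =stat= (the =-c= option) is available on the server.

```shell
#!/bin/sh
# Hypothetical helper, not part of the nagios-scripts repository.
# Succeeds only if FILE is owned by OWNER:GROUP and grants no access to "other".
# Assumes GNU stat (the -c option).
check_private_conf() {
    file=$1
    owner=$2
    group=$3
    actual=$(stat -c '%U:%G' "$file") || return 2
    mode=$(stat -c '%a' "$file") || return 2
    if [ "$actual" != "$owner:$group" ]; then
        echo "$file: owned by $actual, expected $owner:$group"
        return 1
    fi
    # The last octal digit holds the permissions for "other"; it must be 0.
    case $mode in
        *0) echo "$file: ok (mode $mode)" ;;
        *)  echo "$file: mode $mode lets other users access it"
            return 1 ;;
    esac
}
```

On the server it could be called as =check_private_conf /home/nagios-script/nagios-scripts/etc/passwords.conf nagios-script nagios-data=; a non-zero exit status means the ownership or mode does not match the description above.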
The SQL statements are all stored in api-statements.conf. There are no default statements: if the program tries to use a statement that the configuration does not provide, it will encounter an error. There should never be a need to edit the statements. If you wish to change an option for only this script, you can add it to the end of the api.conf file, after the included files; this causes the value to be overridden in the program when the config is loaded.

---++ Development Notes

The [[http://www.nagios.org][Nagios]] system is set up on a [[http://developer-blog.cloudbees.com/2013/02/blue-green-deployments-continuous.html][Blue/Green deployment]] system. The current production server is [[https://nagios-204.cscf.uwaterloo.ca][nagios-204.cscf.uwaterloo.ca]], and the development system (previously production) is [[https://nagios-202.cscf.uwaterloo.ca][nagios-202.cscf.uwaterloo.ca]]. The general service address of this system is [[https://nagios.cscf.uwaterloo.ca][nagios.cscf.uwaterloo.ca]]. Both machines are currently LXC virtual containers hosted on [[https://cs.uwaterloo.ca/cscf/internal/inventory/web/inventory/index.php?r=inventory/update&id=14519][asgard.cs.uwaterloo.ca]]. Configuration management is handled by !NagiosQL. The system is integrated and synchronized with [[https://cs.uwaterloo.ca/cscf/internal/inventory/web/inventory/][inventory]] upon every inventory update of records that have services monitored.

One developer's suggested workflow is detailed [[https://cs.uwaterloo.ca/cscf/internal/request_debug/UpdateRequest?100903][in this work item]].

---+++ Architecture of Collection of Scripts

The system that provides the API service for synchronizing inventory with the Nagios monitoring system and providing monitoring data to inventory is written in PHP using object-oriented programming techniques and PHP Data Objects (PDO) for database integration.
The main high-level logic is contained within the API and Command classes and the Command subclasses (Get, Fetch, Add, !UpdateHost, !UpdateService, and Delete), which are stored in reasonably named files. These use the Nagios and !LiveStatus classes, which are stored within the nagios.php and !LiveStatus.php files, to read from and write to the database and !MK !LiveStatus. These two classes contain mostly higher-level operations, while low-level code (such as initialization of PDO objects and statement management) is contained within the Database class (located in database.php). Once a controller class has the data from the database, it either sends it to Inventory (or whatever called it), or uses it to update its own data in concert with external data provided in the API call and data collected from Inventory via the =json= action. Data is sent back to the database using the Nagios class. The script makes use of an external configuration file which determines the exact behaviour of the script and defines the SQL statements and parameters used to accomplish the synchronization.

---+++ Service Status (Inventory)

   1 A service is manually added to a machine in Inventory
      * Nagios immediately records the service as being OK
   1 Nagios begins monitoring the service
      * At this point, when Nagios finds an error with the service, it stores the error and sends emails.
   1 On the next display/refresh of the host page in Inventory, the updated status will be shown.

---+++ !MK Livestatus

[[https://mathias-kettner.de/checkmk_livestatus.html][MK Livestatus]] is a module for Nagios which provides access to many of the internal structures within Nagios. It uses a language known as [[https://mathias-kettner.de/checkmk_livestatus.html#H1:LQL%20-%20The%20Livestatus%20Query%20Language][Livestatus Query Language]] (LQL), which is somewhat modelled after SQL, to provide access to the Nagios state data.
This module enables access to the entire configuration data of Nagios, including host, contact, service, user, and hostgroup data, among others. The module is accessed through a unix socket, which is located at =/var/lib/nagios3/rw/live=. A PHP wrapper script is also available at =res/live.php= in the git repository for this project. There are also !MKLSDriver and !MKLSStatement, which together provide a !PDO-like interface to !MKLiveStatus (I would have written a !PDO driver but it would have taken too long).

---+++ Environment Variables

When Nagios runs a check command it does not set any environment variables. This means that commands do not have access to many things which are often taken for granted (such as the home directory). Nagios also seems intent on making it very difficult to set them in the command entry. If environment variables are needed, it is likely necessary to write a wrapper script which prepares the environment. (Source: http://readlist.com/lists/lists.sourceforge.net/nagios-users/2/14605.html).

---++++ Affected Commands

Commands which are known to be affected include:
   * check_mysql (and similar)
      * Looks for =.my.cnf= in home directory
      * Can use =/etc/mysql/.cnf= instead

---+++ Nagios

In order to store the documentation of the services in the !NagiosQL database (needed for the new web page), the notes column of the tbl_hostgroups table needed its type changed from VARCHAR(255) to LONGTEXT, which can contain up to 4 GiB of text data.

---++ Nagios API

An API is being implemented for Nagios. Documentation is available at NagiosAPI.
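
The API depends on MK Livestatus for state data, so it can help to see what a raw query against the socket looks like. The following is a minimal Python sketch, not the project's code (the project itself uses PHP, via =res/live.php= and the !MKLSDriver classes). The function names =build_query= and =run_query= are hypothetical; the socket path is the one given in the MK Livestatus section of this page. It builds an LQL query that requests JSON output and sends it over the unix socket.

```python
import json
import socket

# Socket path as documented in the MK Livestatus section of this page.
LIVESTATUS_SOCKET = "/var/lib/nagios3/rw/live"


def build_query(table, columns, filters=()):
    """Build an LQL query string that asks for JSON output.

    table   -- Livestatus table name, e.g. "hosts" or "services"
    columns -- list of column names to return
    filters -- optional list of filter expressions, e.g. "state != 0"
    """
    lines = ["GET " + table]
    if columns:
        lines.append("Columns: " + " ".join(columns))
    for expression in filters:
        lines.append("Filter: " + expression)
    lines.append("OutputFormat: json")
    # A blank line marks the end of the query.
    return "\n".join(lines) + "\n\n"


def run_query(query, path=LIVESTATUS_SOCKET):
    """Send one query to the Livestatus socket and return the decoded rows."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(query.encode("utf-8"))
        # Closing the write side tells Livestatus no more input is coming.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

On the Nagios host, and assuming the calling user is allowed to read the socket, =run_query(build_query("hosts", ["name", "state"], ["state != 0"]))= would return one row per host that is not UP (state 0 means UP).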
---+++ Features

   * Automatic updating of Nagios from !NagiosQL
   * Interface with Inventory for host definitions and provided services
   * CAS Authentication

---+++ TODO

---++++ Nagios

   * Integrate with CAS - Done
   * SSL to allow (require) https - Done
   * Get it backed up (talk to Guoxiang - gxshen) - Done
   * Show statuses of monitored services in inventory - Done
   * Only restart Nagios if configuration has actually changed. - Done
   * Set up !NagVis
   * Two-way sync with inventory
      * Sync of inv. hostgroups as services, support for modifications in !NagiosQL
      * Allow hosts linked to inventory to be updated automatically from inventory without destroying modifications in !NagiosQL
      * Support for recognizing hosts that have been added to !NagiosQL and matching with inventory, allowing full sync.
         * Not likely to happen, add 'Alive (ping)' to the host in inventory, then configure in !NagiosQL
   * A basic web front-end that allows for renaming of Inventory "services" and properly updates all records in Inventory. - Done
   * Batch management of services in inventory (all systems with service X listed switch to/add/remove service Y) ??? <--- Would be nice.

---++++ Inventory

   * Intelligent hiding of services section when not applicable (i.e. not a computer)

-- Main.DennisBellinger - 2015-08-28
Topic attachments

| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| Inv-Modifications-mockup.ods | r1 | 23.8 K | 2013-08-06 - 16:47 | DennisBellinger | An interface mockup of the addition to inventory |
Topic revision: r62 - 2016-04-29 - JustinVisser
Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.