Nagios Core and Nagios XI integration for Splunk On-Call 🔗
The Splunk On-Call and Nagios integration supports both Nagios Core and Nagios XI. Integrate Nagios and Splunk On-Call to monitor and alert on your entire infrastructure, whether it be cloud, virtual, or physical IT environments.
Nagios periodically checks critical parameters of application, network, and server resources. For example, it can monitor memory usage, disk usage, microprocessor load, log files, and the number of currently running processes. Nagios can also monitor services such as Simple Mail Transfer Protocol (SMTP), Post Office Protocol 3 (POP3), Hypertext Transfer Protocol (HTTP), and other common network protocols. Nagios feeds this data as alerts into the Splunk On-Call timeline, where your teams can respond to issues.
The Splunk On-Call and Nagios integration is configurable with a simple self-generated service API key in the Splunk On-Call integrations settings. Based on the parameters and thresholds defined, Nagios can send alerts if critical levels are reached. These notifications can be sent to the appropriate teams in Splunk On-Call through multiple channels including live call routing, native chat, phone, email, and SMS.
With the Nagios integration for Splunk On-Call you can:
Easily configure Nagios by generating a service API key within Splunk On-Call
Configure Nagios to send alerts straight into Splunk On-Call. If a service goes critical, an alert notifies the appropriate teams
Send heartbeat info to Splunk On-Call to help determine whether your plugin is working correctly even if alerts are not being generated by Nagios. Using the Nagios integration, you can collect info and generate alerts in Splunk On-Call, even if your Nagios server is down
Use Splunk On-Call to run acknowledge-back (command poll) commands on your Nagios server. Commands issued at Splunk On-Call are relayed to your Nagios monitor.
Nagios Core integration guide 🔗
Requirements 🔗
Nagios versions supported: Nagios 4.x and lower
Splunk On-Call versions required: Starter, Growth, or Enterprise
For Nagios environments behind a firewall, see Send Nagios alerts through email
Turn on the integration and generate an API key 🔗
Go to Settings then Alert Behavior then Integrations then Nagios and select Enable Integration to generate your configuration values for Nagios. You use the API key that displays after turning on the integration in a later configuration step.
Configure the Nagios plugin 🔗
Splunk On-Call alert processing is implemented as a Nagios contact that is added to a contact group (often admins, but that depends on your individual configuration). The contact mechanism for the Splunk On-Call contact is a simple shell script that spools the alert details to a file on disk. When an alert fires and Nagios invokes the contact script, the details are written to /var/nagios and control returns to Nagios. A long-running bash script monitors /var/nagios for new files and posts the data in those files to Splunk On-Call over HTTPS. This forwarding script is monitored by Nagios itself, and if it stops for any reason, the Nagios service check attempts to restart it. If the forwarding script is unable to send alerts to Splunk On-Call for a time, it falls back to sending an email version of the alert. You can configure the target address for the fallback alert.
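The spool-and-forward pattern described above can be sketched in a few lines of shell. This is an illustration only, not the plugin's actual scripts: the names SPOOL_DIR, spool_alert, send_alert, and forward_once are hypothetical, the sender is a stand-in for the HTTPS post, and the email fallback is left out.

```shell
#!/bin/sh
# Illustrative sketch only: NOT the plugin's real contact or forwarder script.
# SPOOL_DIR, spool_alert, send_alert, and forward_once are hypothetical names.
SPOOL_DIR="${SPOOL_DIR:-$(mktemp -d)}"

# Contact-script side: write the alert details to a spool file and return at once,
# so Nagios is never blocked on the network.
spool_alert() {
    # $1 = alert body (for example, KEY=VALUE lines built from Nagios macros)
    tmp="$(mktemp "$SPOOL_DIR/alert.XXXXXX")" || return 1
    printf '%s\n' "$1" > "$tmp"
    # Rename once the file is complete so the forwarder never reads a partial alert.
    mv "$tmp" "$tmp.ready"
}

# Stand-in for the HTTPS post. Replace with a real sender; return 0 on success.
send_alert() {
    cat "$1" > /dev/null
}

# Forwarder side: one pass over the spool. Files that were sent are deleted;
# files that failed stay in place and are retried on the next pass.
forward_once() {
    sent=0
    for f in "$SPOOL_DIR"/alert.*.ready; do
        [ -e "$f" ] || continue
        if send_alert "$f"; then
            rm -f "$f"
            sent=$((sent + 1))
        fi
    done
    echo "$sent"
}
```

A long-running forwarder would call forward_once in a loop with a short sleep between passes. Keeping the spool step separate from the send step is what lets alerts survive a temporary network outage.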
If you prefer not to install the plugin, see Send Nagios alerts through email.
The plugin files are installed to /opt/victorops/nagios_plugin. There is a Nagios configuration file called victorops.cfg in /opt/victorops/nagios_plugin/nagios_conf that contains all configurations for the plugin.
Install the Nagios plugin 🔗
Depending on your system you might need to use sudo with these commands.
To install from the DEB package, download it:
wget https://github.com/victorops/monitoring_tool_releases/releases/download/victorops-nagios-1.4.20/victorops-nagios_1.4.20_all.deb
Then install it:
dpkg -i <path_to_file>
If you don’t want to use dpkg, you can also run the following:
sudo apt install <path_to_file>
To install from the RPM package, download it:
wget https://github.com/victorops/monitoring_tool_releases/releases/download/victorops-nagios-1.4.20/victorops-nagios-1.4.20-1.noarch.rpm
Then install it:
rpm -i <path_to_file>
If you install from the DEB or RPM packages, the installer puts the plugin files in /opt/victorops/nagios_plugin and creates the logging and alert directories.
Modify Nagios configuration file 🔗
After installation, you need to move the victorops.cfg file to your Nagios configuration directory, and modify both the nagios.cfg and victorops.cfg files.
Alerts are sent to Splunk On-Call by a shell script that requires the Nagios/Icinga environment macros. To enable this Nagios functionality, open /etc/nagios/nagios.cfg (or icinga.cfg; the actual path may vary) and find the enable_environment_macros directive. Make sure it is set to enable_environment_macros=1. If the directive does not exist, add it to the config file.
Still within the nagios.cfg file, add a line that tells Nagios where to find your Splunk On-Call configuration file, using your own file path. The line looks similar to: cfg_file=/usr/local/nagios/etc/victorops.cfg
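If you prefer to script the two nagios.cfg changes, the following sketch applies them idempotently. The function names set_env_macros and add_cfg_file are made up for this example, and the paths you pass in are your own; adjust them to match your installation.

```shell
#!/bin/sh
# Sketch: apply the two nagios.cfg changes described above without duplicating lines.
# set_env_macros and add_cfg_file are hypothetical helper names.

# Make sure enable_environment_macros is set to 1.
set_env_macros() {
    cfg="$1"
    if grep -q '^enable_environment_macros=' "$cfg"; then
        # Rewrite the existing directive in place (a .bak copy is kept).
        sed -i.bak 's/^enable_environment_macros=.*/enable_environment_macros=1/' "$cfg"
    else
        # Directive is absent, so append it.
        printf 'enable_environment_macros=1\n' >> "$cfg"
    fi
}

# Add a cfg_file= line for victorops.cfg unless that exact line already exists.
add_cfg_file() {
    cfg="$1"
    vo_cfg="$2"
    grep -qxF "cfg_file=$vo_cfg" "$cfg" || printf 'cfg_file=%s\n' "$vo_cfg" >> "$cfg"
}
```

For example, you might call set_env_macros /etc/nagios/nagios.cfg followed by add_cfg_file /etc/nagios/nagios.cfg /usr/local/nagios/etc/victorops.cfg, and then check the result with the Nagios Core configuration verification option (nagios -v followed by the path to nagios.cfg) before restarting.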
Modify your VictorOps configuration file 🔗
This file defines where Nagios alerts are routed (see the Routing incidents section below), among other variables.
Move the file to your Nagios configuration directory using:
mv /opt/victorops/nagios_plugin/nagios_conf/victorops.cfg /usr/local/nagios/etc
Open the victorops.cfg file and configure the following values in both the VictorOps_Contact_Settings (~line 20) contact and VictorOps_Service_Settings (~line 40) service object definitions:
Required configuration settings:

| Setting | Location (approximate) | Description |
| --- | --- | --- |
| _VO_ORGANIZATION_ID | Line 24 in VictorOps_Contact_Settings and line 44 in VictorOps_Service_Settings | The slug for your Splunk On-Call organization. To find your slug, go to your timeline in Splunk On-Call and look at the URL. Your _VO_ORGANIZATION_ID is the string that follows /client/. |
| _VO_ORGANIZATION_KEY | Lines 25 and 26 in VictorOps_Contact_Settings | The API key that was created when you turned on the integration. |
| | Line 51 in VictorOps_Service_Settings | This value is in the VictorOps_Service_Settings service object definition. It is the name of your Nagios host, as defined to Nagios. It turns on the heartbeat and command check services. |

Optional configuration settings:

| Setting | Location (approximate) | Description |
| --- | --- | --- |
| _VO_MONITOR_NAME | Line 24 in VictorOps_Contact_Settings and line 46 in VictorOps_Service_Settings | Identifies the Nagios instance to Splunk On-Call and can be left blank. If you are using multiple Nagios servers in your architecture, distinguish them with unique IDs in this field. |
| _VO_CONTACTEMAIL | Line 32 in VictorOps_Contact_Settings | A backup email address to send alerts to. If the plugin is unable to relay alerts to Splunk On-Call, an alert email is sent to this address. Include an email-SMS gateway in this list. You can configure multiple addresses by separating them with spaces and enclosing the whole list in single quotes, for example: 'me@mydomain.com you@mydomain.com him@mydomain.com 3035551212@vtext.com' |
| _VO_MAX_SEND_DELAY | Line 36 in VictorOps_Contact_Settings | The maximum amount of time (in seconds) that alerts are allowed to remain in the queue before the alert is sent to the contact email. |
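Before restarting Nagios, you might want to confirm that the required values were actually filled in. The following sketch assumes each setting sits on its own line as a name followed by whitespace and a value; missing_vo_settings is a hypothetical helper, not part of the plugin.

```shell
#!/bin/sh
# Sketch: print each required setting that is absent, or present with no value,
# in a victorops.cfg-style file. missing_vo_settings is a hypothetical name.
missing_vo_settings() {
    cfg="$1"
    for key in _VO_ORGANIZATION_ID _VO_ORGANIZATION_KEY; do
        # Missing if the key never appears as a directive,
        # or if any occurrence has nothing after the key name.
        if ! grep -Eq "^[[:space:]]*${key}([[:space:]]|\$)" "$cfg" ||
             grep -Eq "^[[:space:]]*${key}[[:space:]]*\$" "$cfg"; then
            echo "$key"
        fi
    done
}
```

Running missing_vo_settings /usr/local/nagios/etc/victorops.cfg prints nothing when both settings have values in every definition where they appear. Comment lines that start with # are ignored because they do not begin with the setting name.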
Configure additional services 🔗
These 4 services appear on the Nagios server in the Nagios dashboard. If you want to turn on alerts for these services, edit their service definitions in victorops.cfg.
Splunk On-Call alert forwarder 🔗
This is a process check for the long-running forwarding script. If this service goes critical, it creates an email alert, because normal alert forwarding can’t work when this service is down.
Splunk On-Call heartbeat 🔗
The victorops.cfg file defines a service to send heartbeat info to Splunk On-Call. This service is turned on by default. This service helps you to determine whether your plugin is working correctly, even if there are no alerts generated by Nagios.
Splunk On-Call command poll (acknowledge back) 🔗
This service polls Splunk On-Call for commands to run on your Nagios server. This service is turned off by default. The purpose is to allow commands issued at Splunk On-Call to be relayed to your Nagios monitor. At this time, the only commands allowed by this service are host and service acknowledgements. See Ack-Back for Nagios.
Splunk On-Call status resync (manual/auto) 🔗
This service can send a complete Nagios status to Splunk On-Call. Use it if Splunk On-Call gets out of sync with your Nagios system. This might happen, for example, if you had notifications disabled in Nagios for a time. It requires cURL to be installed on the Nagios host. There are 2 options: manual and auto. The manual option can only be invoked manually in the Nagios console. The auto option runs automatically, but is turned off and commented out by default. At this time, this is a preview feature.
Verify the installation 🔗
After installing and configuring the plugin, you can verify functionality by using Nagios to send a custom notification for some service you have defined. The alert should be received by Splunk On-Call and appear in your company timeline.
The contact script and alert forwarder write logs in /var/log/victorops. If the plugin does not seem to be working correctly, check these logs for errors.
Routing incidents 🔗
With the Nagios plugin for Splunk On-Call, the routing key sent to Splunk On-Call is the name of whatever contact group contains the Splunk On-Call contact. If you want Nagios to route various incidents to multiple teams in Splunk On-Call, you need to create a unique contact and a unique contact group (with that contact as the sole member) for each routing key you want to use in Splunk On-Call. You can set up routing keys in Splunk On-Call under Settings then Alert Behavior then Routing Keys.
In the following example, assume there are 3 teams in Splunk On-Call that you want to receive incidents from Nagios. The teams are DevOps, SRE, and Database.
Define a contact for each team, using the VictorOps_Contact settings defined in victorops.cfg.
DevOps contact:
define contact{
    use             VictorOps_Contact
    name            VictorOps_devops
    contact_name    VictorOps_devops
    alias           VictorOps_devops
}
SRE contact:
define contact{
    use             VictorOps_Contact
    name            VictorOps_sre
    contact_name    VictorOps_sre
    alias           VictorOps_sre
}
Database contact:
define contact{
    use             VictorOps_Contact
    name            VictorOps_database
    contact_name    VictorOps_database
    alias           VictorOps_database
}
Define a unique contact group for each of the contacts defined above and add each contact as the sole member of its group. The value used in the alert to Splunk On-Call is derived from the contactgroup_name, so make sure that these names match the values you want to use in Splunk On-Call, or change the routing keys in Splunk On-Call to match the names you define here.
DevOps contact group:
define contactgroup{
    ## contactgroup_name is the routing_key value of the alert to Splunk On-Call
    contactgroup_name   devops
    alias               VictorOps DevOps contact group
    members             VictorOps_devops
}
SRE contact group:
define contactgroup{
    ## contactgroup_name is the routing_key value of the alert to Splunk On-Call
    contactgroup_name   sre
    alias               VictorOps SRE contact group
    members             VictorOps_sre
}
Database contact group:
define contactgroup{
    ## contactgroup_name is the routing_key value of the alert to Splunk On-Call
    contactgroup_name   database
    alias               VictorOps Database contact group
    members             VictorOps_database
}
Add the contact groups to their appropriate check commands so that alerts arrive with the correct routing key, which is the contactgroup_name. You can add the VictorOps contact to as many contact_groups as you like, and you can also add the VictorOps contact to specific services.
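When you have more than a handful of routing keys, writing each contact and contact group by hand gets repetitive. The following sketch generates both definitions for every routing key you pass in, following the naming pattern of the examples above. The function name gen_routing_defs is hypothetical, and the alias text is only a suggestion.

```shell
#!/bin/sh
# Sketch: emit one contact and one contact group per routing key.
# gen_routing_defs is a hypothetical helper; review its output before using it.
gen_routing_defs() {
    for key in "$@"; do
        cat <<EOF
define contact{
    use             VictorOps_Contact
    name            VictorOps_$key
    contact_name    VictorOps_$key
    alias           VictorOps_$key
}
define contactgroup{
    contactgroup_name   $key
    alias               VictorOps $key contact group
    members             VictorOps_$key
}
EOF
    done
}
```

For example, gen_routing_defs devops sre database prints six object definitions. Redirect the output to a file that Nagios loads, and make sure each contactgroup_name matches a routing key that exists in Splunk On-Call.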
Send Nagios alerts through email 🔗
If your Nagios environment is restricted behind a firewall, or if you don’t want to install the plugin on your Nagios hosts, you can still send Nagios alerts to Splunk On-Call through email. Alerts sent through email show on your timeline without the extended functionality provided by the plugin.
To send Nagios alerts to Splunk On-Call through email, create a Nagios contact using the following sample configuration and add that contact to one of the Nagios contact groups that normally receives alerts from your system.
In the sample configuration, the organization ID and organization key allow Splunk On-Call to validate the alerts and route them to your timeline. You can find the values under the Integrations section of the Splunk On-Call web app. The mail command in the configuration formats the alert details into the alert email.
##——————————————————————————————
## These Nagios contact and service definitions are used to pass configurable values to the email command.
##
## Contact settings:
## _VO_ORGANIZATION_ID
## _VO_ORGANIZATION_KEY
## These identify your alerts to VictorOps. The values for these fields are assigned to you by VictorOps.
## _VO_MONITOR_NAME
## VictorOps supports multiple Nagios instances per organization. This configuration value identifies the instance to
## VictorOps. It can be set to something you choose (such as the name of this Nagios host).
##
##——————————————————————————————
define contact{
contact_name VictorOps_Email
## Configure these values as described above
_VO_ORGANIZATION_ID xxxxxxxxxxxxx
_VO_ORGANIZATION_KEY xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
_VO_MONITOR_NAME
alias VictorOps_Email
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,r
service_notification_commands notify-victorops-by-email
host_notification_commands notify-victorops-by-email
register 1
_VO_ALERT_DOMAIN alert.victorops.com
}
define command{
command_name notify-victorops-by-email
command_line /usr/bin/printf "%b" "\nVO_ORGANIZATION_ID=$_CONTACTVO_ORGANIZATION_ID$\nVO_ORGANIZATION_KEY=$_CONTACTVO_ORGANIZATION_KEY$\n_CONTACTVO_ORGANIZATION_KEY=$_CONTACTVO_ORGANIZATION_KEY$\nVO_MONITOR_NAME=$_CONTACTVO_MONITOR_NAME$\n_CONTACTVO_MONITOR_NAME=$_CONTACTVO_MONITOR_NAME$\nTIMET=$TIMET$\nDATE=$DATE$\nTIME=$TIME$\nHOSTNAME=$HOSTNAME$\nHOSTALIAS=$HOSTALIAS$\nHOSTDISPLAYNAME=$HOSTDISPLAYNAME$\nHOSTSTATE=$HOSTSTATE$\nLASTHOSTSTATECHANGE=$LASTHOSTSTATECHANGE$\nHOSTOUTPUT=$HOSTOUTPUT$\nHOSTPERFDATA=$HOSTPERFDATA$\nHOSTGROUPALIAS=$HOSTGROUPALIAS$\nHOSTGROUPNAME=$HOSTGROUPNAME$\nHOSTGROUPMEMBERS=$HOSTGROUPMEMBERS$\nHOSTGROUPNAMES=$HOSTGROUPNAMES$\nSERVICEDESC=$SERVICEDESC$\nSERVICEDISPLAYNAME=$SERVICEDISPLAYNAME$\nSERVICESTATE=$SERVICESTATE$\nLASTSERVICESTATECHANGE=$LASTSERVICESTATECHANGE$\nSERVICEOUTPUT=$SERVICEOUTPUT$\nSERVICECHECKCOMMAND=$SERVICECHECKCOMMAND$\nCONTACTGROUPNAME=$CONTACTGROUPNAME$\nNOTIFICATIONTYPE=$NOTIFICATIONTYPE$\nNOTIFICATIONAUTHOR=$NOTIFICATIONAUTHOR$\nNOTIFICATIONCOMMENT=$NOTIFICATIONCOMMENT$\n" | /usr/bin/mail -s "$_CONTACTVO_ORGANIZATION_ID$:$_CONTACTVO_ORGANIZATION_KEY$:$_CONTACTVO_MONITOR_NAME$" $_CONTACTVO_ORGANIZATION_KEY$@$_CONTACTVO_ALERT_DOMAIN$
}
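In the command above, each custom variable on the contact (for example _VO_ORGANIZATION_ID) is read back through the matching Nagios macro ($_CONTACTVO_ORGANIZATION_ID$). To see what the resulting email subject and recipient look like for your values, you can reproduce the two strings outside Nagios. This is a sketch for checking your settings only; vo_mail_subject and vo_mail_recipient are hypothetical names and the sample values are placeholders.

```shell
#!/bin/sh
# Sketch: rebuild the subject and recipient that the mail command above assembles.
# Both function names are hypothetical; they only format strings and send nothing.

# Subject is <organization ID>:<organization key>:<monitor name>.
vo_mail_subject() {
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Recipient is <organization key>@<alert domain>.
vo_mail_recipient() {
    printf '%s@%s\n' "$1" "$2"
}

# Example with placeholder values:
vo_mail_subject my-org xxxxxxxx-xxxx nagios01
vo_mail_recipient xxxxxxxx-xxxx alert.victorops.com
```

If _VO_MONITOR_NAME is left blank, the subject simply ends with a trailing colon.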
Avoid CentOS 5 timeouts 🔗
On CentOS 5, you need to link the timeout command into a directory that is in the path.
Create the symlink:
ln -s /usr/share/doc/bash-3.2/scripts/timeout /usr/bin/timeout
Make it executable:
chmod 755 /usr/share/doc/bash-3.2/scripts/timeout
Nagios XI integration guide 🔗
Requirements 🔗
Nagios versions supported: Nagios XI 5.x and lower
Splunk On-Call versions required: Starter, Growth, or Enterprise
Install the Nagios plugin 🔗
Depending on your system you might need to use sudo with these commands.
To install from the DEB package, download it:
wget https://github.com/victorops/monitoring_tool_releases/releases/download/victorops-nagios-1.4.20/victorops-nagios_1.4.20_all.deb
Then install it:
dpkg -i <path_to_file>
If you don’t want to use dpkg, you can also run the following:
sudo apt install <path_to_file>
To install from the RPM package, download it:
wget https://github.com/victorops/monitoring_tool_releases/releases/download/victorops-nagios-1.4.20/victorops-nagios-1.4.20-1.noarch.rpm
Then install it:
rpm -i <path_to_file>
If you install from the DEB or RPM packages, the installer puts the plugin files in /opt/victorops/nagios_plugin and creates the logging and alert directories.
Enable environment macros 🔗
Alerts are sent to Splunk On-Call using a shell script that requires the Nagios environment macros. To enable this Nagios functionality, find the enable_environment_macros directive in /etc/nagios/nagios.cfg (actual path might vary) and make sure it is set to 1. If this directive does not exist, add it to the config file: enable_environment_macros=1
Import the configuration 🔗
In the Nagios XI dashboard, select Configure in the top menu.
Select Core Config Manager.
Select Tools then Import Config Files.
Select the config from the file list.
Select Import.
Nagios XI imports Splunk On-Call service check commands as “misc command”. To enable acknowledge back through the Nagios XI UI, you have to change the command type to “check command”. Go to Core Config Manager and bring up the list of commands.
Select the configure icon for the “check_victorops_cmds” command.
In the dialog box, change the command type to “check command” and save.
Send Alerts to Splunk On-Call 🔗
You are now able to enable active checks on the “VictorOps Command Poll” service through the Nagios XI interface.
If alerts don’t come through, try copying the file /opt/victorops/nagios_plugin/nagios_conf/victorops.cfg to /usr/local/nagios/etc/cfgprep/victorops.cfg.
If you receive an error that reads “Duplicate definition found for contact ‘VictorOps_Contact_Settings’”, remove the cfg_file=/usr/local/nagios/etc/victorops.cfg line from nagios.cfg.
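Removing that line by hand is straightforward, but if you manage several Nagios XI hosts you might script it. The following sketch deletes only the exact cfg_file line named in the error guidance and keeps a backup copy. The function name remove_vo_cfg_line is hypothetical, and the nagios.cfg path you pass in is your own.

```shell
#!/bin/sh
# Sketch: delete the victorops.cfg cfg_file line from a nagios.cfg file.
# remove_vo_cfg_line is a hypothetical helper; the original is kept as <file>.bak.
remove_vo_cfg_line() {
    cfg="$1"
    # Match the whole line exactly so other cfg_file entries are left alone.
    sed -i.bak '\|^cfg_file=/usr/local/nagios/etc/victorops\.cfg$|d' "$cfg"
}
```

After running it against your nagios.cfg, compare the file with the .bak copy to confirm that only the one line changed before you restart Nagios.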