Application Management

 


Identify data sources

This documentation does not apply to the most recent version of Splunk.

You can index almost any kind of IT or machine data with Splunk. Start with the data that is both easy to get and very useful, like your webserver and application server logs. As your Splunk deployment grows and you move from troubleshooting to a more proactive approach, you'll find more ways to use data in Splunk and more types of data you want to use.

Webserver logs

You can get a lot of information from your webserver logs, and these are some of the easiest formats to pull in. Splunk recognizes the Common and Extended Log formats and easily handles Log4j output. Splunk aggregates the logs from all your webservers, regardless of location, domain, or server platform. Not only can you capture faults and performance problems, but you can also use Splunk to analyze usage and complement your existing web analytics.

Middle-tier and application server logs

By adding middle-tier and application server logs to Splunk, you can track problems across application tiers. When customers report application errors, you can run a single search to see which application servers are causing the error. You can also do cross-tier analysis and correlation. Again, many of these logs are easy to add to Splunk and the value is high.

Operating system data

Operating system data is useful for troubleshooting and even more useful for monitoring. Often you can see server degradation, increased load, or outages before a problem manifests as an application issue.

In a mixed environment, you can index OS metrics from both Windows and *nix operating systems and then search that data using a Splunk server on either platform. The best way to do this is by configuring Splunk forwarding and receiving to gather the remote metrics, and then installing an app for the other operating system on the Splunk server where you will do the searching. See Enable forwarding and receiving in the Admin manual for more information.

OS data for *Nix

The primary way to get operating system data on *nix systems is through scripted inputs. You write a simple script that collects your data, usually by calling a command-line tool such as vmstat, iostat, netstat, or top, and then configure Splunk to index the output that the script generates. You can also use scripted inputs to get data from APIs, other remote data interfaces, and message queues.
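As an illustration of the kind of script described above, here is a minimal sketch of a scripted input in Python. It assumes the Linux (procps) vmstat layout, in which the second line of output holds the column names; the function names and the key=value event format are choices made for this example and are not taken from the *Nix app.

```python
#!/usr/bin/env python
"""Hypothetical scripted input: print one timestamped event per vmstat sample."""
import subprocess
import sys
from datetime import datetime, timezone


def parse_vmstat(output):
    """Pair vmstat column names with the values of the last sample.

    Assumes the Linux procps layout: line 1 is a group banner,
    line 2 holds the column names, and each later line is one sample.
    """
    lines = [line.split() for line in output.strip().splitlines()]
    header, sample = lines[1], lines[-1]
    return dict(zip(header, sample))


def format_event(timestamp, fields):
    """Render one sample as a single line: timestamp, then key=value pairs."""
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{timestamp.isoformat()} {pairs}"


def main():
    try:
        # "vmstat 1 2" prints two samples; the first is an average since
        # boot, so only the second (last) one is reported.
        result = subprocess.run(
            ["vmstat", "1", "2"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError) as error:
        sys.stderr.write(f"vmstat failed: {error}\n")
        return
    fields = parse_vmstat(result.stdout)
    # The event goes to stdout, which is what a scripted input captures.
    print(format_event(datetime.now(timezone.utc), fields))


if __name__ == "__main__":
    main()
```

Each run prints a single line, so the amount of data indexed depends on how often the script is scheduled to run.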

The *Nix app that ships with Splunk includes scripts for gathering OS metrics on a *nix system. Installing and enabling this app on your *nix boxes is one way to gather systemwide metrics. However, data indexed by the app counts against your license, which can make deploying the app on all your servers prohibitive in large deployments. You can reduce the data volume by modifying the polling time for each script and/or enabling and disabling the different scripts separately. Or you can write your own scripts and deploy them yourself.

See Set up custom (scripted) inputs in the Admin manual for more information.

OS data for Windows

Windows operating systems have a number of specific logs and data sources such as WMI, the Windows Event Log, and the registry.

You can use Splunk on Windows to index data from these Windows-specific sources.

Note: Splunk requires privileged access to index many Windows data sources, including WMI, the Event Log, and the registry. This includes both the ability to connect to the box and permission to read the appropriate data once connected.

On Windows platforms, you can also gather additional data through scripted inputs. Text-based scripts in Perl, Python, or any other language of your choice that can run on Windows are enabled through an intermediary Windows batch (.bat) file.

Databases

The following things are important for databases:

External logs

Databases often log their health and performance data in multiple locations and formats.

Database tables

There is little point in reproducing your entire database in Splunk, but it can be valuable to index the information that you want to use for cross-tier troubleshooting or monitoring. This could include logs stored in a database table, configuration information from a CMDB, or specific information about your transactions (for example, the exact time an order was finalized in the database).

Database tables can be challenging to retrieve:

Example of tailing database inputs

You can use scripted inputs to poll a database when an application is essentially using a database table rather than writing out a log file. Here is one way to do this, taken from the topic What is the most effective ways to poll databases on Splunk Answers.

Use a simple state file to store a unique incrementing value from your database, such as the rowid of the last accessed row. Each time your script runs, it must first load the rowid or other unique value and pass it to a WHERE clause that loads only newer (previously unseen) events. As the rows are pulled from the database, write each event to stdout in a textual format that Splunk can index. After all the records have been read, the script must update the state file with the last rowid it read. The next time the script runs, the whole process starts over again.
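The state-file approach can be sketched as follows. This is a minimal example under stated assumptions, not a definitive implementation: it uses SQLite through Python's standard sqlite3 module, and the table app_events with its columns created_at and message, the function names, and the command-line arguments are all hypothetical.

```python
import os
import sqlite3
import sys


def read_state(state_path):
    """Return the last rowid seen, or 0 on the first run."""
    try:
        with open(state_path) as handle:
            return int(handle.read().strip() or 0)
    except FileNotFoundError:
        return 0


def write_state(state_path, rowid):
    """Replace the state file in one step so a crash cannot leave it half-written."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(str(rowid))
    os.replace(tmp_path, state_path)


def poll(db_path, state_path, out=sys.stdout):
    """Write rows newer than the stored rowid to `out`, then save the new rowid."""
    last_rowid = read_state(state_path)
    connection = sqlite3.connect(db_path)
    try:
        # app_events and its columns are hypothetical names for this sketch.
        rows = connection.execute(
            "SELECT rowid, created_at, message FROM app_events "
            "WHERE rowid > ? ORDER BY rowid",
            (last_rowid,),
        )
        for rowid, created_at, message in rows:
            # One line per row, timestamp first.
            out.write(f"{created_at} rowid={rowid} message={message!r}\n")
            last_rowid = rowid
    finally:
        connection.close()
    # Update the state only after every row has been written out.
    write_state(state_path, last_rowid)
    return last_rowid


if __name__ == "__main__":
    if len(sys.argv) == 3:
        poll(sys.argv[1], sys.argv[2])
    else:
        sys.stderr.write("usage: poll_db.py DB_PATH STATE_PATH\n")
```

Because the state file is updated only after all rows are written, a run that fails partway through emits some rows again on the next run rather than skipping them.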

Keeping a counter only works when your table does not contain data that changes or is updated after it is initially written. If the data in your table is dynamic, you have to take a different approach in your script. For example, you could use a database trigger to check for updates and have the trigger write the row's contents to a file, which Splunk can then monitor.

Once you have written one such script, you can modify it for different databases or database tables. If you use Python as the scripting language, you may be able to reuse the same database interfaces as long as the right drivers are installed.

Network devices

You can monitor network switches and other device sources via syslog or SNMP traps. See Monitor network ports in the Admin manual for how to set up Splunk to receive information from these devices.

Traps

You can use SNMP traps to send significant events to Splunk when you are only interested in faults or when you want to pull data out of certain tools. Monitoring devices via traps allows you to pinpoint problems during troubleshooting. However, because traps only send fault data, they are not useful for availability and performance monitoring.

Many devices use traps to send fault notifications. You must configure the device to send traps and specify the trap destination, which is usually an IP address. For information on how to configure a device for SNMP and how to send traps, refer to the configuration guide for the device.

For information on configuring Splunk to index SNMP data, see Send SNMP events to Splunk in the Admin manual.

syslog

Network devices such as routers and switches can be configured to send their logs over syslog to UDP port 514. You can configure Splunk to listen on the UDP port and index the syslog data. See Best practices for configuring syslog input and Create syslog-ng rules to send data to Splunk on the Splunk community wiki for more information.
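When you set up a UDP listener, it can help to send it a test message before pointing real devices at it. The following sketch builds a message in the traditional BSD syslog layout (priority, timestamp, host, tag, text) and sends it as a single UDP datagram. The function names, host name, and tag are hypothetical, and the sketch is a test aid under those assumptions, not part of Splunk.

```python
import socket
from datetime import datetime


def build_syslog_message(facility, severity, timestamp, hostname, tag, text):
    """Build a BSD-style syslog line: <PRI>Mmm dd hh:mm:ss host tag: text."""
    # The priority value encodes both facility and severity.
    priority = facility * 8 + severity
    # Days below 10 are padded with a space, not a zero, in this layout.
    stamp = f"{timestamp:%b} {timestamp.day:2d} {timestamp:%H:%M:%S}"
    return f"<{priority}>{stamp} {hostname} {tag}: {text}"


def send_syslog(message, host, port=514):
    """Send one syslog message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii", "replace"), (host, port))


if __name__ == "__main__":
    # Facility 16 is local0 and severity 6 is informational.
    line = build_syslog_message(
        16, 6, datetime(2010, 10, 5, 14, 30, 0), "router1", "test", "link up"
    )
    print(line)
```

To send the message, call send_syslog(line, host) with the address of the machine that is listening. UDP delivery is not acknowledged, so confirm receipt by searching for the test message.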

This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8

