Learn about the data in your Splunk deployment

After you have discovered and diagrammed the topology of your Splunk deployment, the next task is to learn about the data in the deployment.

Understanding the data in a Splunk deployment has two parts. The first part is how the deployment manages stored data. The second part is how the deployment ingests data.

Learn about stored data

Before you assumed control of the deployment, it was configured to ingest data from certain data sources. The person or group who owned the data determined the following:

The amount of available data
The relevance of the data to the organization
The length of time that the organization wanted to keep the data that Splunk software ingested
The data retention policy of the organization
The people who required access to the data
The need for any sensitive data to be anonymized

They then worked with other groups to set up Splunk software to get the data indexed and stored.

You can learn about the data that has been indexed by Splunk software with the following methods:

Review the data summary
Run searches on the data

Review the data summary

With the Data Summary in Splunk Web, you can determine data sources, source types, and the hosts that generated the data. This is the most comprehensive way of learning what data is present in a Splunk deployment.

Log in to the Splunk instance. If the deployment is distributed, log in to a search head.
Click Search and Reporting.
Click Data Summary.
Click the Hosts, Sources, or Sourcetypes tab to see what the instance has indexed.
(Optional) Click an entry in the Data Summary list to run a search that returns events for that entry.
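
If you prefer to script this check, the following Python sketch builds, but does not send, a request that runs a | metadata search through the search/jobs/export REST endpoint. It approximates the Hosts, Sources, and Sourcetypes tabs rather than reproducing the Data Summary exactly. The function name build_metadata_request, the example host name, and the use of an authentication token sent with the Bearer scheme are assumptions for illustration. Adjust them to match how your deployment authenticates REST clients.

```python
import urllib.parse
import urllib.request

METADATA_TYPES = ("hosts", "sources", "sourcetypes")  # the three Data Summary tabs


def build_metadata_request(base_url, token, kind="sourcetypes"):
    """Build, but do not send, a REST request that runs a | metadata search."""
    if kind not in METADATA_TYPES:
        raise ValueError(f"unsupported metadata type: {kind}")
    # A search string that starts with "|" runs a generating command directly.
    spl = f"| metadata type={kind} index=*"
    body = urllib.parse.urlencode({"search": spl, "output_mode": "json"}).encode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/services/search/jobs/export",
        data=body,
        headers={"Authorization": f"Bearer {token}"},  # assumes token authentication
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen(). The management port (8089 by default) must be reachable from where the script runs. If the instance uses a self-signed certificate, you might need to supply a trusted CA bundle through an ssl.SSLContext.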

For more information about the Search app, see About the Search app in the Search Manual.

Run searches on the data

With Splunk search, you can create a timeline that shows when the indexed events occurred by running search commands and adjusting timeline parameters. The searches you run depend on the kind of data you are looking for. You can use the Data Summary to learn what has been indexed into the instance and what you can search for.

Log in to the Splunk instance. If the deployment is distributed, log in to a search head.
Click Search and Reporting.
Enter a search that represents the data that you expect to see. If you do not know what data you have, review the Data Summary first.
(Optional) Use the event timeline to determine how far back the events go.
(Optional) Set the time picker to a different time range to see only the events that occur during that range.
Click individual items in the results to change search parameters or run a new search based on that item.
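
A | metadata search, such as | metadata type=sourcetypes index=*, reports the earliest and latest event time for each host, source, or source type, which gives a scripted way to see how far back the data goes. If you run such a search through the REST export endpoint with output_mode=json, the following sketch turns the output into a date range per entry. It assumes that the export output contains one JSON object per line, with each row under a result key and firstTime and lastTime given as epoch seconds. The function name summarize_metadata is hypothetical.

```python
import json
from datetime import datetime, timezone


def summarize_metadata(export_lines, name_field="sourcetype"):
    """Map each host, source, or source type to its (first, last) event time in UTC."""
    summary = {}
    for line in export_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip preview rows and rows that carry messages instead of a result.
        if record.get("preview") or "result" not in record:
            continue
        result = record["result"]
        first = datetime.fromtimestamp(int(float(result["firstTime"])), tz=timezone.utc)
        last = datetime.fromtimestamp(int(float(result["lastTime"])), tz=timezone.utc)
        summary[result[name_field]] = (first, last)
    return summary
```

Pass name_field="host" or name_field="source" when the search used type=hosts or type=sources.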

For information about searching, see Anatomy of a search in the Search Manual.

Learn about the data generators in the deployment

For Splunk software to receive data, it must be configured with data inputs. Inputs can be configured on the Splunk indexer, but in most deployments, forwarders are configured with the inputs and do the data collection. The data flows from the forwarders to the indexer, where Splunk software breaks the data into events. Those events can form the basis for searches, reports, and dashboards, or be modified to fit the needs of the data consumers in your organization.

Splunk software can ingest many different kinds of machine data. The Getting Data In Manual describes the machine data that Splunk software can ingest, including but not limited to the following:

Log files
Data from scripts and processes
Network streams, including TCP and UDP traffic, and HTTP traffic sent to the HTTP Event Collector
Windows data, including Windows Event Log, Registry changes, and Performance Monitoring metrics

Learn about how Splunk software uses input configurations to get data

You can determine where data generation occurs after you have discovered your Splunk deployment topology. You can also do this while you are in the process of discovering your deployment topology, but it is easier to gather information on configurations after the deployment topology has been discovered.

Forwarders and indexers can get data input and other configurations in several ways:

Locally, through an inputs.conf configuration file. This is the most common way that Splunk instances get configuration information.
Through an app or add-on that has been installed on the instance
From a deployment server that the forwarder or indexer has connected to

The deployment server is an advanced configuration topic that is outside the scope of this article. To learn more about the deployment server and how it works, see About deployment server and forwarder management in Updating Splunk Enterprise Instances.

The inputs.conf file defines data inputs and controls aspects of data collection for the forwarder or indexer:

When to collect data
What type of data to collect
How often to collect the data
Where to index the data it has collected
How to index the data it has collected
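
To review these settings across many files, a short script can help. The following Python sketch uses the standard configparser module to list each stanza in an inputs.conf file with its input type, target, index, source type, and interval. It assumes that every setting sits inside a stanza and that configparser's INI handling is close enough to Splunk .conf syntax for a quick inventory. It is not a full .conf parser. A setting that is missing here, such as index, might still be set in another file. The function name summarize_inputs is hypothetical.

```python
import configparser


def summarize_inputs(text):
    """Return one summary row per stanza in an inputs.conf string (simplified parser)."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # Splunk settings are written as "key = value"
        interpolation=None,  # treat "%" in values literally
        strict=False,        # tolerate repeated stanzas; later settings win
    )
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(text)
    rows = []
    for stanza in parser.sections():
        kind, _, target = stanza.partition("://")
        settings = parser[stanza]
        rows.append({
            "stanza": stanza,
            "type": kind,                               # what type of input
            "target": target or None,                   # file, script, port, and so on
            "index": settings.get("index"),             # where the data is indexed
            "sourcetype": settings.get("sourcetype"),   # how the data is interpreted
            "interval": settings.get("interval"),       # how often a scripted input runs
            "disabled": settings.get("disabled", "0"),
        })
    return rows
```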

On forwarders, a file called outputs.conf controls where the forwarder sends the data. Like inputs.conf, it can be a standalone configuration, part of an app or add-on, or retrieved from a deployment server.
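
The following Python sketch lists the target groups in an outputs.conf file and the receivers that each group forwards to, based on the defaultGroup setting in the [tcpout] stanza and the server setting in [tcpout:<group>] stanzas. It treats outputs.conf as a plain INI file, which is a simplification, and the function name forwarding_targets is hypothetical.

```python
import configparser


def forwarding_targets(text):
    """Return (default_group, {group: [receivers]}) from an outputs.conf string."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None, strict=False)
    parser.optionxform = str  # keep setting names such as defaultGroup case-sensitive
    parser.read_string(text)
    default_group = parser.get("tcpout", "defaultGroup", fallback=None)
    targets = {}
    for stanza in parser.sections():
        if stanza.startswith("tcpout:"):
            group = stanza.split(":", 1)[1]
            servers = parser[stanza].get("server", "")
            # server is a comma-separated list of host:port receivers
            targets[group] = [s.strip() for s in servers.split(",") if s.strip()]
    return default_group, targets
```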

Splunk software uses a scheme called configuration file precedence to merge settings from multiple files into a single configuration that it uses to manage data collection and forwarding. See Configuration file precedence in the Admin Manual.
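
As a rough illustration of how precedence resolves conflicts, the following sketch merges settings from several files in the global context, where system/local has the highest priority, then app local directories, then app default directories, then system/default. App directories in the same tier are compared in lexicographic order, so an app named A wins over an app named B. This is a simplified model: it ignores the app/user context used at search time, inheritance from [default] stanzas, and other special cases. Paths are relative to $SPLUNK_HOME/etc, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def precedence_rank(rel_path):
    """Return a sort key where smaller keys mean higher priority (global context only)."""
    parts = PurePosixPath(rel_path).parts
    if parts[:2] == ("system", "local"):
        return (0, "")
    if len(parts) >= 4 and parts[0] == "apps" and parts[2] == "local":
        return (1, parts[1])  # app local directories, compared by app name
    if len(parts) >= 4 and parts[0] == "apps" and parts[2] == "default":
        return (2, parts[1])  # app default directories, compared by app name
    if parts[:2] == ("system", "default"):
        return (3, "")
    raise ValueError(f"path not handled by this sketch: {rel_path}")


def merge_layers(layers):
    """Merge {rel_path: {stanza: {setting: value}}} into {stanza: {setting: value}}."""
    merged = {}
    # Apply the lowest-priority file first so that higher-priority files overwrite it.
    for path in sorted(layers, key=precedence_rank, reverse=True):
        for stanza, settings in layers[path].items():
            merged.setdefault(stanza, {}).update(settings)
    return merged
```

On a real instance, the btool utility reports the merged configuration that the instance actually uses, and the --debug flag shows which file each setting comes from, for example $SPLUNK_HOME/bin/splunk btool inputs list --debug.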

Discover Splunk data collection configurations

The following procedure represents high-level guidance for determining the inputs in your Splunk deployment.

After you locate indexers and forwarders in the deployment, confirm whether they have a local configuration for data inputs, get a configuration from an app or add-on, or retrieve configurations from a deployment server.
If the forwarder is configured to connect to a deployment server, check the deployment server to see its configurations. Any forwarder that connects to this server gets these configurations. The configurations can be standalone or contained within apps or add-ons.
Review inputs.conf configuration files to see what data is being collected. You can find these files in the following places:
1. By themselves, in $SPLUNK_HOME/etc/system/local
2. In an app or add-on, in $SPLUNK_HOME/etc/apps/<name of app>/local
3. On a deployment server, in $SPLUNK_HOME/etc/deployment-apps/<name of app>/local
See the Getting Data In Manual for information about the input types that these files define.
If you have a diagram of your Splunk deployment, indicate the locations of the data collecting instances in the diagram, and what data they are collecting.
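
On an instance that you can log in to, the following sketch finds inputs.conf files in the three locations listed in the procedure. It checks only the local directories named there, so it does not find settings that apps ship in their default directories. The function name find_conf_files is hypothetical, and you pass the value of $SPLUNK_HOME yourself.

```python
from pathlib import Path


def find_conf_files(splunk_home, conf_name="inputs.conf"):
    """Return existing conf files in the local directories this sketch checks."""
    etc = Path(splunk_home) / "etc"
    patterns = [
        f"system/local/{conf_name}",             # standalone configuration
        f"apps/*/local/{conf_name}",             # apps and add-ons
        f"deployment-apps/*/local/{conf_name}",  # present on a deployment server
    ]
    found = []
    for pattern in patterns:
        found.extend(sorted(etc.glob(pattern)))
    return found
```

For example, find_conf_files("/opt/splunk") checks a default Linux installation of Splunk Enterprise, and find_conf_files("/opt/splunk", "outputs.conf") does the same for outputs.conf.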

Next steps

After you have discovered where the data inputs are, you can do the following:

Determine whether input configurations need to be added, changed, or removed, depending on business needs or opportunities to improve data collection performance.
Determine whether you want to set up the Monitoring Console, if it has not already been set up.
Determine whether you need to change how data is indexed to follow Splunk best practices for getting data in.
