Learn about the data in your Splunk deployment
After you have discovered and diagrammed the topology of your Splunk deployment, the next task is to learn about the data in the deployment.
Understanding the data in a Splunk deployment has two parts. The first part is how the deployment manages stored data. The second part is how the deployment ingests data.
Learn about stored data
Before you assumed control of the deployment, it was configured to ingest data from certain data sources. The person or group who owned the data determined the following:
- The amount of available data
- The relevance of the data to the organization
- The length of time that the organization wanted to keep the data that Splunk software ingested
- The data retention policy of the organization
- The people that required access to the data
- The need for any sensitive data to be anonymized
They then worked with other groups to set up Splunk software to get the data indexed and stored.
You can learn about the data that has been indexed by Splunk software with the following methods:
- Review the data summary
- Run searches on the data
Review the data summary
With the Data Summary in Splunk Web, you can determine data sources, source types, and the hosts that generated the data. This is the most comprehensive way of learning what data is present in a Splunk deployment.
- Log into the Splunk instance. If the deployment is distributed, log into a search head.
- Click Search and Reporting.
- Click Data Summary.
- Click on one of the tabs to get information about the Hosts, Sources, or Sourcetypes that the instance has indexed.
- (Optional) Click on an entry in the Data Summary list to run a search that contains that entry in its results.
For more information about the Search app, see About the Search app in the Search Manual.
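The Data Summary tabs can also be approximated from the search bar with the `metadata` search command, which reports hosts, sources, or source types along with event counts and first and last seen times. The following is a minimal sketch that only builds the search strings. The helper name `metadata_search` and the tab mapping are hypothetical and are not part of Splunk software.

```python
# Hypothetical mapping from a Data Summary tab name to the value of the
# `type` argument of the SPL `metadata` command.
DATA_SUMMARY_TABS = {
    "Hosts": "hosts",
    "Sources": "sources",
    "Sourcetypes": "sourcetypes",
}


def metadata_search(tab: str, index: str = "*") -> str:
    """Return an SPL string that lists the values shown on one tab."""
    metadata_type = DATA_SUMMARY_TABS[tab]
    return f"| metadata type={metadata_type} index={index}"


# Print one search per tab so that it can be pasted into the search bar.
for tab in DATA_SUMMARY_TABS:
    print(f"{tab}: {metadata_search(tab)}")
```

The `index=*` default is an assumption that covers every index your role can search. Narrow it to a single index on large deployments.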
Run searches on the data
With Splunk search, you can run searches and adjust timeline parameters to create a timeline that shows when the data was ingested. The kinds of searches you want to run depend on the kind of data you are searching for. You can use the Data Summary to learn what has been indexed into the instance and what you can search for.
- Log into the Splunk instance.
- Click Search and Reporting.
- Enter a search that represents the data that you expect to see. If you do not know what data you have, you can use the Data Summary.
- (Optional) Use the event timeline to determine how far the events go back.
- (Optional) Set the time picker to a different time range to see events that occur only during that range.
- Click on individual items in the results to change search parameters or run a new search based on that item.
For information about searching, see Anatomy of a search in the Search Manual.
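As an illustration of the steps above, the following sketch assembles a search that restricts results to an index and source type, sets the time range with the `earliest` and `latest` time modifiers, and charts the event count over time with `timechart`. The helper name `build_search` is hypothetical, and the index and source type values are placeholders. Substitute values that the Data Summary shows for your deployment.

```python
def build_search(index: str, sourcetype: str,
                 earliest: str = "-24h", latest: str = "now") -> str:
    """Build an SPL string for the search bar (hypothetical helper)."""
    return (
        f"index={index} sourcetype={sourcetype} "
        f"earliest={earliest} latest={latest} | timechart count"
    )


# Placeholder values: look back seven days in the "main" index.
print(build_search("main", "syslog", earliest="-7d"))
```

Widening `earliest` step by step is one way to find how far back the events go without loading all of the data at once.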
Learn about the data generators in the deployment
For Splunk software to receive data, it must be configured with data inputs. Inputs can be configured on the Splunk indexer, but in most deployments, forwarders are configured with the inputs and do the data collection. The data flows from the forwarders into the indexer, where Splunk software breaks up the data into events. These events can form the basis for searches, reports, and dashboards, or be modified to fit the needs of the data consumers in your organization.
Splunk software can ingest many different kinds of machine data. The Getting Data In Manual describes the machine data that Splunk software can ingest, which includes but is not limited to:
- Log files
- Data from scripts and processes
- Network streams, including TCP and UDP traffic, and HTTP traffic through the HTTP Event Collector
- Windows data, including Windows Event Log, Registry changes, and Performance Monitoring metrics
Learn about how Splunk software uses input configurations to get data
You can determine where data generation occurs while you discover your Splunk deployment topology, but it is easier to gather information on configurations after the topology has been discovered.
Forwarders and indexers can get data input and other configurations in several ways:
- Locally, through an inputs.conf configuration file. This is the most common method for how Splunk instances get configuration information.
- Through an app or add-on that has been installed on the instance
- From a deployment server that the forwarder or indexer has connected to
The deployment server is an advanced configuration topic outside the scope of this topic. To learn more about the deployment server and how it works, see About deployment server and forwarder management in Updating Splunk Enterprise Instances.
The inputs.conf file defines data inputs and controls the following aspects of data collection for the forwarder or indexer:
- When to collect data
- What type of data to collect
- How often to collect the data
- Where to index the data it has collected
- How to index the data it has collected
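To make the list above concrete, the following sketch parses a small sample inputs.conf and prints, for each input stanza, the settings that relate to those aspects. The stanza types (`monitor`, `script`) and the settings (`index`, `sourcetype`, `interval`, `disabled`) are standard inputs.conf names, but the paths, index names, and the `parse_conf` helper are hypothetical. The parser is deliberately simple and assumes that settings that appear before the first stanza header are global defaults.

```python
SAMPLE_INPUTS_CONF = """
# Hypothetical inputs.conf used only for illustration.
[monitor:///var/log/messages]
sourcetype = syslog
index = os_logs
disabled = 0

[script://./bin/collect.sh]
interval = 300
sourcetype = custom_metrics
index = metrics_test
"""


def parse_conf(text: str) -> dict:
    """Parse stanza-based .conf text into {stanza: {setting: value}}."""
    stanzas = {}
    current = "default"  # assumption: pre-stanza settings are global defaults
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            stanzas.setdefault(current, {})
        elif "=" in line:
            key, _, value = line.partition("=")
            stanzas.setdefault(current, {})[key.strip()] = value.strip()
    return stanzas


parsed = parse_conf(SAMPLE_INPUTS_CONF)
for stanza, settings in parsed.items():
    input_type = stanza.split(":", 1)[0]           # what type of data
    print(stanza)
    print("  type:      ", input_type)
    print("  interval:  ", settings.get("interval", "n/a"))    # how often
    print("  index:     ", settings.get("index", "(default)"))  # where
    print("  sourcetype:", settings.get("sourcetype", "(auto)"))  # how
```

This sketch ignores line continuations and other details of the real file format, so treat its output as a quick inventory and not as the configuration that Splunk software actually applies.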
On forwarders, there is a file called outputs.conf that controls where the forwarder sends the data. Like inputs.conf, it can be a standalone configuration, a configuration that is part of an app or add-on, or a configuration that has been retrieved from a deployment server.
Splunk software uses a scheme called configuration file precedence to build a master configuration file to handle multiple data collection and forwarding scenarios. See configuration file precedence in the Admin Manual.
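The layering idea behind configuration file precedence can be sketched as a merge in which higher-priority copies of a file override individual settings from lower-priority copies, one stanza at a time. The example below is a simplified model and not the Splunk implementation. The layer names and values are hypothetical, and the layer order shown (system default, then app local, then system local) is an assumption that you should confirm against the Admin Manual for your context.

```python
from typing import Dict, List

Config = Dict[str, Dict[str, str]]  # stanza -> {setting: value}


def merge_layers(layers_low_to_high: List[Config]) -> Config:
    """Merge layers so that later (higher-priority) layers win per setting."""
    merged: Config = {}
    for layer in layers_low_to_high:
        for stanza, settings in layer.items():
            merged.setdefault(stanza, {}).update(settings)
    return merged


# Hypothetical copies of the same stanza in three locations.
system_default = {"monitor:///var/log/messages": {"index": "main", "disabled": "0"}}
app_local = {"monitor:///var/log/messages": {"index": "os_logs"}}
system_local = {"monitor:///var/log/messages": {"disabled": "1"}}

effective = merge_layers([system_default, app_local, system_local])
print(effective)
```

In this model the input ends up disabled and pointed at the `os_logs` index, because each setting takes its value from the highest-priority layer that defines it. On a real instance, the `btool` command line tool (for example, `splunk cmd btool inputs list --debug`) reports the merged settings and the file that each one came from.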
Discover Splunk data collection configurations
The following procedure represents high-level guidance for determining the inputs in your Splunk deployment.
- After you locate indexers and forwarders in the deployment, confirm whether they have a local configuration for data inputs, get a configuration from an app or add-on, or retrieve configurations from a deployment server.
- If the forwarder is configured to connect to a deployment server, check the deployment server to see its configurations. Any forwarder that connects to this server gets these configurations. The configurations can be standalone or contained within apps or add-ons.
- Review the inputs.conf configuration files to see what data is being collected. You can find these files in the following places:
- By themselves, in
- In an app or add-on, in $SPLUNK_HOME/etc/apps/<name of app>/local
- On a deployment server, in $SPLUNK_HOME/etc/deployment-apps/<name of app>/local
- See the Getting Data In Manual for information about the types of data that each instance collects.
- If you have a diagram of your Splunk deployment, indicate the locations of the data collecting instances in the diagram, and what data they are collecting.
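Part of the procedure above can be scripted. The following sketch lists the inputs.conf files found in the app and deployment server locations under a Splunk installation directory. The function name `find_inputs_conf` is hypothetical, the search patterns cover only the `local` directories of apps, and the deployment server directory is assumed to be `etc/deployment-apps`, the name that Splunk Enterprise uses on disk. Pass extra patterns if your instances keep configurations elsewhere.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Assumed locations, relative to the Splunk installation directory.
DEFAULT_PATTERNS = (
    "etc/apps/*/local/inputs.conf",
    "etc/deployment-apps/*/local/inputs.conf",
)


def find_inputs_conf(splunk_home: str,
                     patterns: Iterable[str] = DEFAULT_PATTERNS) -> List[Path]:
    """Return the inputs.conf files under splunk_home that match patterns."""
    root = Path(splunk_home)
    found: List[Path] = []
    for pattern in patterns:
        found.extend(sorted(root.glob(pattern)))
    return found


# Use the SPLUNK_HOME environment variable when it is set; otherwise fall
# back to the current directory so that the example still runs.
for conf_path in find_inputs_conf(os.environ.get("SPLUNK_HOME", ".")):
    print(conf_path)
```

Run the script on each indexer, forwarder, and deployment server that you located, then record the results next to the matching instance in your deployment diagram.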
After you have discovered where the data inputs are, you can do the following:
- Determine whether or not input configurations need to be added, changed, or removed, depending on business purpose or data collection performance improvements.
- Determine whether to set up the Monitoring Console, if it has not already been set up.
- Determine whether or not changes need to be made to index data according to Splunk best practices for getting data in.
This documentation applies to the following versions of Splunk® Enterprise: 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.3.0