Data source planning

The volume, type, and number of data sources influence the overall Splunk platform architecture, the number and placement of forwarders, the estimated load, and the impact on network resources.

Splunk Enterprise Security requires that all data sources comply with the Splunk Common Information Model (CIM). Enterprise Security is designed to leverage the CIM standardized data models both when searching data to populate dashboard panels and views, and when providing data for correlation searches.

Map add-ons to data sources

The add-ons included with Splunk Enterprise Security are designed to parse and categorize known data sources and other technologies for CIM compliance.

For each data source:

Identify the add-on: Identify the technology and determine the corresponding add-on. The primary sources for add-ons are the add-ons provided with Enterprise Security and the content available on Splunkbase. An add-on must be CIM compatible or be modified to support CIM data schemas. For an example, see Use the CIM to normalize data at search time in the Common Information Model Add-on Manual.
Install the add-on: Install the add-on on the Enterprise Security search head. Also install add-ons that perform index-time processing on each indexer. Splunk Cloud customers must work with Splunk Support to install add-ons on search heads and indexers. If the forwarder architecture includes sending data through a parsing or heavy forwarder, the add-on might be needed on the heavy forwarder.
Configure the server, device, or technology where necessary: Enable logging or data collection for the device or application and/or configure the output for collection by a Splunk instance. Consult the vendor documentation for implementation.
Customize the add-on where necessary: An add-on might require customization, such as setting the location or source of the data, choosing whether the data is located in a file or in a database, or other unique settings.
Set up a Splunk data input and confirm the source type settings: The add-on's README file includes information about the source type setting associated with the data, and might include customization notes about configuring the input.
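
The last step above, setting up a data input with the source type named in the add-on's README, can be sketched as follows. This is a minimal illustration only: render_monitor_stanza is a hypothetical helper, not part of any Splunk tooling, and the path, source type, and index are placeholder values. It renders a file monitor stanza of the kind that goes in inputs.conf.

```python
def render_monitor_stanza(path, sourcetype, index=None):
    """Return an inputs.conf file monitor stanza as text (hypothetical helper)."""
    lines = [f"[monitor://{path}]", f"sourcetype = {sourcetype}"]
    if index:
        # Only set an index when one is given; otherwise the default applies.
        lines.append(f"index = {index}")
    lines.append("disabled = 0")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: use the path and source type that the
    # add-on's README documents for your data source.
    print(render_monitor_stanza("/var/log/secure", "linux_secure", index="os"))
```

Generating the stanza from one function keeps the source type consistent across hosts, which matters because the add-on's parsing depends on the source type matching what its README specifies.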

Considerations for data inputs

Splunk platform instances provide tools to ingest data inputs, including many that are specific to a particular application or technology's needs. Depending on the technology or source being collected, choose the input method based on performance impact, ease of data access, stability, source latency, and maintainability. You can configure a forwarder to accept data by monitoring files, network ports, Windows data, or network wire data, or by running scripted inputs.

Monitoring files: Deploy a Splunk forwarder on each system hosting the files, and set the source type on the forwarder using an input configuration. If you have a large number of systems with identical files, use the Splunk Enterprise deployment server to set up standardized file inputs across large groups of forwarders.

Monitoring network ports: Use standard tools such as a syslog server, or create listener ports on a forwarder. Sending multiple network sources to the same port or file complicates source typing. For more information, see Get data from TCP and UDP ports in the Getting Data In Manual.

Monitoring Windows data: A forwarder can obtain information from Windows hosts using a variety of configuration options. See How to get Windows data into Splunk Enterprise in the Getting Data In Manual.

Monitoring network wire data: Splunk Stream supports the capture of real-time wire data. See About Splunk Stream in the Splunk Stream Installation and Configuration Manual.

Scripted inputs: Use scripted inputs to get data from an API or other remote data interfaces and message queues. Configure the forwarder to call shell scripts, python scripts, Windows batch files, PowerShell, or any other utility that can format and stream the data that you want to index. You can also write the data polled by any script to a file for direct monitoring by a forwarder. See Get data from APIs and other remote data interfaces through scripted inputs in the Getting Data In Manual.
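
As a rough sketch of the scripted input approach, the Python script below writes one event per line to standard output, which is where a forwarder reads the data produced by a scripted input. The poll function is a hypothetical stand-in for a call to a real API or message queue, and the field names and single-line JSON layout are assumptions made for the example.

```python
import json
import sys
import time


def format_event(record, now=None):
    """Serialize one polled record as a single-line JSON event with a timestamp."""
    event = {"time": int(now if now is not None else time.time())}
    event.update(record)
    return json.dumps(event, sort_keys=True)


def emit(records, stream=sys.stdout):
    """Write one event per line to the stream the forwarder reads."""
    for record in records:
        stream.write(format_event(record) + "\n")
    stream.flush()


def poll():
    """Hypothetical stand-in for a request to a remote API or message queue."""
    return [{"user": "alice", "action": "login", "status": "success"}]


if __name__ == "__main__":
    emit(poll())
```

To take the file-based route that the paragraph above mentions, pass an open file as the stream argument and monitor that file with a forwarder instead.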

Collect asset and identity information

Splunk Enterprise Security uses an asset and identity correlation system. Enterprise Security compares asset and identity information with source events to provide additional data enrichment and context for analysis.

Identify assets and identities

An asset represents devices and systems in the environment that generate data. An identity can represent a user, credential, or a role used to grant access to a device or system. Determine the repositories that will provide asset and identity data for integration with Enterprise Security, and how Enterprise Security will access that data.

In a highly regulated network environment, one database or repository might be the only source of information for both assets and identities. However, it is more common to find them spread among many unique repositories, hosted on different technologies, and maintained by many departments. As asset information changes and identities are added and removed, integrate the updates into Enterprise Security as a recurring task.

Asset lists

An asset list is a lookup table of fields. The asset input is designed to merge one or more asset lists into a correlated pair of tables by key value that provides information about an asset. You can manage and configure the asset and identity lists using the Identity Management dashboard. An asset list does not need to define every field. For a complete list of fields, see Asset lookup fields in the User Manual.
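
To make the idea of merging lists by key value concrete, here is a small Python sketch. It does not show how Enterprise Security performs the merge. The choice of ip as the key, the precedence rule (earlier lists win and later lists only fill empty fields), and the sample field names are all assumptions made for the example.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_asset_lists(lists, key="ip"):
    """Merge rows from several asset lists, keyed on one field.

    Earlier lists win; later lists only fill fields that are still empty.
    """
    merged = {}
    for rows in lists:
        for row in rows:
            k = (row.get(key) or "").strip()
            if not k:
                continue  # skip rows without a usable key value
            target = merged.setdefault(k, {})
            for field, value in row.items():
                if value and not target.get(field):
                    target[field] = value
    return merged


if __name__ == "__main__":
    # Two partial lists: neither defines every field.
    cmdb = read_csv("ip,nt_host,owner\n10.0.0.5,web01,\n")
    dhcp = read_csv("ip,owner,priority\n10.0.0.5,ops-team,high\n")
    print(merge_asset_lists([cmdb, dhcp]))
```

The sample lists each leave some fields blank, which mirrors the point above that an asset list does not have to define every field.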

Identity lists

An identity list is a comma-separated values (CSV) lookup table of fields. The identity input is designed to merge one or more identity lists into a correlated table that provides information about an identity. You can manage and configure the asset and identity lists using the Identity Management dashboard. An identity list does not need to define every field. For a complete list of fields, see Identity lookup fields in the User Manual.

Collection options for assets and identities

The preferred collection method to provide asset or identity information is through a Splunk platform app. There are a number of add-ons that can be used to automate connections to external systems for data collection. Use an add-on to connect, collect, and return data to Enterprise Security.

You can create additional lists by automating capture from other asset or identity repositories through the use of a custom script or modular input. Indexed events in Splunk Enterprise are another potential source of data for asset and identity information. Use the Splunk search language to collect the information, sort and table the fields, and export the results. Use a manually populated lookup file for asset information collected from static lists, such as data sources that are not directly accessible through the other methods mentioned.
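
A custom script of the kind mentioned above might look like the following Python sketch, which renders records pulled from some external repository as CSV text suitable for a lookup file. The FIELDS list is an assumed subset of identity fields (see Identity lookup fields for the full set), and the record shape and the to_identity_csv helper are hypothetical.

```python
import csv
import io

# Assumed subset of identity fields, used here for illustration only.
FIELDS = ["identity", "first", "last", "email"]


def to_identity_csv(records, fields=FIELDS):
    """Render records from an external repository as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keep only the known fields; leave missing ones empty.
        writer.writerow({f: record.get(f, "") for f in fields})
    return buf.getvalue()


if __name__ == "__main__":
    sample = [{"identity": "asmith", "first": "Alice", "last": "Smith",
               "email": "asmith@example.com", "dept": "IT"}]
    print(to_identity_csv(sample), end="")
```

Fields that the source repository does not supply are written as empty values, and fields outside the assumed set (such as dept in the sample) are dropped.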

For a sample list of asset and identity sources with collection methods, see Collection methods for assets and identities in the User Manual.
