Use forwarders to get data into Splunk Cloud
You can get data into Splunk Cloud in a number of ways. The best way depends on the source of the data and what you want to do with that data. You use one or more instances of the following tools to get data into Splunk Cloud:
- Forwarders.
A forwarder is a Splunk Enterprise instance that has been optimized to send data. You might use multiple forwarders to send data, depending on the volume and location of your source data. See Work with forwarders.
- Inputs Data Manager (IDM).
The IDM is a hosted solution for Splunk Cloud for scripted and modular inputs. In a majority of cases, an IDM eliminates the need for customer-managed infrastructure.
Forwarder types for getting data into Splunk Cloud
Usually, to get data from your customer site to Splunk Cloud, you use a forwarder. There are two types of forwarders that you can use to get data in:
- A universal forwarder is a dedicated, streamlined version of Splunk Enterprise that contains only the essential components needed to send data. The universal forwarder is often the best tool for forwarding data to indexers. Its main limitation is that it forwards only unparsed data, with a few exceptions. To send event-based data to indexers, you must use either a heavy forwarder or an IDM.
- A heavy forwarder is a full Splunk Enterprise instance with some features disabled to achieve a smaller footprint. For data sources that must be collected using programmatic means, for example APIs and database access, or if you need to route and filter data, use a heavy forwarder to avoid running these kinds of inputs in the search head tier. Since the heavy forwarder adds metadata to the messages, you might see as much as two to three times the network traffic when you use a heavy forwarder.
Splunk Cloud and the forwarder credentials app
When you work with forwarders to send data to Splunk Cloud, you must download an app that has the credentials specific to your Splunk Cloud instance. You install the forwarder credentials app on your universal forwarder, heavy forwarder, or deployment server, and it lets you connect to Splunk Cloud.
Use a deployment server to deliver configurations to multiple forwarders
If you have multiple forwarders, you might need to use a deployment server to manage them. A deployment server is a Splunk Enterprise instance that acts as a centralized configuration manager. It groups together and collectively manages any number of forwarders. Forwarder instances that are remotely configured by deployment servers are called deployment clients. The deployment server distributes content, such as configuration files and apps, to deployment clients, so that you do not have to copy these files to multiple locations for each configuration change.
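As a rough sketch of the relationship described above, a deployment client points at its deployment server in deploymentclient.conf, and the deployment server maps clients to apps in serverclass.conf. The host name, server class name, and app name shown here are hypothetical placeholders, not values from your environment.

```ini
# deploymentclient.conf on each forwarder (the deployment client)
[deployment-client]

[target-broker:deploymentServer]
# Hypothetical deployment server host; 8089 is the default management port
targetUri = deploymentserver.example.com:8089

# serverclass.conf on the deployment server
[serverClass:cloud_forwarders]
# Matches every client that connects; narrow this pattern in practice
whitelist.0 = *

[serverClass:cloud_forwarders:app:example_forwarder_app]
# Restart the forwarder after the app is delivered
restartSplunkd = true
```

In this sketch, the app named example_forwarder_app is assumed to exist under the deployment-apps directory on the deployment server.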
Work with an intermediate forwarding tier
An intermediate forwarder tier is a collection of heavy forwarders that act as an aggregation point for data sent from other forwarders in your network. Once the data arrives at the intermediate tier, it is forwarded to Splunk Cloud. By sending data from other forwarders through the intermediate forwarder tier, you can limit the hosts that will communicate with Splunk Cloud over the internet.
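A minimal sketch of this topology, assuming two hypothetical intermediate heavy forwarders (hf1.example.com and hf2.example.com) that listen on port 9997: the endpoint forwarders send to the intermediate tier, and only the intermediate tier connects to Splunk Cloud.

```ini
# outputs.conf on each endpoint forwarder
[tcpout]
defaultGroup = intermediate_tier

[tcpout:intermediate_tier]
# Hypothetical hosts; the forwarder load balances across the servers listed
server = hf1.example.com:9997, hf2.example.com:9997

# inputs.conf on each intermediate heavy forwarder
[splunktcp://9997]
# Accept data from other forwarders on port 9997
disabled = 0
```

The intermediate forwarders in this sketch still need the forwarder credentials app for their own connection to Splunk Cloud.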
As your intermediate tier uses heavy forwarders, you can also route or drop data before it leaves your network for Splunk Cloud. For more information about data parsing and routing, see Use a heavy forwarder to get data into Splunk Cloud.
When implementing an intermediate forwarder tier, try to maintain a 2:1 or greater ratio of intermediate forwarders to your Splunk Cloud indexers. You can accomplish this by adding more intermediate forwarder nodes, or by configuring your intermediate forwarder nodes to use multiple pipelines. For more information, see Configure a forwarder to handle multiple pipeline sets in the Forwarder Manual.
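For illustration, multiple pipeline sets are enabled in server.conf on the forwarder. The value of 2 is only an example; each additional pipeline set consumes more CPU and memory, so choose a value that the host can support.

```ini
# server.conf on an intermediate forwarder
[general]
# Example value; run two ingestion pipeline sets instead of the default of one
parallelIngestionPipelines = 2
```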
Use a universal forwarder to get data into Splunk Cloud
The universal forwarder is the best choice for a large set of data collection requirements from systems in your environment. It is a purpose-built data collection mechanism with minimal resource requirements. The universal forwarder is the default choice for collecting and forwarding log data. The universal forwarder provides the following functionality:
- Checkpoint and restart function for lossless data collection.
- Efficient protocol that minimizes network bandwidth utilization.
- Throttling capabilities.
- Built-in load balancing across available indexers.
- Optional network encryption using SSL or TLS.
- Data compression (use only without SSL or TLS).
- Multiple input methods (files, Windows Event logs, network inputs, scripted inputs).
- Limited event filtering capabilities (Windows event logs only).
- Parallel ingestion pipeline support to increase throughput and reduce latency.
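Several of the capabilities listed above map to settings in the universal forwarder's configuration files. The following is a sketch only: the monitored path, index, source type, and event code are illustrative assumptions, and the connection to Splunk Cloud itself is assumed to come from the forwarder credentials app.

```ini
# inputs.conf: file and Windows event log inputs
[monitor:///var/log/messages]
# Hypothetical index and source type
index = main
sourcetype = syslog

[WinEventLog://Security]
# Limited event filtering: skip an example event code
blacklist = 4662

# limits.conf: throttling
[thruput]
# Cap forwarding throughput at 256 KB per second (example value)
maxKBps = 256

# outputs.conf: indexer acknowledgment to help prevent data loss
[tcpout]
useACK = true
```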
The universal forwarder does not parse log sources into events, so it cannot perform any action that requires the ability to format the logs. It also ships with a stripped-down version of Python, which makes it incompatible with any modular input apps that require a full Splunk platform to function.
It is normal for a large number of universal forwarders (from hundreds to tens of thousands) to be deployed on endpoints and servers in a Splunk platform environment and to be centrally managed, either with a Splunk deployment server or with a third-party configuration management tool, such as Puppet or Chef.
The following diagram shows how universal forwarders send data to Splunk Cloud:
Use a heavy forwarder to get data into Splunk Cloud
The heavy forwarder is a full Splunk Enterprise instance configured to act as a forwarder with indexing disabled. A heavy forwarder generally performs no other Splunk Enterprise functions (for example, do not use a heavy forwarder to search data). The main difference between a universal forwarder and a heavy forwarder is that the heavy forwarder contains the full parsing pipeline, performing the identical functions an indexer performs, without writing and indexing events on disk. This configuration enables the heavy forwarder to parse and act on individual events. For example, a heavy forwarder can mask data or perform filtering and routing based on event data. Because it is a full Splunk Enterprise installation, it can host modular inputs that require a full Python stack to function properly for data collection or serve as an endpoint for the Splunk HTTP event collector (HEC).
The heavy forwarder has the following characteristics:
- Parses data into events
- Filters and routes based on individual event data
- Has a larger resource footprint than the universal forwarder
- Has a larger network bandwidth footprint than the universal forwarder
- Has a GUI for management
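Because the heavy forwarder parses data into events, event-level filtering and masking can be expressed in props.conf and transforms.conf. The sketch below assumes a hypothetical source type named example_sourcetype: it drops events that contain DEBUG and masks all but the last four digits of any 16-digit number.

```ini
# props.conf on the heavy forwarder
[example_sourcetype]
# Apply the filtering transform defined in transforms.conf
TRANSFORMS-drop_debug = drop_debug_events
# Mask the first 12 digits of a 16-digit number before forwarding
SEDCMD-mask_number = s/\d{12}(\d{4})/XXXXXXXXXXXX\1/g

# transforms.conf on the heavy forwarder
[drop_debug_events]
# Events that match this pattern are sent to the null queue and discarded
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue
```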
In general, heavy forwarders are not installed on endpoints for the purpose of data collection. Instead, they are used on standalone systems to implement data collection nodes (DCN) or intermediary forwarding tiers. Use a heavy forwarder only when requirements to collect data from other systems can't be met with a universal forwarder. Instances in which a heavy forwarder is required include the following examples:
- Reading data from an RDBMS for the purposes of ingesting it into Splunk Cloud (database inputs).
- Collecting data from systems that are reachable using an API, such as cloud services, VMware monitoring, and proprietary systems.
- Providing a dedicated tier to host the HTTP event collector service.
- Implementing an intermediary forwarding tier that requires a parsing forwarder for routing, filtering, and masking.
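As one illustration of the dedicated HTTP event collector tier mentioned above, HEC is enabled on a heavy forwarder through inputs.conf. The token name, token value, index, and source type below are placeholders chosen for this sketch.

```ini
# inputs.conf on the heavy forwarder that hosts HEC
[http]
# Enable the HTTP event collector on port 8088 with TLS
disabled = 0
port = 8088
enableSSL = 1

[http://example_token_name]
# Placeholder token value; generate a real token for your deployment
token = 00000000-0000-0000-0000-000000000000
# Hypothetical index and source type for events received with this token
index = main
sourcetype = example_sourcetype
```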
If you want to set up a heavy forwarder to send data to Splunk Cloud, request a deployment server license from Splunk Support. This license allows the heavy forwarder to carry out functions above and beyond what is covered by the forwarder license. See Data collection in the Splunk Cloud Service Description.
The following example shows a heavy forwarder used to send data to Splunk Cloud:
Work with Inputs Data Manager (IDM)
The IDM is a hosted solution for Splunk Cloud for scripted and modular inputs. In a majority of cases, an IDM eliminates the need for customer-managed infrastructure. An IDM lets you send cloud inputs, such as Azure or AWS data, to Splunk Cloud without using a forwarder. However, an IDM is not a one-to-one replacement for a heavy forwarder. You must still use a heavy forwarder if you need to perform parsing or activities other than standard scripted and modular data inputs. As a best practice, install cloud-based add-ons on an IDM, and install on-premises-based add-ons on a universal forwarder or heavy forwarder.
If the add-on is tightly integrated with an Enterprise Security search head, do not use an IDM.
The following diagram shows the typical topology for an IDM in which data is sent directly from a cloud data source to the IDM:
This documentation applies to the following versions of Splunk Cloud™: 8.0.2006, 8.0.2007, 8.1.2008, 8.1.2009, 8.1.2011, 8.1.2012 (latest FedRAMP release), 8.1.2101, 8.1.2103