Introduction to Getting Data In
This topic provides an overview of the methods available to you for adding data to your Splunk Cloud deployment.
Fundamental Splunk and Splunk Cloud concepts
Before attempting to get data into your Splunk Cloud deployment, you should have a solid understanding of certain Splunk and Splunk Cloud concepts. The table lists these concepts. You should also review the information in the Splunk Cloud Getting Data In manual.
| Concept | Description |
|---|---|
| deployment server | A deployment server is a Splunk Enterprise instance that acts as a centralized configuration manager for any number of forwarders, called "deployment clients". The deployment server is hosted on your premises or in your cloud environment (such as AWS or Azure). For a more detailed description of the components of a deployment server, see Deployment Server Architecture. |
| indexes | The index is the repository for your data. When the Splunk platform indexes raw data, it transforms the data into searchable events. For more information about indexes, see Manage Indexes. |
| Inputs Data Manager | The Inputs Data Manager (IDM) is a component of your Splunk Cloud environment optimized for data ingestion. It is intended for use with cloud data sources or with add-ons that require inputs on the search tier. Customers on the Splunk Cloud Victoria Experience don't need to use an IDM. For more information, see Determine your Splunk Cloud Experience. |
| search head cluster | For more information, see search head and search head cluster in the Splexicon. |
| source types | A source type is one of the critical default fields that Splunk software assigns to all incoming data. It tells Splunk software what kind of data you have, so that it can format the data intelligently during indexing. For more information, see Why Source Types Matter. |
| Splunk applications and add-ons | A Splunk app is an application that runs on the Splunk platform and typically addresses several use cases. Add-ons support and extend the functionality of the Splunk platform and the apps that run on it, usually by providing inputs for a specific technology or vendor. For more information about add-ons, see About Splunk add-ons. |
| universal forwarder | The universal forwarder is a dedicated, streamlined version of Splunk Enterprise that contains only the essential components needed to forward data. It does not include the full Python stack that many modular inputs require, and it does not expose a UI. In most situations, the universal forwarder is the best way to forward data to indexers. Its main limitation is that it forwards unparsed data, except in certain cases, such as structured data. For more information, see Work with forwarders. |
Types of data that Splunk Cloud accepts
Splunk Cloud accepts a wide variety of data, including IT streaming, machine, and historical data such as Windows event logs, web server logs, live application logs, network feeds, system metrics, change monitoring, message queues, and archive files. Splunk Cloud can monitor relational databases and third-party infrastructures such as DB2, Cisco, Active Directory, and Hadoop. For more information, see the following sections in the Getting Data In manual:
- Get data from files and directories
- Get data from network sources
- Get Windows data
- Get data with HTTP Event Collector
- Get other kinds of data in
Tools to get data into Splunk Cloud
This section helps you decide how best to get data into your Splunk Cloud instance. The right approach depends on the source of the data and what you intend to do with it. You can use one or more of the following tools to get data into Splunk Cloud:
- Forwarders: A forwarder is a version of Splunk Enterprise optimized to send data. A universal forwarder is a purpose-built data collection mechanism with very minimal resource requirements, whereas a heavy forwarder is a full Splunk Enterprise deployment configured to act as a forwarder with indexing disabled. See Work with forwarders.
- HTTP Event Collector (HEC): The HTTP Event Collector (HEC) lets you send data and application events to a Splunk deployment over the HTTP and Secure HTTP (HTTPS) protocols. See Work with HTTP Event Collector.
- Apps and add-ons: Splunk offers apps and add-ons, with pre-configured inputs for specific types of data sources, such as Cisco security data and Blue Coat data. Splunk apps and add-ons extend the capability and simplify the process of getting data into your Splunk platform deployment. See Work with apps and add-ons.
- Inputs Data Manager (IDM): The IDM is a hosted Splunk Cloud solution for scripted and modular inputs. It is also useful when you want to send cloud-based inputs directly to Splunk Cloud without routing them through an on-premises forwarder. See Work with Inputs Data Manager.
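The guidance in this topic can be condensed into a rough decision sketch. The `Source` fields and the `choose_tool` function are hypothetical, the order of the checks is a simplification, and real deployments often combine several of these tools.

```python
from dataclasses import dataclass


@dataclass
class Source:
    # Hypothetical flags describing a data source.
    needs_parsing: bool = False      # masking, filtering, or routing by event content
    is_cloud_api: bool = False       # cloud service or API that must be polled
    app_can_post_http: bool = False  # application can send events over HTTP(S)


def choose_tool(src: Source) -> str:
    """Rough, hypothetical encoding of this topic's guidance, not an official rule set."""
    if src.needs_parsing:
        return "heavy forwarder"       # full parsing pipeline
    if src.is_cloud_api:
        # Victoria Experience stacks don't need an IDM; this branch assumes Classic.
        return "Inputs Data Manager"
    if src.app_can_post_http:
        return "HTTP Event Collector"  # token-based, no forwarder needed
    return "universal forwarder"       # default for files, event logs, and similar host data
```

For example, `choose_tool(Source(is_cloud_api=True))` returns `"Inputs Data Manager"`, while a source with no special needs falls through to the universal forwarder.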
Work with forwarders
Usually, to get data from your site to Splunk Cloud, you use a forwarder. Splunk forwarders send data from a data source to your Splunk Cloud deployment for indexing, which makes the data searchable. Universal forwarders are lightweight processes, so they can usually run on the machines where the data originates. The following sections describe the two types of forwarders commonly used to get data in.
To forward data to Splunk Cloud, you typically use the Splunk universal forwarder.
A universal forwarder is a dedicated, streamlined version of Splunk Enterprise that contains only the essential components needed to send data. The universal forwarder is usually the best tool for forwarding data to indexers. Its main limitation is that it forwards unparsed data, except for certain structured data. To send event-based data to indexers, you must use a heavy forwarder or IDM.
By default, the universal forwarder can forward a maximum of 256 KB of data per second. As a best practice, do not exceed this limit. For more information, read Possible thruput limits in the Splunk Enterprise Troubleshooting Manual.
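To put the default cap in perspective, the arithmetic below converts a KB/s limit into a daily ceiling and into the time needed to forward a given volume. It assumes 1 KB = 1024 bytes and ignores compression and protocol overhead. The cap itself is set by the `maxKBps` setting in the `[thruput]` stanza of `limits.conf`.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_daily_gib(max_kbps: float = 256) -> float:
    """Upper bound on data (GiB) a forwarder can send per day at a KB/s cap."""
    kb_per_day = max_kbps * SECONDS_PER_DAY
    return kb_per_day / (1024 * 1024)  # KB -> GiB


def hours_to_forward(volume_gib: float, max_kbps: float = 256) -> float:
    """Hours needed to forward volume_gib at the cap, ignoring compression and overhead."""
    kb = volume_gib * 1024 * 1024      # GiB -> KB
    return kb / max_kbps / 3600        # seconds -> hours
```

At the default of 256 KB/s, `max_daily_gib()` returns 21.09375, so a single universal forwarder at the default cap tops out at roughly 21 GiB per day.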
If you need to anonymize or otherwise preprocess data before it exits your enterprise network, or if a specific app or add-on that you are using does not support universal forwarders, use a heavy forwarder.
A heavy forwarder is a full Splunk Enterprise instance, with some features disabled to achieve a smaller footprint. For data sources that must be collected programmatically (APIs, database access), or if you need to route and filter data, deploy a heavy forwarder as a data collection node (DCN). Do not run these kinds of inputs on the search head tier except in a development environment. Because the heavy forwarder adds metadata to the messages, you may see as much as two to three times the network traffic when you use a heavy forwarder.
For more information about heavy forwarders, see the Splunk Forwarding Data manual.
Work with the forwarder app
When you work with forwarders, you download a forwarder app (from Splunk Cloud Home > Universal Forwarder > Download Universal Forwarder Credentials) that contains the credentials specific to your Splunk Cloud instance. Install this app on your universal forwarder, heavy forwarder, or deployment server to connect it to Splunk Cloud.
Work with a deployment server
In addition, if you have multiple forwarders, you may need a deployment server to manage them. A deployment server is a Splunk Enterprise instance that acts as a centralized configuration manager, grouping together and collectively managing any number of forwarders. Instances that are remotely configured by deployment servers are called deployment clients. The deployment server distributes updated content, such as configuration files and apps, to deployment clients. Units of such content are known as deployment apps.
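To illustrate how a deployment server decides which deployment apps each client receives, the sketch below models server classes as host-name patterns mapped to apps. This is a simplified, hypothetical model: the class names, patterns, and the `cloud_credentials` app name are illustrative, and a real deployment server is configured through `serverclass.conf` whitelist and blacklist rules rather than Python.

```python
from fnmatch import fnmatchcase

# Hypothetical server classes: host-name patterns mapped to deployment apps.
SERVER_CLASSES = {
    "linux_web": {"whitelist": ["web-*"], "apps": ["Splunk_TA_nix", "cloud_credentials"]},
    "windows_dc": {"whitelist": ["dc*"], "apps": ["Splunk_TA_windows", "cloud_credentials"]},
}


def apps_for_client(hostname, classes=SERVER_CLASSES):
    """Return the sorted list of deployment apps a client would receive."""
    apps = set()
    for server_class in classes.values():
        # Case-insensitive wildcard match on the client's host name.
        if any(fnmatchcase(hostname.lower(), p.lower()) for p in server_class["whitelist"]):
            apps.update(server_class["apps"])  # apps shared by several classes are deduplicated
    return sorted(apps)
```

A client named `web-01` would receive the Unix add-on and the credentials app; a client that matches no class receives nothing.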
Work with an intermediate forwarding tier
In some cases, you may optionally use an intermediate forwarding tier. An intermediate forwarding tier is a collection of forwarders, typically universal forwarders, that relays data to Splunk Cloud. This can be helpful if you want to limit the number of servers that need direct access to the Internet. It also reduces the overhead of updating firewall rules each time a server is added to or removed from the environment.
Some endpoints, such as network devices, appliances, and devices that send logs over the syslog protocol, do not allow installation of the universal forwarder. These require special considerations that this document does not cover.
Use a Universal Forwarder to get data into Splunk Cloud
The universal forwarder is the best choice for a large set of data collection requirements from systems in your environment. It is a purpose-built data collection mechanism with very minimal resource requirements. The universal forwarder should be the default choice for collecting and forwarding log data. The universal forwarder provides:
- Checkpoint/restart function for lossless data collection.
- Efficient protocol that minimizes network bandwidth utilization.
- Throttling capabilities.
- Built-in load balancing across available indexers.
- Optional network encryption using SSL/TLS.
- Data compression (use only without SSL/TLS).
- Multiple input methods (files, Windows Event logs, network inputs, scripted inputs).
- Limited event filtering capabilities (Windows event logs only).
- Parallel ingestion pipeline support to increase throughput/reduce latency.
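The checkpoint/restart idea in the list above can be illustrated with a small file-tailing sketch. This is not how the universal forwarder is implemented; it only shows the principle: remember a byte offset per file and advance it only after the data has been handed off successfully, so a restart resumes where it left off without losing data (at the cost of possibly re-sending some lines). File rotation and truncation are deliberately ignored, and the function names are hypothetical.

```python
import json
import os


def load_offset(checkpoint_path):
    """Return the last committed byte offset, or 0 if there is no checkpoint yet."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["offset"]


def read_new_lines(path, checkpoint_path):
    """Read complete lines after the committed offset; return (lines, new_offset)."""
    offset = load_offset(checkpoint_path)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line is re-read on the next call.
    end = data.rfind(b"\n") + 1
    return data[:end].decode("utf-8").splitlines(), offset + end


def commit(checkpoint_path, new_offset):
    """Persist the offset atomically; call only after the data was acknowledged."""
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": new_offset}, f)
    os.replace(tmp, checkpoint_path)
```

Separating `read_new_lines` from `commit` is the design choice that matters: if the process stops before `commit`, the same lines are read again instead of being skipped.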
With few exceptions for well-structured data (JSON, CSV, TSV), the universal forwarder does not parse log sources into events, so it cannot perform any action that requires an understanding of the log format. It also ships with a stripped-down version of Python, which makes it incompatible with any modular input apps that require a full Splunk stack to function. It is normal for a large number of universal forwarders (hundreds to tens of thousands) to be deployed on endpoints and servers in a Splunk environment and managed centrally, either with a Splunk deployment server or with a third-party configuration management tool such as Puppet or Chef.
The following example shows universal forwarders being used to send data to Splunk Cloud:
Use a Heavy Forwarder to get data into Splunk Cloud
The heavy forwarder is a full Splunk Enterprise deployment configured to act as a forwarder with indexing disabled. A heavy forwarder generally performs no other Splunk roles. The key difference between a universal forwarder and a heavy forwarder is that the heavy forwarder contains the full parsing pipeline and performs the same functions as an indexer, without actually writing and indexing events on disk. This enables the heavy forwarder to understand and act on individual events, for example, to mask data or to filter and route based on event data. Because it is a full Splunk Enterprise installation, it can host modular inputs that require a full Python stack to function properly for data collection, or serve as an endpoint for the Splunk HTTP Event Collector (HEC).
The heavy forwarder has the following characteristics:
- Parses data into events.
- Filters and routes based on individual event data.
- Has a larger resource footprint than the UF.
- Has a larger network bandwidth footprint than the Universal Forwarder.
- Provides a GUI for management.
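The event-level processing that distinguishes a heavy forwarder can be sketched in Python: filtering or routing each event by its content and masking a sensitive value. The regular expression, the index names, and the rules are hypothetical. On an actual heavy forwarder you express this in configuration, for example with `SEDCMD` in `props.conf` or with transforms in `transforms.conf` that send unwanted events to `nullQueue`.

```python
import re
from typing import Optional

# Hypothetical rule: mask anything shaped like a US Social Security number,
# keeping only the last four digits.
SSN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask(event: str) -> str:
    return SSN.sub(r"XXX-XX-\1", event)


def route(event: str) -> Optional[str]:
    """Return a hypothetical destination index, or None to drop the event."""
    if "DEBUG" in event:
        return None            # filter: discard noisy debug events
    if "authentication failure" in event.lower():
        return "security"      # route: auth failures go to a security index
    return "main"


def process(events):
    """Filter and route each event, then mask it; yields (index, event) pairs."""
    for event in events:
        index = route(event)
        if index is not None:
            yield index, mask(event)
```

None of this is possible on a universal forwarder for unstructured data, because it never splits the stream into individual events.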
In general, heavy forwarders are not installed on endpoints for the purpose of data collection. Instead, they are used on standalone systems to implement data collection nodes (DCN) or intermediary forwarding tiers. Use a heavy forwarder only when requirements to collect data from other systems cannot be met with a Universal Forwarder. Examples of such requirements include:
- Reading data from RDBMS for the purposes of ingesting it into Splunk (database inputs).
- Collecting data from systems that are reachable through an API (cloud services, VMware monitoring, proprietary systems, and so on).
- Providing a dedicated tier to host the HTTP event collector service.
- Implementing an intermediary forwarding tier that requires a parsing forwarder for routing/filtering/masking.
The following example shows a heavy forwarder being used to send data to Splunk Cloud:
Work with the HTTP Event Collector
The HTTP Event Collector (HEC) lets you send data and application events to a Splunk deployment over the HTTP and Secure HTTP (HTTPS) protocols. HEC uses a token-based authentication model. You can generate a token and then configure a logging library or HTTP client with the token to send data to HEC in a specific format. This process eliminates the need for a Splunk forwarder when you send application events. After you enable HEC, you can use HEC tokens in your app to send data to HEC. You do not need to include Splunk credentials in your app or supported files.
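A minimal sketch of what an HEC client sends, using only the Python standard library. It builds, but does not send, a POST to the `/services/collector/event` endpoint with an `Authorization: Splunk <token>` header. The host, token, and port are placeholders: 8088 is the Splunk Enterprise default, and the host name and port for Splunk Cloud depend on your stack, so check Configure HTTP Event Collector on Splunk Cloud for your values.

```python
import json
import urllib.request


def build_hec_request(host: str, token: str, events: list, port: int = 8088) -> urllib.request.Request:
    """Build (but do not send) a POST to the HEC event endpoint."""
    # HEC accepts several JSON event objects concatenated in one body (batching).
    body = "".join(json.dumps(e) for e in events).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}:{port}/services/collector/event",
        data=body,
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

To send the events, pass the request to `urllib.request.urlopen(req)`. Each event object can carry metadata such as `sourcetype` and `index` alongside the `event` payload.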
For more information, see the following sections in Set up and use HTTP Event Collector in Splunk Web in the Getting Data In manual:
- HEC and Splunk Cloud
- Configure HTTP Event Collector on Splunk Cloud
- For general and specific information on sending data: Send data to HTTP Event Collector and Send data to HTTP Event Collector to Splunk Cloud
Work with Apps and Add-ons
Apps typically target specific data types and handle everything from configuring the inputs to generating useful views of the data. For example, the Splunk App for Windows Infrastructure provides data inputs, searches, reports, alerts, and dashboards for Windows host management. The Splunk App for Unix and Linux offers the same for Unix and Linux environments. There is a wide range of apps to handle specific types of application data, including the following:
- Splunk DB Connect
- Splunk Stream
- Splunk Add-on for Amazon Web Services
- Splunk Add-on for Google Cloud Platform
Install apps and add-ons that contain a data collection component (modular or scripted inputs) on your forwarders or IDMs, where they perform data collection, and also on your Splunk Cloud instance.
The following graphic shows a common topology with add-ons installed on forwarders and on the Splunk Cloud instance to extend the functionality for getting data in:
Work with Inputs Data Manager (IDM)
The Inputs Data Manager is a hosted solution for Splunk Cloud to support scripted and modular inputs and cloud-based inputs that you want to send directly to Splunk Cloud. As a best practice, use an IDM in the following cases:
- You have scripted or modular inputs that you want to send to Splunk Cloud. For example, you can poll a cloud-based database, web service, or API for specific data and process the results.
- You have cloud-based inputs, such as Microsoft Azure or AWS, that you want to send directly to Splunk Cloud without the intermediary step of routing the data through an on-premises forwarder.
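The polling use case in the first bullet can be sketched as a scripted input. Splunk runs a scripted input on its configured interval and indexes what the script writes to standard output, so the script polls once, prints one JSON event per line, and exits. `fetch_example` is a hypothetical stand-in for the real API call; passing `fetch` as a parameter keeps the script testable without network access.

```python
import json
import sys
import time
from typing import Callable, Iterable


def fetch_example() -> list:
    """Hypothetical placeholder for a call to a cloud database, web service, or API."""
    return [{"id": 1, "status": "ok"}]


def run_once(fetch: Callable[[], Iterable[dict]], out=sys.stdout) -> int:
    """Poll once, write one JSON event per line, and return the number written."""
    count = 0
    for record in fetch():
        # Add a collection timestamp (epoch seconds) to each result.
        record = dict(record, collected_at=int(time.time()))
        out.write(json.dumps(record) + "\n")
        count += 1
    out.flush()
    return count


if __name__ == "__main__":
    run_once(fetch_example)
```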
The following graphic shows the typical architecture of IDM. Note that the search tier and index tier are not hosted on the IDM. The IDM is not intended to store data or perform searches.
IDM is not supported on the Splunk Cloud Free Trial.
Ports opened for IDM
The following port access applies to inbound and outbound IDM ports:
- Inbound access to ports 443 and 8089 is controlled by an access list. Contact Support if you need to modify the access list.
- Outbound access to port 443 is open by default. Contact Support if you need to open additional outbound ports.
When you contact Support, provide a list of public IP addresses and subnets with this request. For example, you might want to open port 8089, the port for the REST API. Note that opening a specific outbound port opens the same port for all tiers in your Splunk Cloud environment.
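Before or after you ask Support to change the access list, you can check basic TCP reachability from a machine whose public IP address should be on the list. The helper below is hypothetical and built on Python's `socket.create_connection`. A successful connection only shows that the port is reachable from that address, not that TLS or authentication will succeed, and a failure can also come from a local firewall.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

For example, `can_connect("your-idm-host.example.com", 8089)`, where the host name is a placeholder for your IDM.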
Apps supported on IDM
If the app contains modular inputs and is Splunk Cloud certified, it is compatible with Splunk Cloud IDM. Generally, apps that are cloud-based are well-suited to IDM. Many cloud-based apps are supported, and the following list includes some of the most commonly used apps:
- Google Cloud Platform
- Microsoft Azure
To verify whether your app is supported on IDM, check its listing on Splunkbase.
Limitations when working with IDM
The IDM is intended to function specifically as a forwarder for modular and scripted inputs, or to remove the need to route cloud-based inputs through an on-premises forwarder. Therefore, keep the following limitations in mind when you work with the IDM:
- Search capabilities are capped for users on IDM. The IDM is not intended to function as a search head.
- IDM does not currently support self-service app installation. To get modular and scripted inputs onto the IDM, create a private app and request that Support upload it.
- If an add-on is tightly integrated with an Enterprise Security search head, do not use IDM.
- HEC inputs are not supported with IDM.
- IDMs are not syslog sinks, nor can they receive unencrypted TCP streams.
- Use a heavy forwarder if you need to perform parsing.
Use IDM with scripted and modular Inputs
To use scripted or modular inputs, you must package them in a private app. To do this, complete the following high-level steps:
- Create your modular or scripted inputs. For instructions on creating these inputs, see Get data from scripted inputs in the Getting Data In manual.
- Package the script or modular input in a private app. For instructions on building a private app for Splunk Cloud, see Overview in the Develop a Private App for Splunk Cloud guide.
- Submit the private app for Splunk Cloud vetting.
- Request that Support upload the app to your IDM.
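The packaging steps above can be sketched as a script that lays out a minimal private app and compresses it into a `.tgz` archive for vetting. The app name, index, sourcetype, and 300-second interval are hypothetical, and the generated `app.conf` is deliberately minimal. Splunk Cloud vetting may require additional settings, so treat the Develop a Private App for Splunk Cloud guide as the authority.

```python
import os
import tarfile

# Minimal, hypothetical app.conf; vetting may require more settings.
APP_CONF = """[package]
id = {app}

[ui]
is_visible = false
label = {app}

[launcher]
version = 1.0.0
"""

# Scripted input stanza; Splunk runs the script every `interval` seconds.
INPUTS_CONF = """[script://./bin/{script}]
interval = 300
index = {index}
sourcetype = {sourcetype}
disabled = 0
"""


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


def build_app(app, script_name, script_body, index, sourcetype, out_dir):
    """Lay out <out_dir>/<app>/{bin,default} and package it as <app>.tgz."""
    root = os.path.join(out_dir, app)
    for sub in ("bin", "default"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    write(os.path.join(root, "bin", script_name), script_body)
    write(os.path.join(root, "default", "app.conf"), APP_CONF.format(app=app))
    write(os.path.join(root, "default", "inputs.conf"),
          INPUTS_CONF.format(script=script_name, index=index, sourcetype=sourcetype))
    package = os.path.join(out_dir, app + ".tgz")
    with tarfile.open(package, "w:gz") as tar:
        tar.add(root, arcname=app)  # archive top level is the app folder itself
    return package
```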
Use IDM with cloud-based add-ons
When you work with IDM and cloud-based add-ons, complete the following high-level steps to get data in:
- Create a support request to install the add-on.
- Configure an index on your Splunk Cloud instance. This index receives the data from your cloud input.
- Perform any configuration on the cloud-based source that is needed to let you get data in.
- Configure the Splunk Add-on on your Inputs Data Manager (IDM).
- You will also need to configure inputs on the IDM. The IDM is responsible for data ingestion.
- Verify that data is flowing to your Splunk Cloud environment.
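Besides running a search in Splunk Web, you can verify data flow programmatically through the REST API on port 8089, provided that REST API access is open for your IP address. The sketch builds, but does not send, a POST to the `/services/search/jobs/export` endpoint with a `Bearer` authentication token. The host, token, index name, and 15-minute window are placeholders.

```python
import urllib.parse
import urllib.request


def build_verify_request(host: str, token: str, index: str, port: int = 8089) -> urllib.request.Request:
    """Build (but do not send) a search that counts recent events in an index."""
    search = f"search index={index} earliest=-15m | stats count"
    body = urllib.parse.urlencode({"search": search, "output_mode": "json"}).encode()
    return urllib.request.Request(
        url=f"https://{host}:{port}/services/search/jobs/export",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

A non-zero `count` in the response suggests that the add-on on the IDM is delivering data to the index you configured.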
For detailed instructions on getting Azure data in using IDM, see Get Microsoft Azure data into Splunk Cloud in the Splunk Cloud Admin Guide.
For detailed instructions on getting AWS data in using IDM, see Get Amazon Web Services (AWS) data into Splunk Cloud in the Splunk Cloud Admin Guide.
This documentation applies to the following versions of Splunk Cloud™: 7.2.9, 7.2.10, 8.0.2006, 8.0.2007, 8.1.2008, 8.1.2009, 8.1.2011, 8.1.2012 (latest FedRAMP release), 8.1.2101, 8.1.2103, 8.2.2104, 8.2.2105