Use ingest actions to improve the data input process
Ingest actions is a feature for routing, filtering, and masking data while it is streamed to your indexers. Each data transformation is expressed as a rule. You can apply multiple rules to a data stream, and save the combined rules as a ruleset.
As an alternative option to using ingest actions, the Edge Processor solution is also a Splunk data transformation service. See Compare Ingest Actions to the Edge Processor solution for a comparison table.
The Ingest Actions page in Splunk Web allows you to dynamically preview and build rules, using sample data.
You can configure ingest actions for these deployment topologies:
- Indexer clusters. Configure and preview the ruleset from the cluster manager or from a connected search head, which proxies to the cluster manager. You then explicitly deploy the ruleset to the cluster peer nodes.
- Standalone indexers. Configure, preview, and save the ruleset directly on the indexer. The ruleset is effective immediately.
- Heavy forwarders via deployment server. Configure the ruleset on a deployment server. The deployment server automatically deploys the ruleset to heavy forwarders configured as deployment clients.
- Standalone heavy forwarders. Configure and save the ruleset directly on the forwarder. The ruleset is effective immediately.
- Splunk Cloud Platform. Configure and preview the ruleset from your search head. In the case of the Victoria Experience, the ruleset will be deployed automatically to the indexers. In the case of the Classic Experience, you need to explicitly deploy the ruleset.
Requirements
Indexer cluster
- Requires access to Splunk Web on the cluster manager or on a connected search head as the admin role, or as a member of a role with the list_ingest_rulesets and edit_ingest_rulesets capabilities.
Standalone indexer
- Requires access to Splunk Web as the admin role, or as a member of a role with the list_ingest_rulesets and edit_ingest_rulesets capabilities.
- The standalone indexer cannot be configured to also function as a deployment server.
Heavy forwarders managed through a deployment server
- Requires access to Splunk Web on the deployment server as the admin role, or as a member of a role with the list_ingest_rulesets and edit_ingest_rulesets capabilities.
- For the live capture feature on the deployment server, a maximum of ten heavy forwarders are used to collect sample events. When deploying ingest actions rulesets from a deployment server to a fleet of deployment clients, Splunk supports a soft limit of up to 1,000 heavy forwarders.
- The deployment server must be dedicated to the ingest actions heavy forwarder tier. It cannot service any other deployment clients.
- Any rules created on the deployment server will apply only to the deployment clients, not to the deployment server itself (as, for example, if the deployment server is also functioning in some capacity as a standalone indexer).
- The heavy forwarders must be preconfigured as deployment clients of the deployment server where the data ingest configuration occurs. For information on configuring deployment clients, see Configure deployment clients.
- The Ingest Actions page on the deployment server automatically creates the IngestAction_AutoGenerated server class and assigns that class to the forwarders.
- If you want the heavy forwarders to send data to an S3 destination, you must configure the S3 destination on each of the heavy forwarders individually, either through the Ingest Actions page on each forwarder or through an outputs.conf file on each forwarder. You cannot configure the destination on the deployment server. To configure the destination on the Ingest Actions page, the heavy forwarders require access to Splunk Web as the admin role, or as a member of a role with the list_ingest_rulesets and edit_ingest_rulesets capabilities.
Standalone heavy forwarder
- Requires access to Splunk Web as the admin role, or as a member of a role with the list_ingest_rulesets and edit_ingest_rulesets capabilities.
Splunk Cloud Platform
- Available for Splunk Cloud on AWS and GCP. Availability for Splunk Cloud on GCP is limited to deployments running version 9.1.2312 or higher.
- Requires access to Splunk Web on the search head as the sc_admin role, or as a member of a role with the list_ingest_rulesets and edit_ingest_rulesets capabilities.
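For a non-admin role, the two required capabilities can be granted through role configuration. The following authorize.conf sketch is illustrative only (the role name is hypothetical, and on Splunk Cloud Platform you manage roles through Splunk Web rather than by editing .conf files):

```
# Hypothetical role granted the two ingest actions capabilities
[role_ingest_actions_editor]
list_ingest_rulesets = enabled
edit_ingest_rulesets = enabled
```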
License implications
Ingest-based licenses: Data that is filtered or routed by the ingest actions feature, such that the data does not get added to an index, does not count against your license.
Workload-based licenses: Ingest actions workloads that don't occur at the indexing tier do not count against your license. For example, workloads that occur at the heavy forwarder tier do not count against your license.
Introduction to rules and rulesets
A rule is a specific type of data transformation. A rule can route, filter, or mask data. Descriptions are provided below. By using multiple rules, you can perform complex modifications to an incoming data source before its data is indexed, or skip indexing of some data entirely.
A ruleset is a set of rules applied to a data source. Only one ruleset per source type is supported. Rules in a ruleset are processed in order.
You create rules through the Ingest Actions page:
- On indexer clusters, you access the Ingest Actions page on the cluster manager or on a connected search head.
- For groups of heavy forwarders, you access the Ingest Actions page on a deployment server dedicated to the ingest actions function.
- On Splunk Cloud Platform, you access the Ingest Actions page on a search head.
Once you create a ruleset, you must save it. Depending on where you created the ruleset, it is either immediately effective or requires an additional deployment step.
Once the ruleset has been deployed, each rule in the ruleset is applied to its matching data stream before the data is indexed.
After a ruleset is applied, the data cannot be reverted to its original form. Changing or deleting an existing ruleset affects only new data. If you want to retain the original data while also modifying some of it, use the clone events feature, described in this topic.
Access and edit the Ingest Actions page
The process of accessing the Ingest Actions page varies slightly depending on the deployment topology.
On indexer clusters
For Splunk Enterprise indexer clusters, you can create a ruleset either on the cluster manager or on a connected search head. In the case of a connected search head, the search head proxies the configuration to the cluster manager. When finished, you then explicitly deploy the ruleset configuration to the set of peer nodes.
Perform these steps:
- On the cluster manager or connected search head, select Settings > Data > Ingest Actions.
- If routing to S3, add an S3 destination through the Destinations tab.
- Through the Rulesets tab:
- Provide a ruleset name and description.
- In the Event Stream, provide a source type for the data preview.
- Add a rule. Descriptions are provided below.
- Use the data preview to review the impact of the rule on your data source.
- Add additional rules as needed.
- Save your rules in the ruleset.
- Once the ruleset has been saved, either directly on the cluster manager or through the search head, you must deploy the ruleset to the set of peer nodes. See Deploy a ruleset on an indexer cluster.
- Use Splunk Search to validate the changes to your data.
If you edit or delete an existing destination, the peer nodes will not undergo a rolling restart when the changes are deployed.
On standalone indexers
For Splunk Enterprise indexers, perform these steps to create a ruleset:
- On the indexer, select Settings > Data > Ingest Actions.
- If routing to S3, add an S3 destination through the Destinations tab.
- Through the Rulesets tab:
- Provide a ruleset name and description.
- In the Event Stream, provide a source type for the data preview.
- Add a rule. Descriptions are provided below.
- Use the data preview to review the impact of the rule on your data source.
- Add additional rules as needed.
- Save your rules in the ruleset. The updates are effective immediately on the indexer.
- Use Splunk Search to validate the changes to your data.
If you edit or delete an existing destination, you do not need to restart the instance for the changes to take effect.
On heavy forwarders managed through a deployment server
For Splunk Enterprise heavy forwarders managed through a deployment server, perform these steps to create a ruleset:
- On the deployment server, select Settings > Data > Ingest Actions.
- If routing to S3, add an S3 destination directly on each heavy forwarder, as described in the note below.
- Through the Rulesets tab:
- Provide a ruleset name and description.
- In the Event Stream, provide a source type for the data preview.
- Add a rule. Descriptions are provided below.
- Use the data preview to review the impact of the rule on your data source.
- Add additional rules as needed.
- Save your rules in the ruleset. The deployment server saves the ruleset in the splunk_ingest_actions app for the IngestAction_AutoGenerated server class. It then automatically deploys the app to all members of the IngestAction_AutoGenerated server class, first adding all forwarders to that class, if necessary. The ruleset takes effect immediately.
- Use Splunk Search to validate the changes to your data.
If you want the heavy forwarders to send data to an S3 destination, you must configure the destination individually on each heavy forwarder prior to creating the ruleset on the deployment server. Select Settings > Data > Ingest Actions on each heavy forwarder and configure the destination. You can alternatively create the destination in outputs.conf on each forwarder.
If you edit or delete an existing destination, you do not need to restart the forwarder for the changes to take effect.
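For reference, the server class that the deployment server generates is conceptually similar to the following serverclass.conf sketch. Only the IngestAction_AutoGenerated and splunk_ingest_actions names come from this documentation; the remaining settings are assumptions shown for illustration, and the file is managed by the feature, not hand-edited:

```
# Illustrative sketch of the auto-generated server class
[serverClass:IngestAction_AutoGenerated]
whitelist.0 = *

[serverClass:IngestAction_AutoGenerated:app:splunk_ingest_actions]
restartSplunkd = true
```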
On standalone heavy forwarders
For Splunk Enterprise heavy forwarders, perform these steps to create a ruleset:
- On the heavy forwarder, select Settings > Data > Ingest Actions.
- If routing to S3, add an S3 destination through the Destinations tab.
- Through the Rulesets tab:
- Provide a ruleset name and description.
- In the Event Stream, provide a source type for the data preview.
- Add a rule. Descriptions are provided below.
- Use the data preview to review the impact of the rule on your data source.
- Add additional rules as needed.
- Save your rules in the ruleset. The updates are effective immediately on the heavy forwarder.
- Use Splunk Search to validate the changes to your data.
If you edit or delete an existing destination, you do not need to restart the forwarder for the changes to take effect.
On Splunk Cloud Platform
For Splunk Cloud Platform, perform these steps to create a ruleset:
- On the search head, select Settings > Data > Ingest Actions. In some circumstances, you might need to first select the "Show All Settings" button under Settings.
- If routing to S3, add an S3 destination through the Destinations tab.
- Through the Rulesets tab:
- Provide a ruleset name and description.
- In the Event Stream, provide a source type for the data preview.
- Add a rule. Descriptions are provided below.
- Use the data preview to review the impact of the rule on your data source.
- Add additional rules as needed.
- Save your rules in the ruleset. In the case of the Victoria Experience, the ruleset deploys immediately. In the case of the Classic Experience, you must explicitly deploy the ruleset with the Deploy button at the top right of the Ingest Actions page.
- Use Splunk Search to validate the changes to your data.
Create a ruleset with the Ingest Actions page
Data preview
Data preview is available when you're building a ruleset. Data preview can help you define rules. It also estimates the changes a rule will have on the data source.
The data preview is only a preview of the rule changes, and does not actually modify any indexed data.
You can use data preview with several types of data sources:
- Live capture (deployment server and standalone indexers only) uses data directly from an incoming data stream.
- Indexed data (not available for deployment server) uses recently indexed data.
- Sample file uses data from a sample file that you upload. You can also copy and paste event logs.
The Sourcetype field is case-sensitive. You must use the correct case to show results for the sample events.
Selecting Sample retrieves events from the indexers or the incoming data stream. The All Events tab provides a visual indication of the rule matches. The Affected Events tab provides a total count, and displays the full event for every rule match.
If your data uses renamed source types, you might encounter issues that require remediation. See the Splunk Lantern article Using ingest actions with source types that are renamed with props and transforms.
When using live capture with deployment server, ensure the following conditions are met:
- Your firewall allows connections from the deployment server to its deployment clients.
- The deployment server and deployment clients use the same pass4SymmKey in server.conf:

```
[deployment]
pass4SymmKey = <passphrase string>
```
Mask with regular expression
Use a masking rule to replace strings of text in your logs. A mask rule is typically applied to fields, such as unique identifiers or user names, that are captured through logging.
The mask rule requires you to provide:
Setting | Description |
---|---|
Match Regular Expression | The field accepts a regular expression, or a simple string to match in the events. |
Replace Expression | The field accepts a string value to replace any matches. If you want to remove the matched values without substituting a replacement, simply enter a blank space. |
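For example, a mask rule for U.S. Social Security numbers might use settings like the following. The pattern and replacement string are illustrative values, not taken from this documentation:

```
Match Regular Expression:  \d{3}-\d{2}-\d{4}
Replace Expression:        XXX-XX-XXXX
```

An event containing a string such as 123-45-6789 would then be indexed with XXX-XX-XXXX in its place.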
Filter with regular expression
Use a filtering rule to remove entire events from your logs. A filter rule is typically applied to log events of low value, such as DEBUG messages, log headers, and redundant log messages.
This filter rule requires you to provide:
Setting | Description |
---|---|
Source Field | Use the drop-down list to select the field to match against: _raw, host, index, source, or source type. |
Drop Events Matching Regular Expression | The field accepts a regular expression, or a simple string to match in the events. |
When using a filter rule, the Affected Events tab is a preview of events that will be deleted once the ruleset is deployed. If you add another rule after a filter, the new rule applies to any remaining, unfiltered events only.
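As an illustrative sketch, a filter rule that drops DEBUG messages might use settings like the following. The source field choice and pattern are assumptions for this example:

```
Source Field:                             _raw
Drop Events Matching Regular Expression:  \bDEBUG\b
```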
Filter with eval expression
Using an eval expression is an alternative to using a regular expression for filtering. In most cases, the eval syntax is easier to read and comprehend, while offering the same functionality as a regular expression.
The eval expression rule does not support ingest-time lookups.
Use a filtering rule to remove entire events from your logs. A filter rule is typically applied to log events of low value, such as DEBUG messages, log headers, and redundant log messages.
This filter rule requires you to provide:
Setting | Description |
---|---|
Drop Events Matching Eval Expression | When the eval expression match is true, those events will be dropped. |
When using a filter rule, the Affected Events tab is a preview of events that will be deleted once the ruleset is deployed. If you add another rule after a filter, the new rule applies to any remaining, unfiltered events only.
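For illustration, an eval expression that drops DEBUG messages and health-check noise might look like the following. The log_level field name and the patterns are assumptions for this example; match() is a standard eval function:

```
Drop Events Matching Eval Expression:
    log_level=="DEBUG" OR match(_raw, "health[-_ ]?check")
```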
Set index
Use a set index rule to specify or change the destination index for an event routing to a Splunk destination. You can optionally filter the events that the rule applies to.
If this rule does not apply to a particular Splunk destination event, that event goes to the index otherwise designated for the event, either the default "main" index or an index specified through the available layered configurations in the Splunk configuration system, for example, through settings in inputs.conf or outputs.conf.
You can either specify a string for the destination index name, or you can set the index based on an eval expression, which allows you to conditionally route to different indexes.
The set index rule includes these settings:
Setting | Description |
---|---|
Condition | Optionally filter the events that follow the set index rule. |
Set index as | Set the index to a string value (for example, "my_index") or use an eval expression to determine the index name based on specified conditions. |
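For example, a set index rule could route errors to a dedicated index with an eval expression such as the following. The index names are hypothetical, and if() and match() are standard eval functions:

```
Set index as (eval expression):
    if(match(_raw, "ERROR"), "app_errors", "app_logs")
```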
Route to Destination rule
Use a routing rule to select events, and split or duplicate them among one or more destinations.
This routing rule requires you to provide:
Setting | Description |
---|---|
Condition | Choose a method to match events for routing. Choose the regex or eval condition to select specific events, or none when you want all events sent to a destination. If a condition is set, only events matching the condition will be sent to the destination(s). |
Immediately send to | By default, the destination is "Default Destination". Any matching events are placed back into the Splunk Enterprise indexing queue for processing and indexing to a Splunk index, either on the local instance or on a downstream or associated instance, according to the deployment topology. For example, in the case of a heavy forwarder, the default destination is an index on the indexer at the end of the chain of forwarders. Similarly, in the case of an indexer cluster, the default destination is an index on the peer nodes. The destination index for each event is either the default index (main), the index determined by the configuration layers, if any, or an index determined by a set index rule. The destination rule also supports AWS S3 and other S3-compliant destinations. You must configure an S3 remote storage destination before using the destination in a "Route to Destination" rule. See Create an S3 destination. If more than one destination is chosen, a copy of any matching events is sent to all destinations chosen. |
Clone events and apply more rules | This toggle causes data ingest to create a clone of the event stream, applying the rules currently defined in the ruleset, and route the stream to the specified destination, while applying any additional newly defined rules against the event stream and routing that subset to a second specified destination, defined in a second Route to Destination rule. As with all rules, the ruleset must be saved and deployed before the destination rules start functioning. |
Data Preview for Final Destination
The last rule in every ruleset sends any remaining events along the ingestion pipeline to the indexer for indexing. The rule offers an estimate of the data volume that will be indexed.
If you use the "Route to Destination" rule in your ruleset, this rule might be skipped. For example, if a Route to Destination rule includes "Immediately send to: Splunk Index," the data stream is split at the routing rule, and the matching events are sent to be indexed. In that scenario, the Final Destination rule will display a 0 KB indexed data estimate, despite events being sent for indexing from the routing rule.
Create an S3 destination
To write events to a remote storage volume, select a preconfigured S3 destination when you configure the "Route to Destination" rule. You can write to multiple S3 destinations. The "Immediately send to" field has a typeahead capability that displays all preconfigured S3 destinations.
You must configure an S3 remote storage destination before using the destination in a "Route to Destination" rule.
You configure and validate S3 destinations through the Destinations tab on the Ingest Actions page. Select New Destination and fill out the fields, following the examples provided there. You can create multiple S3 destinations.
The bucket you designate as the S3 remote storage destination must be used only by ingest actions. Do not share buckets with other tools such as SmartStore and edge processors.
You can create a maximum of eight S3 destinations. When rulesets route to a destination that is invalid or does not exist, the Splunk Platform instance blocks all queues and pipelines and does not drop data.
In the case of heavy forwarders managed through a deployment server, S3 destinations must be configured on each heavy forwarder individually, not on the deployment server.
Partition events
When creating an S3 destination, you can define a partitioning schema for events based on timestamp and optionally source type. The events then flow into a directory structure based on the schema.
Go to the "Partitioning" section of the New Destination configuration. You can choose a partitioning schema through the drop-down menu. The choices are:
- Day (YYYY/MM/DD)
- Month (YYYY/MM)
- Year (YYYY)
- Legacy
The legacy setting is for use with pre-9.1 destinations only. With legacy, for each 2MB (by default) batch, the latest event timestamp in the batch identifies the folder using the format "YYYY/MM/DD". However, unlike the true partitioning options such as "day", the folder might also contain events with other timestamps, if its batch contains other timestamps.
In the case of destinations created pre-9.1, "legacy" is the default. In the case of destinations created in 9.1 and higher, "day" is the default.
You can also set source type as a secondary key. However, if you are using federated search for Amazon S3 with the AWS Glue Data Catalog integration, you need to make sure that your Glue Data Catalog tables do not include a duplicate entry for the sourcetype column.
For details on the partitioning methods and examples of the resulting paths, see the partitionBy setting in outputs.conf.
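As a sketch, with the Day schema and source type as a secondary key, events might land under prefixes like the following. The bucket name, prefix, and object naming are hypothetical; the exact object names are determined by Splunk, as described for the partitionBy setting:

```
s3://my-ingest-bucket/my-prefix/2024/05/17/access_combined/<object files>
s3://my-ingest-bucket/my-prefix/2024/05/17/syslog/<object files>
```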
Use KMS encryption (Splunk Cloud Platform only)
You can employ SSE-KMS encryption when using ingest actions to write data to customer-owned S3 buckets. This capability is enabled through the configuration of AWS cross-account IAM roles.
Take note of the following critical points:
* You are assuming ownership and full responsibility for the integrity and ongoing availability of your AWS KMS key.
* The KMS key is required for encrypting Splunk data in real-time.
* Loss of access to the KMS key can result in service interruption and/or permanent loss of data access by all parties (AWS, Splunk, and you).
* Unauthorized access to the KMS key can result in accidental or explicit key operations (such as key deactivation or deletion) that could lead to service disruption or permanent loss of data access by all parties (AWS, Splunk and you).
* You must maintain Splunk privileged access to the KMS key via Splunk-mandated key policy definitions.
* Keys must be in the same region as their Splunk Cloud stack. Multi-region keys are not supported.
* Key aliases are not supported.
To enable KMS encryption, create the SplunkIngestActions IAM role in your AWS account:
- Go to the IAM roles section in the AWS configuration UI.
- Create a role with the exact name "SplunkIngestActions".
- Edit the permissions section for that role by adding an inline policy and overwriting the existing JSON with JSON created through the Generate Permission Policy button in the Splunk ingest actions UI. You can edit that JSON text as needed for your organization.
- Edit the trust relationship section by overwriting the existing JSON with JSON created through the Generate Trust Policy button in the Splunk ingest actions UI. You can edit this JSON text as needed for your organization.
Perform advanced configurations with outputs.conf
While Destinations on the Ingest Actions page can handle most common S3 configuration needs, for some advanced configurations, you might need to directly edit outputs.conf, using the rfs stanza.
For a complete list of rfs settings, see Remote File System (RFS) Output. The remote filesystem settings and options for S3 are similar to the SmartStore S3 configuration.
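A minimal sketch of an rfs stanza in outputs.conf might look like the following. The destination name, bucket, and endpoint are hypothetical, and you should verify each setting name and default against the Remote File System (RFS) Output section of the outputs.conf specification for your version:

```
# Hypothetical S3 destination for ingest actions routing
[rfs:my_s3_destination]
path = s3://my-ingest-bucket/some/prefix
remote.s3.endpoint = https://s3.us-east-1.amazonaws.com
# Keep batches well under the 5 GB AWS single-object upload limit
batchSizeThresholdKB = 131072
compression = gzip
partitionBy = day
```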
Troubleshoot
To troubleshoot the S3 remote file system, search the _internal index for events from the RfsOutputProcessor and S3Client components. For example:

```
index="_internal" sourcetype="splunkd" (ERROR OR WARN) RfsOutputProcessor OR S3Client
```
Key provisos
Note the following:
- You can configure and use multiple S3 remote storage locations, up to a maximum of 8 destinations.
- In the case of a Splunk Cloud Platform deployment, buckets must be in the same region as the deployment.
- In the case of an indexer cluster, each remote storage configuration must be identical across the indexer cluster peers.
- AWS has an upload limit of 5 GB for single objects. An attempt to upload an object greater than 5 GB will result in data loss. You will only encounter this limit if you set batchSizeThresholdKB in outputs.conf to a value greater than 5 GB.
- The remote file system creates buckets similar to index buckets on the remote storage location. The bucket names include the peer GUID and date.
- Remember to set the correct life cycle policies for your S3 buckets and their paths. By default, this data is retained indefinitely unless you remove it.
- For information on S3 authentication requirements, see SmartStore on S3 security strategies in Managing Indexers and Clusters of Indexers. Ingest actions requirements are similar.
Output optimizations for federated search
Several output optimizations have been introduced in Splunk Enterprise 9.1 and Splunk Cloud 9.0.2303. The changes affect only new destinations.
These behaviors are:
- Events are delimited with a new line.
- Index-time fields are output automatically.
- Compression type is set to "gzip".
- Batch Size is set to 128 MB (131072 KB).
These settings are turned on by default but can be turned off in the UI.
In addition:
- A raw option is now available for JSON output. It gives you full flexibility to output events in whatever form you want.
- The ingest actions feature outputs a new default field "index". It only outputs the field if you explicitly set an index with the Set Index rule.
Deploy a ruleset on an indexer cluster
You can create a ruleset either on the cluster manager or on a connected search head, which proxies the request to the cluster manager. In either case, you must explicitly deploy the ruleset to the peer nodes.
When you save a ruleset, the system places the ruleset in an ingest-actions-specific app on the cluster manager. You will then be prompted to deploy the ruleset to the peer nodes. You can either deploy immediately, in response to the prompt, or later, through the configuration bundle method on the cluster manager.
Note the following:
- All rulesets are defined in the same app on the cluster manager node. The app path is:
$SPLUNK_HOME/etc/manager-apps/splunk_ingest_actions
- When you deploy the app with your ruleset, any other configuration bundle changes queued on the cluster manager node will also be deployed. This can include other rulesets that are saved, but might be incomplete.
Deploying a ruleset might cause a rolling restart, if there are other configuration changes queued on the cluster manager node that require a restart.
Interaction with TRANSFORMS
The RULESET setting is similar in behavior to the TRANSFORMS setting in props.conf. There are some additional considerations when using RULESET:
- If a TRANSFORMS stanza and a RULESET stanza apply to the same source type, the TRANSFORMS is applied first.
- A source type must be associated with just one RULESET configuration.
Create or modify rulesets only through the Ingest Actions page or the REST endpoint /services/data/ingest/rulesets. Do not create or modify rulesets through the underlying .conf files.
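For context only, a saved ruleset is represented in props.conf by a RULESET setting, conceptually similar to the sketch below. The source type, setting names after the hyphen, and transform names are hypothetical; these files are generated by the feature and should not be hand-edited:

```
# Illustrative only; generated configuration, do not hand-edit
[my_sourcetype]
# Any TRANSFORMS for the source type is applied first
TRANSFORMS-legacy = my_existing_transform
# The ruleset's rules are applied after TRANSFORMS
RULESET-my_ruleset = drop_debug_rule, mask_ssn_rule
```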
Differences between RULESET and TRANSFORMS in the context of heavy forwarders
The RULESET
setting has a key difference in behavior from the TRANSFORMS
setting in the context of a heavy forwarder deployment:
- TRANSFORMS settings are applied only at the initial, heavy forwarder layer of processing, and not again later by downstream heavy forwarders or indexers.
- RULESET settings can be applied at every layer of processing. For example, a heavy forwarder can apply a ruleset and then stream the data to an indexer with its own ruleset for that data. In that case, both the heavy forwarder's and the indexer's rulesets will be applied to the data in turn. Similarly, if a heavy forwarder streams data to a second heavy forwarder, which then streams the data onward to the indexer, all three processing layers can apply their own rulesets to the data.
This documentation applies to the following versions of Splunk® Enterprise: 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4