Ingest Processor is currently released as a preview only and is not officially supported. See Splunk General Terms for more information. For any questions on this preview, please reach out to ingestprocessor@splunk.com.
Filter and mask data using Ingest Processor
You can create a pipeline that filters and masks the incoming data so that only a desired subset of that data gets sent to a destination.
Creating a filtering and masking pipeline involves doing the following:
- Specifying the partition of incoming data that the pipeline receives, and specifying a destination that the pipeline sends data to.
- Defining the filtering criteria by including a where command in the SPL2 statement of the pipeline. See where command overview in the SPL2 Search Reference for more information.
- Specifying a regular expression that selects and masks data by including a rex command in the SPL2 statement of the pipeline. See rex command overview in the SPL2 Search Reference for more information.
As a best practice for preventing unwanted data loss, make sure to always have a default destination for your Ingest Processor. Otherwise, all unprocessed data is dropped. See Partitions for more information on what qualifies as unprocessed data.
Prerequisites
Before starting to create a pipeline, confirm the following:
- If you want to partition your data by source type, the source type of the data that you want the pipeline to process is listed on the Source types page of your tenant. If your source type is not listed, then you must add that source type to your tenant and configure event breaking and merging definitions for it.
- The destination that you want the pipeline to send data to is listed on the Destinations page of your tenant. If your destination is not listed, then you must add that destination to your tenant.
Steps
Perform the following steps to create a pipeline that filters and masks data:
- Create a pipeline.
- Configure filtering and masking in your pipeline.
- Preview, save, and apply your pipeline.
Create a pipeline
Complete these steps to create a pipeline that receives data associated with a specific source type, optionally processes it, and sends that data to a destination.
- Navigate to the Pipelines page and then select Ingest Processor pipeline.
- On the Get started page, select Blank pipeline and then Next.
- On the Define your pipeline's partition page, do the following:
- Select how you want to partition the incoming data that you want to send to your pipeline. You can partition by source type, source, or host.
- Enter the conditions for your partition, including the operator and the value. Your pipeline will receive and process the incoming data that meets these conditions.
- Select Next to confirm the pipeline partition.
- On the Add sample data page, do the following:
- Enter or upload sample data for generating previews that show how your pipeline processes data. The sample data must contain accurate examples of the values that you want to extract into fields.
For example, the following sample events represent purchases made at a store at a particular time:
E9FF471F36A91031FE5B6D6228674089, 72E0B04464AD6513F6A613AABB04E701, Credit Card, 7.7, 2023-01-13 04:41:00, 2023-01-13 04:45:00, -73.997292, 40.720982, 4532038713619608
A5D125F5550BE7822FC6EE156E37733A, 08DB3F9FCF01530D6F7E70EB88C3AE5B, Credit Card, 14, 2023-01-13 04:37:00, 2023-01-13 04:47:00, -73.966843, 40.756741, 4539385381557252
1E65B7E2D1297CF3B2CA87888C05FE43, F9ABCCCC4483152C248634ADE2435CF0, Game Card, 16.5, 2023-01-13 04:26:00, 2023-01-13 04:46:00, -73.956451, 40.771442
- Select Next to confirm the sample data that you want to use for your pipeline.
- On the Select destination dataset page, select the name of the destination that you want to send data to, then do the following:
- If you selected a Splunk platform S2S or Splunk platform HEC destination, select Next.
- If you selected another type of destination, select Done and skip the next step.
- (Optional) If you're sending data to a Splunk platform deployment, you can specify a target index:
- In the Index name field, select the name of the index that you want to send your data to.
- (Optional) In some cases, incoming data already specifies a target index. If you want your Index name selection to override previous target index settings, then select the Overwrite previously specified target index check box.
- Select Done.
If you're sending data to a Splunk platform deployment, be aware that the destination index is determined by a precedence order of configurations.
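The "Overwrite previously specified target index" behavior described above can be sketched as follows. This is a hypothetical illustration only, assuming a helper named resolve_index that does not exist in the product; the actual precedence order is documented in the linked topic.

```python
def resolve_index(event_index, pipeline_index, overwrite):
    """Hypothetical sketch of the 'Overwrite previously specified target
    index' check box: the pipeline's Index name selection wins only when
    the check box is selected, or when the incoming data did not already
    specify a target index."""
    if pipeline_index and (overwrite or event_index is None):
        return pipeline_index
    return event_index

print(resolve_index("web_logs", "main", overwrite=False))  # web_logs
print(resolve_index("web_logs", "main", overwrite=True))   # main
print(resolve_index(None, "main", overwrite=False))        # main
```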
You now have a simple pipeline that receives data for a specific source type and sends that data to a destination. In the next section, you'll configure this pipeline to filter and mask your data.
Configure filtering and masking in your pipeline
During the previous step, you created a basic pipeline that receives data associated with a specific partition and then sends it to a destination. For example, if you partitioned your data by the custom "Buttercup_Games" source type, then the pipeline receives only the data that has the "Buttercup_Games" source type.
The next step is to configure the pipeline to do the following:
- Extract data values into fields so that these values can be used in filtering criteria.
- Filter the incoming data.
- Mask confidential information in the data.
When configuring field extractions and masking, you use regular expressions to select the data values that you want to extract or mask. You must write these regular expressions using Regular Expression 2 (RE2) syntax. See Regular expression syntax for Ingest Processor pipelines for more information.
The examples in the instructions that follow are based on the sample events described in the Create a pipeline section.
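Before walking through the steps, it can help to see what the two regular expressions used in this topic actually match. Ingest Processor uses RE2 syntax, but both patterns here are also valid in Python's re module and behave the same way, so the following sketch uses Python purely for illustration:

```python
import re

# One of the sample purchase events from the "Create a pipeline" section.
event = ("E9FF471F36A91031FE5B6D6228674089, 72E0B04464AD6513F6A613AABB04E701, "
         "Credit Card, 7.7, 2023-01-13 04:41:00, 2023-01-13 04:45:00, "
         "-73.997292, 40.720982, 4532038713619608")

# Field-extraction pattern used in this topic: captures the payment
# method into a named group called card_type.
extract = re.compile(r"(?P<card_type>(Credit Card|Game Card))")
match = extract.search(event)
print(match.group("card_type"))  # Credit Card

# Masking pattern used in this topic: matches a 16-digit number that
# starts with a digit from 1 to 5, such as the trailing card number.
mask = re.compile(r"[1-5][0-9]{15}")
print(mask.sub("<redacted>", event))
```

Note that the masking pattern leaves the 32-character hexadecimal identifiers and the timestamps untouched, because none of them contain a 16-digit run that starts with 1 through 5.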
- If the data values that you want to filter are not stored in event fields, then extract them into fields. The following steps describe how to extract the Credit Card and Game Card values from the sample data into a field named card_type. For more general field extraction instructions, see Extract fields from event data using Ingest Processor.
  - Select the Preview Pipeline icon () to generate a preview that shows what the sample data looks like when it passes through the pipeline.
  - In the Actions section, select the plus icon (), then select Extract fields from _raw.
  - In the Regular expression field, enter the following:
    (?P<card_type>(Credit Card|Game Card))
  - Select Apply to perform the field extraction and close the Extract fields from _raw dialog box.
  The pipeline editor adds a rex command to the SPL2 statement of your pipeline. This rex command performs the field extraction. For more information, see rex command overview in the SPL2 Search Reference. In the preview results panel, the data now includes an additional column named card_type.
- To filter your data, do the following:
  - Select the plus icon () in the Actions section, then select Filter values.
  - In the Add filter dialog box, define your filtering criteria. For example, to filter the sample data and retain only events that have Credit Card as the card_type value, use these settings:
    - Field: card_type
    - Action: Keep
    - Operator: = equals
    - Value: Credit Card
  - Select Apply to filter your data and close the Add filter dialog box.
  The pipeline editor adds a where command to the SPL2 statement of your pipeline. This where command performs the filtering and drops the excluded data. For more information, see where command overview in the SPL2 Search Reference. In the preview results panel, the data now includes only events that have Credit Card as the card_type value.
- To mask your data, do the following:
  - Select the plus icon () in the Actions section, then select Mask values in _raw.
  - In the Mask using regular expression dialog box, define your masking criteria. For example, to mask the credit card numbers in the sample data by replacing them with the word "<redacted>", use these settings:
    - Field: _raw
    - Matching regular expression: [1-5][0-9]{15}
    - Replace with: <redacted>
    - Match case: This option does not apply when matching numbers, so you can leave it as is.
  - Select Apply to mask the data and close the Mask using regular expression dialog box.
  The pipeline editor adds an eval command to the SPL2 statement of your pipeline. This eval command uses the replace function to replace the data values matched by the regular expression with the word "<redacted>". For more information, see eval command overview and replace in the SPL2 Search Reference. In the preview results panel, the data in the _raw column no longer displays credit card numbers.
- (Optional) If you selected a Splunk platform destination for your pipeline, you can specify the index that you want Ingest Processor to send your data to. To choose a specific destination index for your data, do the following:
  - In your Splunk platform deployment, verify that the index exists and that you have access to it.
  - In the SPL2 statement of the pipeline, add an eval command specifying the index name after the where command but before the into $destination command. For example:
    $pipeline = | from $source
    | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
    | where card_type = "Credit Card"
    | eval _raw=replace(_raw, /[1-5][0-9]{15}/i, "<redacted>")
    | eval index="<index name>"
    | into $destination;
  See How does an Ingest Processor know which index to send data to? for more information.
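The end-to-end effect of the rex, where, and eval commands in this pipeline can be sketched in Python. This is an illustration of the transformation the SPL2 statement performs, not product code; the process function and the abbreviated sample events are invented for the example.

```python
import re

# Patterns from the SPL2 statement shown earlier in this topic.
EXTRACT = re.compile(r"(?P<card_type>(Credit Card|Game Card))")
MASK = re.compile(r"[1-5][0-9]{15}")

def process(events):
    """Mimic the pipeline: rex extracts card_type, where keeps only
    Credit Card events, and eval replace masks the card number."""
    results = []
    for raw in events:
        m = EXTRACT.search(raw)                 # | rex field=_raw /.../
        card_type = m.group("card_type") if m else None
        if card_type != "Credit Card":          # | where card_type = "Credit Card"
            continue                            # excluded events are dropped
        raw = MASK.sub("<redacted>", raw)       # | eval _raw=replace(...)
        results.append({"_raw": raw, "card_type": card_type})
    return results

# Abbreviated stand-ins for the sample events.
events = [
    "..., Credit Card, 7.7, ..., 4532038713619608",
    "..., Game Card, 16.5, ...",
]
for result in process(events):
    print(result["_raw"])  # ..., Credit Card, 7.7, ..., <redacted>
```

The Game Card event never reaches the destination, which mirrors how the where command drops excluded data.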
You now have a pipeline that filters and masks data. In the next section, you'll verify that this pipeline processes data in the way that you expect, and then save it so that it can be applied to an Ingest Processor.
Preview, save, and apply your pipeline
- (Optional) To see a preview of how your pipeline processes data based on the sample data that you provided while selecting your source type, select the Preview Pipeline icon ().
- To save your pipeline, do the following:
- Select Save pipeline.
- In the Name field, enter a name for your pipeline.
- (Optional) In the Description field, enter a description for your pipeline.
- Select Save.
- To apply this pipeline, go to the Pipelines page, where your saved pipeline is now listed, and apply it to your Ingest Processor as needed.
It can take a few minutes for the Ingest Processor service to finish applying your pipeline. During this time, all applied pipelines enter the Pending status. Once the operation is complete, the Pending Apply status icon () stops displaying beside the pipeline. Refresh your browser to check if the icon no longer displays.
Your applied pipeline can now filter and mask the data that it receives so that only a desired subset of that data gets sent to the destination specified in the pipeline. Any data that does not meet your where command conditions will be dropped from your pipeline.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.1.2308 (latest FedRAMP release), 9.1.2312