Splunk Cloud Platform

Use Edge Processors


Filter and mask data using an Edge Processor

You can create a pipeline that filters and masks the incoming data so that only a desired subset of that data gets sent to a destination.

Configuring a pipeline to filter and mask data involves doing the following:

  • Extracting the data values that you want to filter into event fields, if those values are not already stored in fields.
  • Defining the filtering criteria.
  • Specifying a regular expression that selects and masks data.
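
Taken together, these configurations build up an SPL2 pipeline statement. As a rough sketch of the overall shape (the angle-bracket placeholders are stand-ins, and the exact commands are built up step by step later on this page):

    $pipeline = | from $source
        // Extract the values that you want to filter into a named field
        | rex field=_raw /<extraction regular expression>/
        // Keep only the events that match your filtering criteria
        | where <filter condition>
        // Replace the matched values with a mask string
        | eval _raw=replace(_raw, /<masking regular expression>/, "<mask string>")
        | into $destination;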

As a best practice for preventing unwanted data loss, make sure to always have a default destination for your Edge Processors. Otherwise, all unprocessed data is dropped. See Add an Edge Processor.

Prerequisites

Before starting to create a pipeline, confirm the following:

  • The source type of the data that you want the pipeline to process is listed on the Source types page of your tenant. If your source type is not listed, then you must add that source type to your tenant and configure event breaking and merging definitions for it. See Add source types for Edge Processors for more information.
  • The destination that you want the pipeline to send data to is listed on the Destinations page of your tenant. If your destination is not listed, then you must add that destination to your tenant. See Add or manage destinations for more information.

Steps

Perform the following steps to create a pipeline that filters and masks data:

  1. Create a pipeline.
  2. Configure filtering and masking in your pipeline.
  3. Save and apply your pipeline.

Create a pipeline

  1. Navigate to the Pipelines page and then select New pipeline.
  2. Select Blank pipeline and then select Next.
  3. Specify a subset of the data received by the Edge Processor for this pipeline to process. If you want to use the sample data given in step 4 so that you can follow along with the example configurations described in later sections of this page, skip this step. To define a partition, complete these steps:
    1. Select the plus icon next to Partition, or select the option in the Suggestions section that matches how you want to create your partition.
    2. Select host, source, or sourcetype in the Field field.
    3. Select an operator in the Operator field.
    4. In the Value field, enter the value that your partition should filter by to create the subset. Then select Apply.
    5. Once you have defined your partition, select Next.
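
    For example, to create a partition that keeps only events of a particular source type, you might use settings like these (the source type name here is hypothetical, and the operator label mirrors the filter dialog shown later on this page):

      Field: sourcetype
      Operator: = equals
      Value: buttercup_sales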
  4. (Optional) Enter or upload sample data for generating previews that show how your pipeline processes data. This step is typically optional; however, if you plan to configure any field extractions, best practice is to provide sample data and generate pipeline previews so that you can verify the results of the field extraction before applying the configuration to your pipeline.

    The sample data must be in the same format as the actual data that you want to process. See Getting sample data for previewing data transformations for more information.

    If you want to follow the configuration examples in the next section, then enter the following sample events, which represent three fictitious purchases made at a store. Note that the first two events end in a 16-digit credit card number, while the third (a game card purchase) does not:

    E9FF471F36A91031FE5B6D6228674089,72E0B04464AD6513F6A613AABB04E701,Credit Card,7.7,2018-01-13 04:41:00,2018-01-13 04:45:00,-73.997292,40.720982,4532038713619608
    A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,-73.966843,40.756741,4539385381557252
    1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,-73.956451,40.771442
    
  5. Select Next to confirm your sample data.
  6. Select the name of the destination that you want to send data to. Then, do the following:
    1. If you selected a Splunk platform S2S or Splunk platform HEC destination, select Next.
    2. If you selected another type of destination, select Done and skip the next step.
  7. (Optional) If you're sending data to a Splunk platform deployment, you can specify a target index:
    1. In the Index name field, select the name of the index that you want to send your data to.
    2. (Optional) In some cases, incoming data already specifies a target index. If you want your Index name selection to override previous target index settings, then select the Overwrite previously specified target index check box.
    3. Select Done.
    4. Be aware that the destination index is determined by a precedence order of configurations. See How does an Edge Processor know which index to send data to? for more information.

You now have a simple pipeline that receives data and sends that data to a destination. In the next section, you'll configure this pipeline to do some additional filtering and masking of your data.

Configure filtering and masking in your pipeline

During the previous step, you created a basic pipeline that receives data that is associated with a specific source type and then sends it to a destination. The next step is to configure the pipeline to do the following:

  • Extract data values into fields so that these values can be used in filtering criteria.
  • Filter the incoming data.
  • Mask confidential information in the data.

When configuring field extractions and masking, you use regular expressions to select the data values that you want to extract or mask. You also have the option of using regular expressions to define your filtering criteria. Be aware that you must write these regular expressions using Regular Expression 2 (RE2) syntax. For example, RE2 uses the (?P<name>...) syntax for named capture groups and does not support lookaround assertions. See Regular expression syntax for Edge Processor pipelines for more information.

The examples in the instructions that follow are based on the sample events described in the Create a pipeline section.

  1. If the data values that you want to filter are not stored in event fields, then extract them into fields. The following steps describe how to extract the Credit Card and Game Card values from the sample data into a field named card_type. For more general field extraction instructions, see Extract fields from event data using an Edge Processor.
    1. Select the Preview Pipeline icon to generate a preview that shows what the sample data looks like when it passes through the pipeline.
    2. Select the plus icon in the Actions section, then select Extract fields from _raw.
    3. In the Regular expression field, enter the following:
      (?P<card_type>(Credit Card|Game Card))
      
    4. Select Apply to perform the field extraction and close the Extract fields from _raw dialog box.

    The pipeline editor adds a rex command to the SPL2 statement of your pipeline. This rex command performs the field extraction. For more information, see rex command overview in the SPL2 Search Reference. In the preview results panel, the data now includes an additional column named card_type.
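
    At this point, the SPL2 statement of your pipeline looks similar to the following (details such as your partition and destination settings can vary):

      $pipeline = | from $source
      | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
      | into $destination;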

  2. To filter your data, do the following:
    1. Select the plus icon in the Actions section, then select Filter values.
    2. In the Add filter dialog box, define your filtering criteria. For example, to filter the sample data and retain only events that have Credit Card as the card_type value, use these settings:
      Field: card_type
      Action: Keep
      Operator: = equals
      Value: Credit Card
    3. Select Apply to filter your data and close the Add filter dialog box.
    The pipeline editor adds a where command to the SPL2 statement of your pipeline. This where command performs the filtering and drops the excluded data. For more information, see where command overview in the SPL2 Search Reference. In the preview results panel, the data now includes only events that have Credit Card as the card_type value.
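
    The where clause that the editor adds looks like this:

      | where card_type = "Credit Card"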

  3. To mask your data, do the following:
    1. Select the plus icon in the Actions section, then select Mask values in _raw.
    2. In the Mask using regular expression dialog box, define your masking criteria. For example, to mask the credit card numbers in the sample data by replacing them with the string "<redacted>", use these settings:
      Field: _raw
      Matching regular expression: [1-5][0-9]{15}
      Replace with: <redacted>
      Match case: Leave this option as is. It has no effect here because the regular expression matches only digits.

      The regular expression [1-5][0-9]{15} matches any 16-digit number whose first digit is 1 through 5, which covers the card numbers in the sample events.
    3. Select Apply to mask the data and close the Mask using regular expression dialog box.

    The pipeline editor adds an eval command to the SPL2 statement of your pipeline. This eval command uses the replace function to replace the data values matched by the regular expression with the string "<redacted>". For more information, see eval command overview and replace in the SPL2 Search Reference. In the preview results panel, the data in the _raw column no longer displays credit card numbers.
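
    The eval clause that the editor adds looks similar to the following. The i flag makes the match case-insensitive, which has no effect on an all-digit pattern:

      | eval _raw=replace(_raw, /[1-5][0-9]{15}/i, "<redacted>")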

  4. (Optional) If you selected a Splunk platform destination for your pipeline, you can specify the index that you want the Edge Processor to send your data to. To choose a specific destination index for your data, do the following:
    1. In your Splunk platform deployment, verify that the index exists and that you have access to the index.
    2. In the SPL2 statement of the pipeline, add an eval command specifying the index name after the where command but before the into $destination command. For example:
      $pipeline = | from $source | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
      | where card_type = "Credit Card"
      | eval _raw=replace(_raw, /[1-5][0-9]{15}/i, "<redacted>")
      | eval index="<index name>"
      | into $destination;
      
    3. See How does an Edge Processor know which index to send data to? for more information.

You now have a pipeline that filters and masks data. In the next section, you'll save this pipeline and apply it to an Edge Processor.

Save and apply your pipeline

  1. To save your pipeline, do the following:
    1. Select Save pipeline.
    2. In the Name field, enter a name for your pipeline.
    3. (Optional) In the Description field, enter a description for your pipeline.
    4. Select Save.

    The pipeline is now listed on the Pipelines page, and you can apply it to Edge Processors as needed.

  2. To apply this pipeline to an Edge Processor, do the following:
    1. Navigate to the Pipelines page.
    2. In the row that lists your pipeline, select the Actions icon and then select Apply/Remove.
    3. Select the Edge Processors that you want to apply the pipeline to, and then select Save.

    You can only apply pipelines to Edge Processors that are in the Healthy status.

    It can take a few minutes for the Edge Processor service to finish applying your pipeline to an Edge Processor. During this time, the affected Edge Processors enter the Pending status. To confirm that the process completed successfully, do the following:

    • Navigate to the Edge Processors page. Then, verify that the Instance health column for the affected Edge Processors shows that all instances are back in the Healthy status.
    • Navigate to the Pipelines page. Then, verify that the Applied column for the pipeline contains the icon indicating that the pipeline is applied.

The Edge Processor that you applied the pipeline to can now filter and mask the data that it receives so that only a desired subset of that data gets sent to the destination specified in the pipeline. For information on how to confirm that your data is being processed and routed as expected, see Verify your Edge Processor and pipeline configurations.

Last modified on 26 April, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308 (latest FedRAMP release), 9.1.2312

