
Filter and mask data using an Edge Processor
You can create a pipeline that filters and masks the incoming data so that only a desired subset of that data gets sent to a destination.
Configuring a pipeline to filter and mask data involves doing the following:
- Extracting the data values that you want to filter into event fields, if those values are not already stored in fields.
- Defining the filtering criteria.
- Specifying a regular expression that selects and masks data.
As a best practice for preventing unwanted data loss, make sure to always have a default destination for your Edge Processors. Otherwise, all unprocessed data is dropped. See Add an Edge Processor.
Prerequisites
Before starting to create a pipeline, confirm the following:
- The source type of the data that you want the pipeline to process is listed on the Source types page of your tenant. If your source type is not listed, then you must add that source type to your tenant and configure event breaking and merging definitions for it. See Add source types for Edge Processors for more information.
- The destination that you want the pipeline to send data to is listed on the Destinations page of your tenant. If your destination is not listed, then you must add that destination to your tenant. See Add or manage destinations for more information.
Steps
Perform the following steps to create a pipeline that filters and masks data:
Create a pipeline
- Navigate to the Pipelines page and then select New pipeline.
- Select Blank pipeline and then select Next.
- Select or enter a sourcetype to define the subset of data you want this pipeline to process. If you want to use the sample data given in step 5 so that you can follow along with the example configurations described in later sections of this page, leave the sourcetype field empty.
- Select Next to confirm your partition.
- (Optional) Enter or upload sample data for generating previews that show how your pipeline processes data. This step is typically optional; however, if you plan to configure any field extractions, best practice is to provide sample data and generate pipeline previews so that you can verify the results of the field extraction before applying the configuration to your pipeline.
The sample data must be in the same format as the actual data that you want to process. See Getting sample data for previewing data transformations for more information.
If you want to follow the configuration examples in the next section, then enter the following sample events, which represent three fictitious purchases made at a store:
E9FF471F36A91031FE5B6D6228674089,72E0B04464AD6513F6A613AABB04E701,Credit Card,7.7,2018-01-13 04:41:00,2018-01-13 04:45:00,-73.997292,40.720982,4532038713619608
A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,-73.966843,40.756741,4539385381557252
1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,-73.956451,40.771442
- Select Next to confirm your sample data.
- Select the name of the destination that you want to send data to, and then select Done.
If you're sending data to a Splunk platform deployment, be aware that the destination index is determined by a precedence order of configurations. See How does an Edge Processor know which index to send data to? for more information.
You now have a simple pipeline that receives data and sends that data to a destination. In the next section, you'll configure this pipeline to do some additional filtering and masking of your data.
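Before moving on, it can help to look at the shape of the sample events. The following short Python sketch is purely illustrative and is not part of the Edge Processor workflow; it splits each sample event on commas to show that the two Credit Card events carry a trailing card number, while the Game Card event does not:

```python
# Illustrative only: inspect the field layout of the sample events.
sample_events = [
    "E9FF471F36A91031FE5B6D6228674089,72E0B04464AD6513F6A613AABB04E701,Credit Card,7.7,2018-01-13 04:41:00,2018-01-13 04:45:00,-73.997292,40.720982,4532038713619608",
    "A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,-73.966843,40.756741,4539385381557252",
    "1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,-73.956451,40.771442",
]

for event in sample_events:
    fields = event.split(",")
    print(len(fields), fields[2])
# 9 Credit Card
# 9 Credit Card
# 8 Game Card
```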
Configure filtering and masking in your pipeline
During the previous step, you created a basic pipeline that receives data that is associated with a specific source type and then sends it to a destination. The next step is to configure the pipeline to do the following:
- Extract data values into fields so that these values can be used in filtering criteria.
- Filter the incoming data.
- Mask confidential information in the data.
When configuring field extractions and masking, you use regular expressions to select the data values that you want to extract or mask. You must write these regular expressions using Regular Expression 2 (RE2) syntax. See Regular expression syntax for Edge Processor pipelines for more information.
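As a rough way to experiment with a pattern before using it in a pipeline, you can try it in Python, since Python's re module happens to accept the same (?P<name>...) named-capture-group syntax as RE2. This sketch is an assumption for testing purposes only, not part of the Edge Processor tooling, and note that RE2 is stricter than Python's re (it rejects backreferences and lookarounds, for example), so not every pattern Python accepts is valid RE2:

```python
import re

# The extraction pattern used in this walkthrough; Python's re is used
# here only as a convenient stand-in for trying the pattern out.
pattern = re.compile(r"(?P<card_type>(Credit Card|Game Card))")

event = (
    "A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,"
    "Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,"
    "-73.966843,40.756741,4539385381557252"
)

match = pattern.search(event)
print(match.group("card_type"))  # Credit Card
```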
The examples in the instructions that follow are based on the sample events described in the Create a pipeline section.
- If the data values that you want to filter are not stored in event fields, then extract them into fields. The following steps describe how to extract the Credit Card and Game Card values from the sample data into a field named card_type. For more general field extraction instructions, see Extract fields from event data using an Edge Processor.
  - Select the Preview Pipeline icon to generate a preview that shows what the sample data looks like when it passes through the pipeline.
  - In the preview results panel, hover over the header of the _raw field to make the Options icon appear. Select that icon to open the Options menu, and then select Extract fields from _raw.
  - In the Regular expression field, enter the following: (?P<card_type>(Credit Card|Game Card))
  - Select Apply to perform the field extraction and close the Extract fields from _raw dialog box. The pipeline editor adds a rex command to the SPL2 statement of your pipeline. This rex command performs the field extraction. For more information, see rex command overview in the SPL2 Search Reference. In the preview results panel, the data now includes an additional column named card_type.
- To filter your data, do the following:
  - From the Suggestions area, select Filter values.
  - In the Add filter dialog box, define your filtering criteria. For example, to filter the sample data and retain only events that have Credit Card as the card_type value, use these settings:

    | Option name | Enter or select the following |
    | --- | --- |
    | Field | card_type |
    | Action | Keep |
    | Operator | = equals |
    | Value | Credit Card |

  - Select Apply to filter the data and close the Add filter dialog box. The pipeline editor adds a where command to the SPL2 statement of your pipeline. This where command performs the filtering. For more information, see where command overview in the SPL2 Search Reference. In the preview results panel, the data now includes only events that have Credit Card as the card_type value.
value. - To mask your data, do the following:
- In the preview results panel, hover over the header of the field containing the data that you want to mask. Select the Options icon (
) that appears, and then select Mask values in <field name>.
- In the Mask using regular expression dialog box, define your masking criteria. For example, to mask the credit card numbers in the sample data by replacing them with the word "<redacted>", use these settings:
Option name Enter or select the following Field _raw Matching regular expression [1-5][0-9]{15} Replace with <redacted> Match case This option is not used when matching numbers, so you don't need to do anything with it. - Select Apply to mask the data and close the Mask using regular expression dialog box.
The pipeline editor adds an
eval
command to the SPL2 statement of your pipeline. Thiseval
command uses thereplace
function to replace the data values matched by the regular expression with the word "<redacted>". For more information, see eval command overview and replace in the SPL2 Search Reference. In the preview results panel, the data in the_raw
column no longer displays credit card numbers. - In the preview results panel, hover over the header of the field containing the data that you want to mask. Select the Options icon (
- (Optional) If you selected a Splunk platform destination for your pipeline, you can specify the index that you want the Edge Processor to send your data to. To choose a specific destination index for your data, do the following:
  - In your Splunk platform deployment, verify that the index exists and that you have access to the index.
  - In the SPL2 statement of the pipeline, add an eval command specifying the index name after the where command but before the into $destination command. For example:

        $pipeline = | from $source
        | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
        | where card_type = "Credit Card"
        | eval _raw=replace(_raw, /[1-5][0-9]{15}/i, "<redacted>")
        | eval index="<index name>"
        | into $destination;

  See How does an Edge Processor know which index to send data to? for more information.
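Taken together, the rex, where, and eval steps above behave roughly like the following Python sketch. This is an illustration only, using Python's re module as a stand-in for the RE2-based SPL2 functions; it is not how an Edge Processor actually executes the pipeline:

```python
import re

EXTRACT = re.compile(r"(?P<card_type>(Credit Card|Game Card))")  # rex pattern
MASK = re.compile(r"[1-5][0-9]{15}")                             # masking pattern

def process(raw):
    """Mimic the pipeline: extract card_type, filter, then mask."""
    match = EXTRACT.search(raw)                 # | rex field=_raw /.../
    card_type = match.group("card_type") if match else None
    if card_type != "Credit Card":              # | where card_type = "Credit Card"
        return None                             # event is dropped
    return MASK.sub("<redacted>", raw)          # | eval _raw=replace(...)

events = [
    "E9FF471F36A91031FE5B6D6228674089,72E0B04464AD6513F6A613AABB04E701,"
    "Credit Card,7.7,2018-01-13 04:41:00,2018-01-13 04:45:00,"
    "-73.997292,40.720982,4532038713619608",
    "1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,"
    "Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,"
    "-73.956451,40.771442",
]
for event in events:
    print(process(event))
# First event: printed with the card number replaced by <redacted>
# Second event: None (filtered out because card_type is Game Card)
```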
You now have a pipeline that filters and masks data. In the next section, you'll save this pipeline and apply it to an Edge Processor.
Save and apply your pipeline
- To save your pipeline, do the following:
- Select Save pipeline.
- In the Name field, enter a name for your pipeline.
- (Optional) In the Description field, enter a description for your pipeline.
- Select Save.
The pipeline is now listed on the Pipelines page, and you can apply it to Edge Processors as needed.
- To apply this pipeline to an Edge Processor, do the following:
- Navigate to the Pipelines page.
- In the row that lists your pipeline, select the Actions icon and then select Apply/Remove.
- Select the Edge Processors that you want to apply the pipeline to, and then select Save.
You can only apply pipelines to Edge Processors that are in the Healthy status.
It can take a few minutes for the Edge Processor service to finish applying your pipeline to an Edge Processor. During this time, the affected Edge Processors enter the Pending status. To confirm that the process completed successfully, do the following:
- Navigate to the Edge Processors page. Then, verify that the Instance health column for the affected Edge Processors shows that all instances are back in the Healthy status.
- Navigate to the Pipelines page. Then, verify that the Applied column for the pipeline contains a The pipeline is applied icon.
The Edge Processor that you applied the pipeline to can now filter and mask the data that it receives so that only a desired subset of that data gets sent to the destination specified in the pipeline. For information on how to confirm that your data is being processed and routed as expected, see Verify your Edge Processor and pipeline configurations.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.0.2209, 9.0.2303, 9.0.2305 (latest FedRAMP release)