Filter and mask data using an Edge Processor
You can create a pipeline that filters and masks the incoming data so that only a desired subset of that data gets sent to a destination.
Configuring a pipeline to filter and mask data involves doing the following:
- Extracting the data values that you want to filter into event fields, if those values are not already stored in fields.
- Defining the filtering criteria.
- Specifying a regular expression that selects and masks data.
As a best practice for preventing unwanted data loss, make sure to always have a default destination for your Edge Processors. Otherwise, all unprocessed data is dropped. See Add an Edge Processor.
Prerequisites
Before starting to create a pipeline, confirm the following:
- The source type of the data that you want the pipeline to process is listed on the Source types page of your tenant. If your source type is not listed, then you must add that source type to your tenant and configure event breaking and merging definitions for it. See Add source types for Edge Processors for more information.
- The destination that you want the pipeline to send data to is listed on the Destinations page of your tenant. If your destination is not listed, then you must add that destination to your tenant. See Add or manage destinations for more information.
Steps
Perform the following steps to create a pipeline that filters and masks data:
Create a pipeline
- Navigate to the Pipelines page and then select New pipeline.
- Select Blank pipeline and then select Next.
- Specify a subset of the data received by the Edge Processor for this pipeline to process. To do this, you must define a partition by completing these steps:
- Select the plus icon next to Partition, or select the option in the Suggestions section that matches how you want to create your partition.
- In the Field field, specify the event field that you want the partitioning condition to be based on.
- To specify whether the pipeline includes or excludes the data that meets the criteria, select Keep or Remove.
- In the Operator field, select an operator for the partitioning condition.
- In the Value field, enter the value that your partition should filter by to create the subset. Then select Apply. You can add as many conditions to a partition as you need by selecting the plus icon.
- Once you have defined your partition, select Next.
- (Optional) Enter or upload sample data for generating previews that show how your pipeline processes data. This step is typically optional; however, if you plan to configure any field extractions, best practice is to provide sample data and generate pipeline previews so that you can verify the results of the field extraction before applying the configuration to your pipeline.
The sample data must be in the same format as the actual data that you want to process. See Getting sample data for previewing data transformations for more information.
If you want to follow the configuration examples in the next section, then enter the following sample events, which represent three fictitious purchases made at a store:
E9FF471F36A91031FE5B6D6228674089,72E0B04464AD6513F6A613AABB04E701,Credit Card,7.7,2018-01-13 04:41:00,2018-01-13 04:45:00,-73.997292,40.720982,4532038713619608
A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,-73.966843,40.756741,4539385381557252
1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,-73.956451,40.771442
- Select Next to confirm your sample data.
- Select the name of the destination that you want to send data to.
- (Optional) If you selected a Splunk platform S2S or Splunk platform HEC destination, you can configure index routing:
- Select one of the following options in the expanded destinations panel:

| Option | Description |
| --- | --- |
| Default | The pipeline does not route events to a specific index. If the event metadata already specifies an index, then the event is sent to that index. Otherwise, the event is sent to the default index of the Splunk platform deployment. |
| Specify index for events with no index | The pipeline only routes events to your specified index if the event metadata did not already specify an index. |
| Specify index for all events | The pipeline routes all events to your specified index. |

- If you selected Specify index for events with no index or Specify index for all events, then in the Index name field, select or enter the name of the index that you want to send your data to.
Be aware that the destination index is determined by a precedence order of configurations. See How does an Edge Processor know which index to send data to? for more information.
- Select Done to confirm the data destination.
You now have a simple pipeline that receives data and sends that data to a destination. In the next section, you'll configure this pipeline to do some additional filtering and masking of your data.
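The three index routing options described above can be summarized as a small decision function. The following Python sketch is only an illustration of the option descriptions: the constant names and the `resolve_index` function are hypothetical, and it deliberately ignores the other configurations that take part in the precedence order mentioned above.

```python
from typing import Optional

# Hypothetical labels for the three options in the destinations panel.
DEFAULT = "default"
IF_MISSING = "specify_index_for_events_with_no_index"
ALL_EVENTS = "specify_index_for_all_events"


def resolve_index(option: str,
                  event_index: Optional[str],
                  pipeline_index: Optional[str] = None) -> Optional[str]:
    """Return the index that the pipeline leaves on the event.

    A result of None means the pipeline did not set an index, so the event
    goes to the default index of the Splunk platform deployment.
    """
    if option == DEFAULT:
        # The pipeline does not route; any index in the metadata is kept.
        return event_index
    if option == IF_MISSING:
        # Only events without an index get the pipeline's index.
        return event_index if event_index else pipeline_index
    if option == ALL_EVENTS:
        # Every event gets the pipeline's index, overriding the metadata.
        return pipeline_index
    raise ValueError(f"unknown option: {option}")
```

For example, under this sketch an event whose metadata already names an index keeps that index with the first two options and loses it only with Specify index for all events.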
Configure filtering and masking in your pipeline
During the previous step, you created a basic pipeline that receives data that is associated with a specific source type and then sends it to a destination. The next step is to configure the pipeline to do the following:
- Extract data values into fields so that these values can be used in filtering criteria.
- Filter the incoming data.
- Mask confidential information in the data.
When configuring field extractions and masking, you use regular expressions to select the data values that you want to extract or mask. You can also use regular expressions to define your filtering criteria. Be aware that you must write these regular expressions using Regular Expression 2 (RE2) syntax. See Regular expression syntax for Edge Processor pipelines for more information.
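RE2 does not support lookahead, lookbehind, or backreferences, so patterns copied from other regex engines can fail in a pipeline. The following Python sketch is a rough pre-check for those constructs. The `find_unsupported` helper is hypothetical, it is a text heuristic rather than a real RE2 parser, and it can report false positives (for example, on an escaped parenthesis followed by `?=`).

```python
import re

# Probes for constructs that RE2 does not support.
# These are plain text heuristics, not a full regex parser.
UNSUPPORTED = {
    "lookahead": re.compile(r"\(\?[=!]"),          # (?=...) and (?!...)
    "lookbehind": re.compile(r"\(\?<[=!]"),        # (?<=...) and (?<!...)
    "backreference": re.compile(r"\\[1-9]|\(\?P="),  # \1 ... \9 and (?P=name)
}


def find_unsupported(pattern: str) -> list[str]:
    """Return the names of unsupported constructs that appear in pattern."""
    return [name for name, probe in UNSUPPORTED.items() if probe.search(pattern)]
```

Both expressions used later in this topic, `(?P<card_type>(Credit Card|Game Card))` and `[1-5][0-9]{15}`, pass this check with an empty result.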
The examples in the instructions that follow are based on the sample events described in the Create a pipeline section.
- If the data values that you want to filter are not stored in event fields, then extract them into fields. The following steps describe how to extract the Credit Card and Game Card values from the sample data into a field named card_type. For more general field extraction instructions, see Extract fields from event data using an Edge Processor.
- Select the Preview Pipeline icon to generate a preview that shows what the sample data looks like when it passes through the pipeline.
- Select the plus icon in the Actions section, then select Extract fields from _raw.
- In the Regular expression field, enter the following:
(?P<card_type>(Credit Card|Game Card))
- Select Apply to perform the field extraction and close the Extract fields from _raw dialog box.
The pipeline editor adds a rex command to the SPL2 statement of your pipeline. This rex command performs the field extraction. For more information, see rex command overview in the SPL2 Search Reference. In the preview results panel, the data now includes an additional column named card_type.
- To filter your data, do the following:
- Select the plus icon in the Actions section, then select Filter values.
- In the Add filter dialog box, define your filtering criteria. For example, to filter the sample data and retain only events that have Credit Card as the card_type value, use these settings:

| Option name | Enter or select the following |
| --- | --- |
| Field | card_type |
| Action | Keep |
| Operator | = equals |
| Value | Credit Card |

- Select Apply to filter your data and close the Add filter dialog box.
- To mask your data, do the following:
- Select the plus icon in the Actions section, then select Mask values in _raw.
- In the Mask using regular expression dialog box, define your masking criteria. For example, to mask the credit card numbers in the sample data by replacing them with the word "<redacted>", use these settings:

| Option name | Enter or select the following |
| --- | --- |
| Field | _raw |
| Matching regular expression | [1-5][0-9]{15} |
| Replace with | <redacted> |
| Match case | This option is not used when matching numbers, so you don't need to do anything with it. |

- Select Apply to mask the data and close the Mask using regular expression dialog box.
The pipeline editor adds an eval command to the SPL2 statement of your pipeline. This eval command uses the replace function to replace the data values matched by the regular expression with the word "<redacted>". For more information, see eval command overview and replace in the SPL2 Search Reference. In the preview results panel, the data in the _raw column no longer displays credit card numbers.
For the filter that you added, the pipeline editor adds a where command to the SPL2 statement of your pipeline. This where command performs the filtering and drops the excluded data. For more information, see where command overview in the SPL2 Search Reference. In the preview results panel, the data includes only events that have Credit Card as the card_type value.
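The masking expression used in the steps above, `[1-5][0-9]{15}`, matches any run of 16 digits that starts with 1 through 5, including a run inside a longer digit string. The following Python sketch compares it with a stricter variant that adds `\b` word boundaries, which RE2 also supports. The stricter variant is an assumption for illustration and is not part of the original example. Python's `re` module stands in for RE2 here, so treat the result as an approximation.

```python
import re

ORIGINAL = re.compile(r"[1-5][0-9]{15}")
# Hypothetical stricter variant: the 16 digits must stand alone.
BOUNDED = re.compile(r"\b[1-5][0-9]{15}\b")


def mask(pattern: re.Pattern, text: str) -> str:
    """Replace every match of pattern in text with the word <redacted>."""
    return pattern.sub("<redacted>", text)


# Tail of the first sample event: a standalone 16-digit card number.
card = "40.720982,4532038713619608"
# Hypothetical 20-digit identifier that is not a card number.
longer = "id=45320387136196081234"

print(mask(ORIGINAL, card))    # card number is masked
print(mask(BOUNDED, card))     # card number is masked
print(mask(ORIGINAL, longer))  # first 16 digits are masked, 4 digits remain
print(mask(BOUNDED, longer))   # left unchanged
```

Whether the stricter form is appropriate depends on how card numbers appear in your data, so verify either expression against a pipeline preview before relying on it.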
You now have a pipeline that filters and masks data. The complete SPL2 statement of your pipeline looks like this:
$pipeline = | from $source
| rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
| where card_type = "Credit Card"
| eval _raw=replace(_raw, /[1-5][0-9]{15}/i, "<redacted>")
| into $destination;
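To reason about what this statement does before applying it, you can approximate its three stages locally. The following Python sketch mirrors the rex, where, and eval steps against the sample events. It is not how an Edge Processor runs SPL2: the `run_pipeline` function and the event dictionaries are hypothetical, and Python's `re` module is used in place of RE2.

```python
import re

SAMPLE_EVENTS = [
    "E9FF471F36A91031FE5B6D6228674089,72E0B04464AD6513F6A613AABB04E701,"
    "Credit Card,7.7,2018-01-13 04:41:00,2018-01-13 04:45:00,"
    "-73.997292,40.720982,4532038713619608",
    "A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,"
    "Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,"
    "-73.966843,40.756741,4539385381557252",
    "1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,"
    "Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,"
    "-73.956451,40.771442",
]

EXTRACT = re.compile(r"(?P<card_type>(Credit Card|Game Card))")
# The /i flag in the SPL2 statement is approximated with re.IGNORECASE.
MASK = re.compile(r"[1-5][0-9]{15}", re.IGNORECASE)


def run_pipeline(raw_events):
    """Apply the extract, filter, and mask stages to each raw event."""
    output = []
    for raw in raw_events:
        event = {"_raw": raw}
        # rex: extract card_type from _raw when the pattern matches.
        match = EXTRACT.search(raw)
        if match:
            event["card_type"] = match.group("card_type")
        # where: keep only events whose card_type is Credit Card.
        if event.get("card_type") != "Credit Card":
            continue
        # eval + replace: mask card numbers in _raw.
        event["_raw"] = MASK.sub("<redacted>", event["_raw"])
        output.append(event)
    return output


for event in run_pipeline(SAMPLE_EVENTS):
    print(event["card_type"], "|", event["_raw"])
```

With the three sample events, this sketch keeps the two Credit Card events, drops the Game Card event, and replaces each card number with <redacted>.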
In the next section, you'll save this pipeline and apply it to an Edge Processor.
Save and apply your pipeline
- To save your pipeline, do the following:
- Select Save pipeline.
- In the Name field, enter a name for your pipeline.
- (Optional) In the Description field, enter a description for your pipeline.
- Select Save.
The pipeline is now listed on the Pipelines page, and you can apply it to Edge Processors as needed.
- To apply this pipeline to an Edge Processor, do the following:
- Navigate to the Pipelines page.
- In the row that lists your pipeline, select the Actions icon and then select Apply/Remove.
- Select the Edge Processors that you want to apply the pipeline to, and then select Save.
You can only apply pipelines to Edge Processors that are in the Healthy status.
It can take a few minutes for the Edge Processor service to finish applying your pipeline to an Edge Processor. During this time, the affected Edge Processors enter the Pending status. To confirm that the process completed successfully, do the following:
- Navigate to the Edge Processors page. Then, verify that the Instance health column for the affected Edge Processors shows that all instances are back in the Healthy status.
- Navigate to the Pipelines page. Then, verify that the Applied column for the pipeline contains the "The pipeline is applied" icon.
The Edge Processor that you applied the pipeline to can now filter and mask the data that it receives so that only a desired subset of that data gets sent to the destination specified in the pipeline. For information on how to confirm that your data is being processed and routed as expected, see Verify your Edge Processor and pipeline configurations.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408