Ingest Processor is currently released as a preview only and is not officially supported. See Splunk General Terms for more information. For any questions on this preview, please reach out to ingestprocessor@splunk.com.
Filter and mask data using Ingest Processor
You can create a pipeline that filters and masks the incoming data so that only a desired subset of that data gets sent to a destination.
Creating a filtering and masking pipeline involves doing the following:
- Specifying the partition of incoming data that the pipeline receives, and specifying a destination that the pipeline sends data to.
- Defining the filtering criteria by including a where command in the SPL2 statement of the pipeline. See where command overview in the SPL2 Search Reference for more information.
- Specifying a regular expression that selects and masks data by including a rex command in the SPL2 statement of the pipeline. See rex command overview in the SPL2 Search Reference for more information.
As a best practice for preventing unwanted data loss, make sure to always have a default destination for your Ingest Processor. Otherwise, all unprocessed data is dropped. See Partitions for more information on what qualifies as unprocessed data.
Prerequisites
Before starting to create a pipeline, confirm the following:
- If you want to partition your data by source type, the source type of the data that you want the pipeline to process is listed on the Source types page of your tenant. If your source type is not listed, then you must add that source type to your tenant and configure event breaking and merging definitions for it.
- The destination that you want the pipeline to send data to is listed on the Destinations page of your tenant. If your destination is not listed, then you must add that destination to your tenant.
Steps
Perform the following steps to create a pipeline that filters and masks data:
- Create a pipeline.
- Configure filtering and masking in your pipeline.
- Preview, save, and apply your pipeline.
Create a pipeline
Complete these steps to create a pipeline that receives data associated with a specific source type, optionally processes it, and sends that data to a destination.
- Navigate to the Pipelines page and then select Ingest Processor pipeline.
- On the Get started page, select Blank pipeline and then Next.
- On the Define your pipeline's partition page, do the following:
- Select how you want to partition the incoming data that you want to send to your pipeline. You can partition by source type, source, or host.
- Enter the conditions for your partition, including the operator and the value. Your pipeline will receive and process the incoming data that meets these conditions.
- Select Next to confirm the pipeline partition.
- On the Add sample data page, do the following:
- Enter or upload sample data for generating previews that show how your pipeline processes data. The sample data must contain accurate examples of the values that you want to extract into fields.
For example, the following sample events represent purchases made at a store at a particular time:
E9FF471F36A91031FE5B6D6228674089, 72E0B04464AD6513F6A613AABB04E701, Credit Card, 7.7, 2023-01-13 04:41:00, 2023-01-13 04:45:00, -73.997292, 40.720982, 4532038713619608
A5D125F5550BE7822FC6EE156E37733A, 08DB3F9FCF01530D6F7E70EB88C3AE5B, Credit Card, 14, 2023-01-13 04:37:00, 2023-01-13 04:47:00, -73.966843, 40.756741, 4539385381557252
1E65B7E2D1297CF3B2CA87888C05FE43, F9ABCCCC4483152C248634ADE2435CF0, Game Card, 16.5, 2023-01-13 04:26:00, 2023-01-13 04:46:00, -73.956451, 40.771442
- Select Next to confirm the sample data that you want to use for your pipeline.
- On the Select destination dataset page, select the name of the destination that you want to send data to, then do the following:
- If you selected a Splunk platform S2S or Splunk platform HEC destination, select Next.
- If you selected another type of destination, select Done and skip the next step.
- (Optional) If you're sending data to a Splunk platform deployment, you can specify a target index:
- In the Index name field, select the name of the index that you want to send your data to.
- (Optional) In some cases, incoming data already specifies a target index. If you want your Index name selection to override previous target index settings, then select the Overwrite previously specified target index check box.
- Select Done.
If you're sending data to a Splunk platform deployment, be aware that the destination index is determined by a precedence order of configurations.
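The "Overwrite previously specified target index" behavior described above can be sketched as follows. This is a hypothetical illustration only, assuming a helper named resolve_index that does not exist in the product; the actual precedence order is documented in the linked topic.

```python
def resolve_index(event_index, pipeline_index, overwrite):
    """Hypothetical sketch of the 'Overwrite previously specified target
    index' check box: the pipeline's Index name selection wins only when
    the check box is selected, or when the incoming data did not already
    specify a target index."""
    if pipeline_index and (overwrite or event_index is None):
        return pipeline_index
    return event_index

print(resolve_index("web_logs", "main", overwrite=False))  # web_logs
print(resolve_index("web_logs", "main", overwrite=True))   # main
print(resolve_index(None, "main", overwrite=False))        # main
```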
You now have a simple pipeline that receives data for a specific source type and sends that data to a destination. In the next section, you'll configure this pipeline to filter and mask your data.
Configure filtering and masking in your pipeline
During the previous step, you created a basic pipeline that receives data associated with a specific partition and then sends it to a destination. For example, if you partitioned your data by the custom "Buttercup_Games" source type, then the pipeline receives only the data that has the "Buttercup_Games" source type.
The next step is to configure the pipeline to do the following:
- Extract data values into fields so that these values can be used in filtering criteria.
- Filter the incoming data.
- Mask confidential information in the data.
When configuring field extractions and masking, you use regular expressions to select the data values that you want to extract or mask. You must write these regular expressions using Regular Expression 2 (RE2) syntax. See Regular expression syntax for Ingest Processor pipelines for more information.
The examples in the instructions that follow are based on the sample events described in the Create a pipeline section.
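Before walking through the steps, it can help to see what the two regular expressions used in this topic actually match. Ingest Processor uses RE2 syntax, but both patterns here are also valid in Python's re module and behave the same way, so the following sketch uses Python purely for illustration:

```python
import re

# One of the sample purchase events from the "Create a pipeline" section.
event = ("E9FF471F36A91031FE5B6D6228674089, 72E0B04464AD6513F6A613AABB04E701, "
         "Credit Card, 7.7, 2023-01-13 04:41:00, 2023-01-13 04:45:00, "
         "-73.997292, 40.720982, 4532038713619608")

# Field-extraction pattern used in this topic: captures the payment
# method into a named group called card_type.
extract = re.compile(r"(?P<card_type>(Credit Card|Game Card))")
match = extract.search(event)
print(match.group("card_type"))  # Credit Card

# Masking pattern used in this topic: matches a 16-digit number that
# starts with a digit from 1 to 5, such as the trailing card number.
mask = re.compile(r"[1-5][0-9]{15}")
print(mask.sub("<redacted>", event))
```

Note that the masking pattern leaves the 32-character hexadecimal identifiers and the timestamps untouched, because none of them contain a 16-digit run that starts with 1 through 5.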
- If the data values that you want to filter are not stored in event fields, then extract them into fields. The following steps describe how to extract the Credit Card and Game Card values from the sample data into a field named card_type. For more general field extraction instructions, see Extract fields from event data using Ingest Processor.
  - Select the Preview Pipeline icon () to generate a preview that shows what the sample data looks like when it passes through the pipeline.
  - In the Actions section, select the plus icon (), then select Extract fields from _raw.
  - In the Regular expression field, enter the following:
    (?P<card_type>(Credit Card|Game Card))
  - Select Apply to perform the field extraction and close the Extract fields from _raw dialog box.
  The pipeline editor adds a rex command to the SPL2 statement of your pipeline. This rex command performs the field extraction. For more information, see rex command overview in the SPL2 Search Reference. In the preview results panel, the data now includes an additional column named card_type.
- To filter your data, do the following:
  - Select the plus icon () in the Actions section, then select Filter values.
  - In the Add filter dialog box, define your filtering criteria. For example, to filter the sample data and retain only events that have Credit Card as the card_type value, use these settings:
    - Field: card_type
    - Action: Keep
    - Operator: = equals
    - Value: Credit Card
  - Select Apply to filter your data and close the Add filter dialog box.
  The pipeline editor adds a where command to the SPL2 statement of your pipeline. This where command performs the filtering and drops the excluded data. For more information, see where command overview in the SPL2 Search Reference. In the preview results panel, the data now includes only events that have Credit Card as the card_type value.
- To mask your data, do the following:
  - Select the plus icon () in the Actions section, then select Mask values in _raw.
  - In the Mask using regular expression dialog box, define your masking criteria. For example, to mask the credit card numbers in the sample data by replacing them with the word "<redacted>", use these settings:
    - Field: _raw
    - Matching regular expression: [1-5][0-9]{15}
    - Replace with: <redacted>
    - Match case: This option does not apply when matching numbers, so you can leave it as is.
  - Select Apply to mask the data and close the Mask using regular expression dialog box.
  The pipeline editor adds an eval command to the SPL2 statement of your pipeline. This eval command uses the replace function to replace the data values matched by the regular expression with the word "<redacted>". For more information, see eval command overview and replace in the SPL2 Search Reference. In the preview results panel, the data in the _raw column no longer displays credit card numbers.
- (Optional) If you selected a Splunk platform destination for your pipeline, you can specify the index that you want Ingest Processor to send your data to. To choose a specific destination index for your data, do the following:
  - In your Splunk platform deployment, verify that the index exists and that you have access to it.
  - In the SPL2 statement of the pipeline, add an eval command specifying the index name after the where command but before the into $destination command. For example:
    $pipeline = | from $source
    | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
    | where card_type = "Credit Card"
    | eval _raw=replace(_raw, /[1-5][0-9]{15}/i, "<redacted>")
    | eval index="<index name>"
    | into $destination;
  See How does an Ingest Processor know which index to send data to? for more information.
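The end-to-end effect of the rex, where, and eval commands in this pipeline can be sketched in Python. This is an illustration of the transformation the SPL2 statement performs, not product code; the process function and the abbreviated sample events are invented for the example.

```python
import re

# Patterns from the SPL2 statement shown earlier in this topic.
EXTRACT = re.compile(r"(?P<card_type>(Credit Card|Game Card))")
MASK = re.compile(r"[1-5][0-9]{15}")

def process(events):
    """Mimic the pipeline: rex extracts card_type, where keeps only
    Credit Card events, and eval replace masks the card number."""
    results = []
    for raw in events:
        m = EXTRACT.search(raw)                 # | rex field=_raw /.../
        card_type = m.group("card_type") if m else None
        if card_type != "Credit Card":          # | where card_type = "Credit Card"
            continue                            # excluded events are dropped
        raw = MASK.sub("<redacted>", raw)       # | eval _raw=replace(...)
        results.append({"_raw": raw, "card_type": card_type})
    return results

# Abbreviated stand-ins for the sample events.
events = [
    "..., Credit Card, 7.7, ..., 4532038713619608",
    "..., Game Card, 16.5, ...",
]
for result in process(events):
    print(result["_raw"])  # ..., Credit Card, 7.7, ..., <redacted>
```

The Game Card event never reaches the destination, which mirrors how the where command drops excluded data.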
You now have a pipeline that filters and masks data. In the next section, you'll verify that this pipeline processes data in the way that you expect, and then save it so that it can be applied to an Ingest Processor.
Preview, save, and apply your pipeline
- (Optional) To see a preview of how your pipeline processes data based on the sample data that you provided while selecting your source type, select the Preview Pipeline icon ().
- To save your pipeline, do the following:
- Select Save pipeline.
- In the Name field, enter a name for your pipeline.
- (Optional) In the Description field, enter a description for your pipeline.
- Select Save.
- To apply this pipeline, go to the Pipelines page, where your saved pipeline is now listed, and apply it to your Ingest Processor as needed.
It can take a few minutes for the Ingest Processor service to finish applying your pipeline. During this time, all applied pipelines enter the Pending status. Once the operation is complete, the Pending Apply status icon () stops displaying beside the pipeline. Refresh your browser to check if the icon no longer displays.
Your applied pipeline can now filter and mask the data that it receives so that only a desired subset of that data gets sent to the destination specified in the pipeline. Any data that does not meet your where command conditions will be dropped from your pipeline.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.1.2308 (latest FedRAMP release), 9.1.2312