Splunk Cloud Platform

Use Ingest Processors


Ingest Processor is currently released as a preview only and is not officially supported. See Splunk General Terms for more information. For any questions on this preview, please reach out to ingestprocessor@splunk.com.

Filter and mask data using Ingest Processor

You can create a pipeline that filters and masks the incoming data so that only a desired subset of that data gets sent to a destination.

Creating a filtering and masking pipeline involves doing the following:

  • Specifying the partition of incoming data that the pipeline receives and the destination that the pipeline sends data to.
  • Defining the filtering criteria by including a where command in the SPL2 statement of the pipeline. See where command overview in the SPL2 Search Reference for more information.
  • Specifying a regular expression that selects and masks data by including a rex command in the SPL2 statement of the pipeline. See rex command overview in the SPL2 Search Reference and Regular expression syntax for Ingest Processor pipelines for more information.

As a best practice for preventing unwanted data loss, make sure to always have a default destination for your Ingest Processor. Otherwise, all unprocessed data is dropped. See Partitions for more information on what qualifies as unprocessed data.
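
Put together, the SPL2 statement of such a pipeline has the following general shape. This is a sketch only, based on the sample data used later in this topic; the pipeline editor generates the equivalent commands for you as you complete the steps that follow.

  $pipeline = | from $source                                     // receive data from the pipeline partition
      | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/  // extract a field to filter on
      | where card_type = "Credit Card"                          // keep only matching events
      | eval _raw=replace(_raw, /[1-5][0-9]{15}/, "<redacted>")  // mask card numbers
      | into $destination;                                       // send the results to the destination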

Prerequisites

Before starting to create a pipeline, confirm the following:

  • If you want to partition your data by source type, the source type of the data that you want the pipeline to process is listed on the Source types page of your tenant. If your source type is not listed, then you must add that source type to your tenant and configure event breaking and merging definitions for it.
  • The destination that you want the pipeline to send data to is listed on the Destinations page of your tenant. If your destination is not listed, then you must add that destination to your tenant.

Steps

Perform the following steps to create a pipeline that filters and masks data:

  1. Create a pipeline.
  2. Configure filtering and masking in your pipeline.
  3. Preview, save, and apply your pipeline.

Create a pipeline

Complete these steps to create a pipeline that receives data associated with a specific source type, optionally processes it, and sends that data to a destination.

  1. Navigate to the Pipelines page and then select Ingest Processor pipeline.
  2. On the Get started page, select Blank pipeline and then Next.
  3. On the Define your pipeline's partition page, do the following:
    1. Select how you want to partition the incoming data that you want to send to your pipeline. You can partition by source type, source, or host.
    2. Enter the conditions for your partition, including the operator and the value. Your pipeline will receive and process only the incoming data that meets these conditions.
    3. Select Next to confirm the pipeline partition.
  4. On the Add sample data page, do the following:
    1. Enter or upload sample data for generating previews that show how your pipeline processes data. The sample data must contain accurate examples of the values that you want to extract into fields. For example, the following sample events represent purchases made at a store at a particular time:
      E9FF471F36A91031FE5B6D6228674089, 72E0B04464AD6513F6A613AABB04E701, Credit Card, 7.7, 2023-01-13 04:41:00, 2023-01-13 04:45:00, -73.997292, 40.720982, 4532038713619608
      A5D125F5550BE7822FC6EE156E37733A, 08DB3F9FCF01530D6F7E70EB88C3AE5B, Credit Card, 14, 2023-01-13 04:37:00, 2023-01-13 04:47:00, -73.966843, 40.756741, 4539385381557252
      1E65B7E2D1297CF3B2CA87888C05FE43, F9ABCCCC4483152C248634ADE2435CF0, Game Card, 16.5, 2023-01-13 04:26:00, 2023-01-13 04:46:00, -73.956451, 40.771442
    2. Select Next to confirm the sample data that you want to use for your pipeline.
  5. On the Select destination dataset page, select the name of the destination that you want to send data to, then do the following:
    1. If you selected a Splunk platform S2S or Splunk platform HEC destination, select Next.
    2. If you selected another type of destination, select Done and skip the next step.
  6. (Optional) If you're sending data to a Splunk platform deployment, you can specify a target index:
    1. In the Index name field, select the name of the index that you want to send your data to.
    2. (Optional) In some cases, incoming data already specifies a target index. If you want your Index name selection to override previous target index settings, then select the Overwrite previously specified target index check box.
    3. Select Done.
    If you're sending data to a Splunk platform deployment, be aware that the destination index is determined by a precedence order of configurations. See How does an Ingest Processor know which index to send data to? for more information.

You now have a simple pipeline that receives data for a specific source type and sends that data to a destination. In the next section, you'll configure this pipeline to filter and mask your data.
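
At this stage, the SPL2 statement of your pipeline resembles the following minimal pass-through statement:

  // Receive data from the pipeline partition and send it on unchanged.
  $pipeline = | from $source | into $destination;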

Configure filtering and masking in your pipeline

During the previous step, you created a basic pipeline that receives data that is associated with a specific partition and then sends it to a destination. For example, if you partitioned your data by the custom "Buttercup_Games" source type, then the pipeline receives only the data that has the "Buttercup_Games" source type.

The next step is to configure the pipeline to do the following:

  • Extract data values into fields so that these values can be used in filtering criteria.
  • Filter the incoming data.
  • Mask confidential information in the data.

When configuring field extractions and masking, you use regular expressions to select the data values that you want to extract or mask. You must write these regular expressions using Regular Expression 2 (RE2) syntax. See Regular expression syntax for Ingest Processor pipelines for more information.
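
For example, RE2 expresses named capture groups with the (?P<name>...) syntax, and the group name becomes the name of the extracted field. The following pattern, used later in this topic, extracts a field named card_type:

  (?P<card_type>(Credit Card|Game Card))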

The examples in the instructions that follow are based on the sample events described in the Create a pipeline section.

  1. If the data values that you want to filter are not stored in event fields, then extract them into fields. The following steps describe how to extract the Credit Card and Game Card values from the sample data into a field named card_type. For more general field extraction instructions, see Extract fields from event data using Ingest Processor.
    1. Select the Preview Pipeline icon to generate a preview that shows what the sample data looks like when it passes through the pipeline.
    2. In the Actions section, select the plus icon, then select Extract fields from _raw.
    3. In the Regular expression field, enter the following:
      (?P<card_type>(Credit Card|Game Card))
      
    4. Select Apply to perform the field extraction and close the Extract fields from _raw dialog box.

    The pipeline editor adds a rex command to the SPL2 statement of your pipeline. This rex command performs the field extraction. For more information, see rex command overview in the SPL2 Search Reference. In the preview results panel, the data now includes an additional column named card_type.
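
    After this action, the SPL2 statement of the pipeline resembles the following sketch:

      // Extract a card_type field from _raw. All events still pass through.
      $pipeline = | from $source
          | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
          | into $destination;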

  2. To filter your data, do the following:
    1. Select the plus icon in the Actions section, then select Filter values.
    2. In the Add filter dialog box, define your filtering criteria. For example, to filter the sample data and retain only events that have Credit Card as the card_type value, use these settings:
      Option name     Enter or select the following
      Field           card_type
      Action          Keep
      Operator        = equals
      Value           Credit Card
    3. Select Apply to filter your data and close the Add filter dialog box.
    The pipeline editor adds a where command to the SPL2 statement of your pipeline. This where command performs the filtering and drops the excluded data. For more information, see where command overview in the SPL2 Search Reference. In the preview results panel, the data now includes only events that have Credit Card as the card_type value.
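
    With filtering configured, the statement resembles the following sketch:

      // Keep only the events that have "Credit Card" as the card_type value.
      $pipeline = | from $source
          | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
          | where card_type = "Credit Card"
          | into $destination;
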
  3. To mask your data, do the following:
    1. Select the plus icon in the Actions section, then select Mask values in _raw.
    2. In the Mask using regular expression dialog box, define your masking criteria. For example, to mask the credit card numbers in the sample data by replacing them with the word "<redacted>", use these settings:
      Option name                  Enter or select the following
      Field                        _raw
      Matching regular expression  [1-5][0-9]{15}
      Replace with                 <redacted>
      Match case                   This option is not used when matching numbers, so you don't need to do anything with it.
    3. Select Apply to mask the data and close the Mask using regular expression dialog box.
    The pipeline editor adds an eval command to the SPL2 statement of your pipeline. This eval command uses the replace function to replace the data values matched by the regular expression with the word "<redacted>". For more information, see eval command overview and replace in the SPL2 Search Reference. In the preview results panel, the data in the _raw column no longer displays credit card numbers.
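
    For example, after masking, the first sample event from earlier in this topic renders with its 16-digit card number replaced, similar to the following:

      E9FF471F36A91031FE5B6D6228674089, 72E0B04464AD6513F6A613AABB04E701, Credit Card, 7.7, 2023-01-13 04:41:00, 2023-01-13 04:45:00, -73.997292, 40.720982, <redacted>
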
  4. (Optional) If you selected a Splunk platform destination for your pipeline, you can specify the index that you want Ingest Processor to send your data to. To choose a specific destination index for your data, do the following:
    1. In your Splunk platform deployment, verify that the index exists and that you have access to the index.
    2. In the SPL2 statement of the pipeline, add an eval command specifying the index name after the where command but before the into $destination command. For example:
      $pipeline = | from $source | rex field=_raw /(?P<card_type>(Credit Card|Game Card))/
      | where card_type = "Credit Card"
      | eval _raw=replace(_raw, /[1-5][0-9]{15}/i, "<redacted>")
      | eval index="<index name>"
      | into $destination;
      
    3. See How does an Ingest Processor know which index to send data to? for more information.

You now have a pipeline that filters and masks data. In the next section, you'll verify that this pipeline processes data in the way that you expect, and then save it and apply it to an Ingest Processor.

Preview, save, and apply your pipeline

  1. (Optional) To see a preview of how your pipeline processes data based on the sample data that you provided during pipeline creation, select the Preview Pipeline icon.
  2. To save your pipeline, do the following:
    1. Select Save pipeline.
    2. In the Name field, enter a name for your pipeline.
    3. (Optional) In the Description field, enter a description for your pipeline.
    4. Select Save.

    The pipeline is now listed on the Pipelines page, and you can apply it as needed.

  3. To apply this pipeline, do the following:
    1. Navigate to the Pipelines page.
    2. In the row that lists your pipeline, select the Actions icon and then select Apply.
    3. Select the pipeline that you want to apply, and then select Save.

It can take a few minutes for the Ingest Processor service to finish applying your pipeline. During this time, all applied pipelines enter the Pending status. Once the operation is complete, the Pending status icon stops displaying beside the pipeline. Refresh your browser to confirm that the icon no longer displays.

Your applied pipeline can now filter and mask the data that it receives so that only a desired subset of that data gets sent to the destination specified in the pipeline. Any data that does not meet your where command conditions will be dropped from your pipeline.
