Process a copy of data using Ingest Processor

Use the thru command when you want to process or route the same set of data in 2 distinct ways.

The thru command does the following:

  1. Creates an additional path in the pipeline.
  2. Copies all of the incoming data in the pipeline.
  3. Sends the copied data into the newly created path.

The following diagram shows how the thru command sends data into different paths in the pipeline.

Arrows that represent 3 different types of data enter a pipeline and reach the thru command. The thru command copies the data, producing 2 sets of 3 data types. Each set continues downstream along a different pipeline path and gets processed differently.

Add the thru command to a pipeline

  1. In the SPL2 editor, in the part of the pipeline where you want to create an additional pipeline path, enter the following:
    | thru [ <processing_commands>
        | into <destination>
    ]
    

    Where:

    • <processing_commands> is one or more SPL2 commands for processing the copied data in the newly created pipeline path. Each command must be delimited by a pipe ( | ). If you don't want to make any changes to the copied data, you can leave this part empty. For an example that includes processing commands inside the thru block, see the sketch after this procedure.
    • <destination> is an SPL2 variable indicating the destination that you want to send the copied data to. SPL2 variables must begin with a dollar sign ( $ ) and can be composed of uppercase or lowercase letters, numbers, and the underscore ( _ ) character.

    For example:

    $pipeline = | from $source
    | thru [
        | into $data_copy_destination
    ]
    | into $destination;
    
  2. In the Actions section of the pipeline builder, select Send data to <destination>, where <destination> is the SPL2 variable that you specified during the previous step.
  3. Select the destination that you want to send the data to, and then select Apply.
  4. (Optional) Select the Preview Pipeline icon to generate a preview that shows what your data looks like when it passes through the pipeline. In the preview results panel, use the drop-down list to select the pipeline path that you want to preview.
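
If you do want to process the copied data before routing it, you can add SPL2 commands inside the thru block. The following is a minimal sketch of that pattern. The destination variable $archive_destination and the copy_type field are placeholder names used for illustration only:

    $pipeline = | from $source
    | thru [
        | eval copy_type = "archive"
        | into $archive_destination
    ]
    | into $destination;

In this sketch, the copied data gets a copy_type marker field before it is routed to $archive_destination, while the original data continues through the main pipeline path to $destination unchanged.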

Example: Create backup copies of logs before processing them

In this example, the Ingest Processor is receiving Cisco syslog data such as the following:

<13>Jan 09 01:54:40 10.10.232.91 : %ASA-3-505016: Module Sample_prod_id3 in slot 1 application changed from: sample_app2 version 1.1.0 state Sample_state1 to: Sample_server_name2 1.1.0 state Sample_state1.
<19>Jan 09 01:54:40 10.10.136.129 : %ASA-1-505015: Module Sample_prod_id2 in slot 1 , application up sample_app2 , version 1.1.0
<91>Jan 09 01:54:40 10.10.144.67 : %FTD-5-505012: Module Sample_prod_id1 in slot 1 , application stopped sample_app1 , version 2.2.1
<101>Jan 09 01:54:40 10.10.219.51 : %FTD-1-505011: Module 10.10.94.98 , data channel communication is UP.

Assume that you want to do the following:

  1. Make an unaltered backup copy of this data and send it to an Amazon S3 bucket.
  2. In the other copy of the data, obfuscate the IP addresses so that they aren't directly human-readable.
  3. Send the obfuscated logs to a Splunk index named cisco_syslog.

You can achieve this by creating a pipeline that uses the thru command to immediately send a copy of the received data to an Amazon S3 destination, and then obfuscates the IP addresses and assigns an appropriate index value to the logs. The following diagram shows the commands that this pipeline would contain and how the data would get processed as it moves through the pipeline:

The "from $source" command receives the logs. Then, the "thru" command makes a copy of the logs. An "into $destination2" command sends the copied logs to Amazon S3. The original logs continue through the pipeline, where a "rex" command and 2 "eval" commands censor the IP addresses in the logs. Finally, an "eval" command sets the target index, and then an "into $destination" command sends the censored logs to the Splunk platform.

To create this pipeline, do the following:

  1. On the Pipelines page, select New pipeline. Follow the on-screen instructions to define a partition, optionally enter sample data, and select data destinations for metrics and non-metrics data. Set the non-metrics data destination to the Splunk platform deployment that you want to send obfuscated logs to.
    After you complete the on-screen instructions, the pipeline builder displays the SPL2 statement for your pipeline.
  2. Add a thru command to send an unaltered copy of the received data to an Amazon S3 bucket.
    • In the SPL2 editor, in the space following the | from $source command, enter the following:
      | thru [ 
          | into $destination2
      ]
      
    • In the Actions section of the pipeline builder, select Send data to $destination2. Select the Amazon S3 destination that you want to send these unprocessed logs to, and then select Apply.
  3. (Optional) Select the Preview Pipeline icon to generate a preview that shows what your data looks like when it passes through the pipeline. In the preview results panel, confirm that you are able to choose between $destination and $destination2 in the drop-down list, and that the same data is displayed in both cases.
  4. Extract the IP addresses from the logs into a field named ip_address.
    1. In the Actions section, select the plus icon and then select Extract fields from _raw.
    2. Select Insert from library, and then select IPv4WithOptionalPort.
    3. In the Regular expression field, change the name of the capture group from IPv4WithOptionalPort to ip_address. The updated regular expression looks like this:
      (?P<ip_address>(((?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)){3}))(?::(?:[1-9][0-9]*))?)
      
    4. Select Apply.

    The pipeline builder adds the following rex command to your pipeline:

    | rex field=_raw /(?P<ip_address>(((?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)){3}))(?::(?:[1-9][0-9]*))?)/
    
  5. Obfuscate the IP addresses.
    1. In the Actions section, select the plus icon and then select Compute hash of.
    2. In the Compute hash of a field dialog box, configure the following options and then select Apply.
      • Source field: ip_address
      • Hashing algorithm: SHA-256
      • Target field: ip_address

      The pipeline builder adds the following eval command to your pipeline:

      | eval ip_address = sha256(ip_address)
      

      The values in the ip_address field are now obfuscated, but the original IP addresses are still visible in the _raw field.

  6. Mask the IP addresses in the _raw field.
    1. In the Actions section, select the plus icon and then select Mask values in _raw.
    2. In the Mask using regular expression dialog box, configure the following options and then select Apply.
      • Field: _raw
      • Matching regular expression:
        (((?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)){3}))(?::(?:[1-9][0-9]*))?
        You can enter this expression by selecting the Regular Expression Library icon and then selecting IPv4WithOptionalPort.
      • Replace with: x.x.x.x
      • Match case: This option is not used when matching numbers, so you don't need to do anything with it.

      The pipeline builder adds the following eval command to your pipeline:

      | eval _raw=replace(_raw, /(((?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)){3}))(?::(?:[1-9][0-9]*))?/, "x.x.x.x")
      

      The IP addresses in the _raw field are replaced by x.x.x.x.

  7. Send these processed logs to an index named cisco_syslog.
    1. In the Actions section, select the plus icon and then select Target index.
    2. Select Specify index for all events.
    3. In the Index name field, enter cisco_syslog.
    4. Select Apply.

    The pipeline builder adds the following eval command to your pipeline:

    | eval index="cisco_syslog"
    

You now have a pipeline that sends an unaltered copy of the data to an Amazon S3 bucket, and then sends a processed copy of the data to an index. The complete SPL2 statement of the pipeline looks like this:

$pipeline = | from $source | thru [ 
    | into $destination2
]
| rex field=_raw /(?P<ip_address>(((?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)){3}))(?::(?:[1-9][0-9]*))?)/
| eval ip_address = sha256(ip_address)
| eval _raw=replace(_raw, /(((?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)){3}))(?::(?:[1-9][0-9]*))?/, "x.x.x.x")
| eval index = "cisco_syslog"
| into $destination;
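
To illustrate the result, here is approximately what the _raw value of the first sample event from earlier in this example would look like after the pipeline masks it. Only the dotted IPv4 address in the syslog header matches the IPv4WithOptionalPort pattern, so values such as the version number 1.1.0 are left unchanged:

<13>Jan 09 01:54:40 x.x.x.x : %ASA-3-505016: Module Sample_prod_id3 in slot 1 application changed from: sample_app2 version 1.1.0 state Sample_state1 to: Sample_server_name2 1.1.0 state Sample_state1.

The event also carries an ip_address field containing the SHA-256 hash of the original address and an index field set to cisco_syslog when it reaches $destination, while the unaltered copy is sent to the Amazon S3 destination through $destination2.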

See also

For information about other ways to route data, see Routing data in the same Ingest Processor pipeline to different actions and destinations.
