Splunk Cloud Platform

Use Ingest Processors

Ingest Processor is currently released as a preview only and is not officially supported. See Splunk General Terms for more information. For any questions on this preview, please reach out to ingestprocessor@splunk.com.

Extract fields from event data using Ingest Processor

You can create a pipeline that extracts specific values from your data into fields. Field extraction lets you capture information from your data in a more visible way and configure further data processing based on those fields.

For example, when working with event data that corresponds to login attempts on an email server, you can extract the usernames from those events into a dedicated field named Username. You can then use this Username field to filter for login attempts that were made by a specific user or identify the user that made each login attempt without needing to scan through the entire event.
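To make the idea concrete, here is a rough sketch in Python, using the `re` module as a stand-in for the pipeline's RE2 engine. The events and the combined pattern here are illustrative only, not the exact expression you build in the steps that follow:

```python
import re

# Illustrative login events (hypothetical sample data).
events = [
    "Failed password for apache from 78.111.167.117 port 3801 ssh2",
    "Failed password for invalid user guest from 66.69.195.226 port 2903 ssh2",
]

# A named capture group: the group name (Username) becomes the field name.
pattern = re.compile(r"for (?:invalid user )?(?P<Username>[a-zA-Z0-9._-]+) from")

# Extract the Username field from each event.
extracted = [m.group("Username") for e in events if (m := pattern.search(e))]
print(extracted)  # ['apache', 'guest']

# With usernames in a dedicated field, filtering by a specific user is trivial.
apache_attempts = [e for e in events
                   if (m := pattern.search(e)) and m.group("Username") == "apache"]
```

Once the value lives in its own field, downstream processing can key off the field name instead of rescanning the raw event text.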

If you're sending data to Splunk Enterprise or Splunk Cloud Platform, be aware that some fields are extracted automatically during indexing, and that indexing extracted fields can affect indexing performance and search times. Consider the following best practices when configuring field extractions in your pipeline:

  • Extract fields only as necessary. When possible, extract fields at search time instead.
  • Avoid duplicating your data and increasing the size of your events. After extracting a value into a field, either remove the original value from the event or, if you extracted the field only to support a data processing action, drop the field after that action is complete.

For more information, see When Splunk software extracts fields in the Splunk Cloud Platform Knowledge Manager Manual.

Pipelines don't extract any fields by default. If a pipeline receives data from a source that doesn't already extract values into fields, such as a universal forwarder without any add-ons, then the pipeline stores each event as a text string in a field named _raw.

Steps

To create a field extraction pipeline, use the Extract fields from action in the pipeline editor to specify regular expressions that identify the field names and values you want to extract.

You must write these regular expressions using Regular Expression 2 (RE2) syntax. See Regular expression syntax for Ingest Processor pipelines for more information.

You can write the regular expressions manually or select from a library of prewritten regular expressions, and then preview the resulting field extractions before applying them.
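The key mechanism is the named capture group: the group name becomes the field name, and the matched text becomes the field value. A minimal sketch using one of the sample events from the steps below and Python's `re` module, whose `(?P<...>)` named-group syntax is compatible with RE2 for expressions like this:

```python
import re

event = ("Mon Feb 12 2023 09:31:03 mailsv1 sshd[5800]: "
         "Failed password for invalid user guest from 66.69.195.226 port 2903 ssh2")

# The capture group is named Username, so the extracted field is named Username.
match = re.search(r"invalid user (?P<Username>[a-zA-Z0-9._-]+)", event)
print(match.groupdict())  # {'Username': 'guest'}
```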

  1. Navigate to the Pipelines page, select New pipeline, and then select Ingest Processor pipeline.
  2. On the Get started page, select Blank pipeline and then Next.
  3. On the Define your pipeline's partition page, do the following:
    1. Select how you want to partition the incoming data that you want to send to your pipeline. You can partition by source type, source, and host.
    2. Enter the conditions for your partition, including the operator and the value. Your pipeline will receive and process the incoming data that meets these conditions.
    3. Select Next to confirm the pipeline partition.
  4. On the Add sample data page, do the following:
    1. Enter or upload sample data for generating previews that show how your pipeline processes data. The sample data must contain accurate examples of the values that you want to extract into fields. For example, the following sample events represent login attempts on an email server and contain examples of how usernames can appear in event data:
      Wed Feb 14 2023 23:16:57 mailsv1 sshd[4590]: Failed password for apache from 78.111.167.117 port 3801 ssh2              
      Wed Feb 14 2023 15:51:38 mailsv1 sshd[1991]: Failed password for grumpy from 76.169.7.252 port 1244 ssh2              
      Mon Feb 12 2023 09:31:03 mailsv1 sshd[5800]: Failed password for invalid user guest from 66.69.195.226 port 2903 ssh2             
      Sun Feb 11 2023 14:12:56 mailsv1 sshd[1565]: Failed password for invalid user noone from 187.231.45.62 port 1092 ssh2             
      Sun Feb 11 2023 07:09:29 mailsv1 sshd[3560]: Failed password for games from 187.231.45.62 port 3752 ssh2              
      Sat Feb 10 2023 03:25:43 mailsv1 sshd[2442]: Failed password for invalid user admin from 211.166.11.101 port 1797 ssh2              
      Fri Feb 09 2023 21:45:20 mailsv1 sshd[1689]: Failed password for invalid user guest from 222.41.213.238 port 2658 ssh2              
      Fri Feb 09 2023 06:27:34 mailsv1 sshd[2226]: Failed password for invalid user noone from 199.15.234.66 port 3366 ssh2             
      Fri Feb 09 2023 18:32:51 mailsv1 sshd[5710]: Failed password for agarcia from 209.160.24.63 port 1775 ssh2              
      Thu Feb 08 2023 08:42:11 mailsv1 sshd[3202]: Failed password for invalid user noone from 175.44.1.172 port 2394 ssh2
    2. Select Next to confirm the sample data that you want to use for your pipeline.
  5. On the Select destination dataset page, select the name of the destination that you want to send data to, then do the following:
    1. If you selected a Splunk platform S2S or Splunk platform HEC destination, select Next.
    2. If you selected another type of destination, select Done and skip the next step.
  6. (Optional) If you're sending data to a Splunk platform deployment, you can specify a target index:
    1. In the Index name field, select the name of the index that you want to send your data to.
    2. (Optional) In some cases, incoming data already specifies a target index. If you want your Index name selection to override previous target index settings, then select the Overwrite previously specified target index check box.
    3. Select Done.
    4. If you're sending data to a Splunk platform deployment, be aware that the destination index is determined by a precedence order of configurations.

  7. (Optional) To generate a preview of how your pipeline processes data based on the sample data that you provided, select the Preview Pipeline icon. Use the preview results to validate your pipeline configuration.
  8. Select the plus icon in the Actions section and then select Extract fields from _raw.
  9. In the Extract fields from _raw dialog box, do the following:
    1. In the Regular expression field, specify one or more named capture groups using RE2 syntax. The name of the capture group determines the name of the extracted field, and the matched values determine the values of the extracted field. You can select named capture groups from the Insert from library list or enter named capture groups directly in the field. For example, to extract usernames from the sample events described previously, start by entering the phrase invalid user. Then, from the Insert from library list, select Username. The resulting regular expression looks like this: invalid user (?P<Username>[a-zA-Z0-9._-]+)
    2. (Optional) By default, the regular expression matches are case sensitive. To make the matches case insensitive, uncheck the Match case check box.
    3. Use the Events preview pane to validate your regular expression. The events in this pane are based on the last time that you generated a pipeline preview, and the pane highlights the values that match your regular expression for extraction.
    4. When you are satisfied with the events highlighted in the Events preview pane, select Apply to perform the field extraction. A rex command is added to the SPL2 statement of your pipeline, and the new field appears in the Fields list on the Data tab. To include this field in your pipeline preview, select the check box next to the field name.
  10. To save your pipeline, do the following:
    1. Select Save pipeline.
    2. In the Name field, enter a name for your pipeline.
    3. (Optional) In the Description field, enter a description for your pipeline.
    4. Select Save.
  11. To apply this pipeline, do the following:
    1. Navigate to the Pipelines page.
    2. In the row that lists your pipeline, select the Actions icon and then select Apply.
    3. Select the pipeline that you want to apply, and then select Save.
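Before applying the pipeline, you can sanity-check the regular expression from step 9 against your sample events outside of Splunk. A sketch in Python, again assuming that the `re` module's named-group handling mirrors the pipeline's RE2 engine for this pattern; the commented-out `(?i)` variant corresponds to clearing the Match case check box:

```python
import re

samples = [
    "Wed Feb 14 2023 23:16:57 mailsv1 sshd[4590]: Failed password for apache from 78.111.167.117 port 3801 ssh2",
    "Mon Feb 12 2023 09:31:03 mailsv1 sshd[5800]: Failed password for invalid user guest from 66.69.195.226 port 2903 ssh2",
    "Sat Feb 10 2023 03:25:43 mailsv1 sshd[2442]: Failed password for invalid user admin from 211.166.11.101 port 1797 ssh2",
    "Fri Feb 09 2023 18:32:51 mailsv1 sshd[5710]: Failed password for agarcia from 209.160.24.63 port 1775 ssh2",
    "Thu Feb 08 2023 08:42:11 mailsv1 sshd[3202]: Failed password for invalid user noone from 175.44.1.172 port 2394 ssh2",
]

# The regular expression built in step 9.
pattern = re.compile(r"invalid user (?P<Username>[a-zA-Z0-9._-]+)")
# Case-insensitive variant, mirroring a cleared Match case check box:
# pattern = re.compile(r"(?i)invalid user (?P<Username>[a-zA-Z0-9._-]+)")

# Only events that contain "invalid user" produce a Username field.
usernames = [m.group("Username") for e in samples if (m := pattern.search(e))]
print(usernames)  # ['guest', 'admin', 'noone']
```

Events without the phrase "invalid user", such as the apache and agarcia logins, simply pass through without a Username field, which matches the highlighting behavior you see in the Events preview pane.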

It can take a few minutes for the Ingest Processor to finish applying your pipeline. During this time, all affected pipelines enter the Pending status. Once the operation is complete, the Pending Apply status icon stops displaying beside the pipeline and the affected pipelines return to the Healthy status. Refresh your browser to confirm that the Pending Apply status icon no longer displays. You now have a pipeline that extracts specific values from your data into event fields.

Last modified on 07 May, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 9.1.2308 (latest FedRAMP release), 9.1.2312

