Extract fields from event data using Ingest Processor
You can create a pipeline that extracts specific values from your data into fields. Field extraction lets you capture information from your data in a more visible way and configure further data processing based on those fields.
For example, when working with event data that corresponds to login attempts on an email server, you can extract the usernames from those events into a dedicated field named Username. You can then use this Username field to filter for login attempts made by a specific user, or identify the user that made each login attempt, without needing to scan through the entire event.
If you're sending data to Splunk Enterprise or Splunk Cloud Platform, be aware that some fields are extracted automatically during indexing. Additionally, be aware that indexing extracted fields can have an impact on indexing performance and search times. Consider the following best practices when configuring field extractions in your pipeline:
- Extract fields only as necessary. When possible, extract fields at search time instead.
- Avoid duplicating your data and increasing the size of your events. After extracting a value into a field, either remove the original value from the event or drop the extracted field once you have finished using it in a later data processing action.
For more information, see When Splunk software extracts fields in the Splunk Cloud Platform Knowledge Manager Manual.
Pipelines don't extract any fields by default. If a pipeline receives data from a data source that doesn't extract data values into fields, such as a universal forwarder without any add-ons, then the pipeline stores each event as a text string in a field named _raw.
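For example, if an event like the first sample login event used later in this topic arrived from such a source, the entire event text would become the value of the _raw field:
_raw = Wed Feb 14 2023 23:16:57 mailsv1 sshd[4590]: Failed password for apache from 78.111.167.117 port 3801 ssh2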
Reference
To help you get started on creating and using pipelines, the Ingest Processor solution includes sample pipelines called templates. Templates are Splunk-built pipelines that are designed to work with specific data sources and use cases, such as extracting fields from events. Templates include sample data and preconfigured SPL2 statements, so you can use them as a starting point to build custom pipelines to solve specific use cases or as a reference to learn how to write SPL2 to build pipelines. To view a list of the available pipeline templates, log in to your tenant, navigate to the Pipelines page, and then select Templates.
See Use templates to create pipelines for Ingest Processor for instructions on how to build a pipeline from a template.
Steps
To create a field extraction pipeline, use the Extract fields from _raw action in the pipeline editor to specify regular expressions that identify the field names and values you want to extract.
You must write these regular expressions using Regular Expression 2 (RE2) syntax. See Regular expression syntax for Ingest Processor pipelines for more information.
You can write the regular expressions manually or select from a library of prewritten regular expressions, and then preview the resulting field extractions before applying them.
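For example, the following RE2 pattern uses two named capture groups to extract the client IP address and port from login events like the sample data shown later in this topic. The field names src_ip and src_port are illustrative choices for this sketch, not fields that the following procedure creates:
from (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}) port (?P<src_port>\d+)
Each named capture group becomes a field with the same name as the group, and the text matched by the group becomes the value of that field.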
- Navigate to the Pipelines page and then select New pipeline, then Ingest Processor pipeline.
- On the Get started page, select Blank pipeline and then Next.
- On the Define your pipeline's partition page, do the following:
- Select how you want to partition the incoming data that you want to send to your pipeline. You can partition by source type, source, and host.
- Enter the conditions for your partition, including the operator and the value. Your pipeline will receive and process the incoming data that meets these conditions.
- Select Next to confirm the pipeline partition.
- On the Add sample data page, do the following:
- Enter or upload sample data for generating previews that show how your pipeline processes data. The sample data must contain accurate examples of the values that you want to extract into fields.
For example, the following sample events represent login attempts on an email server and contain examples of how usernames can appear in event data:
Wed Feb 14 2023 23:16:57 mailsv1 sshd[4590]: Failed password for apache from 78.111.167.117 port 3801 ssh2
Wed Feb 14 2023 15:51:38 mailsv1 sshd[1991]: Failed password for grumpy from 76.169.7.252 port 1244 ssh2
Mon Feb 12 2023 09:31:03 mailsv1 sshd[5800]: Failed password for invalid user guest from 66.69.195.226 port 2903 ssh2
Sun Feb 11 2023 14:12:56 mailsv1 sshd[1565]: Failed password for invalid user noone from 187.231.45.62 port 1092 ssh2
Sun Feb 11 2023 07:09:29 mailsv1 sshd[3560]: Failed password for games from 187.231.45.62 port 3752 ssh2
Sat Feb 10 2023 03:25:43 mailsv1 sshd[2442]: Failed password for invalid user admin from 211.166.11.101 port 1797 ssh2
Fri Feb 09 2023 21:45:20 mailsv1 sshd[1689]: Failed password for invalid user guest from 222.41.213.238 port 2658 ssh2
Fri Feb 09 2023 06:27:34 mailsv1 sshd[2226]: Failed password for invalid user noone from 199.15.234.66 port 3366 ssh2
Fri Feb 09 2023 18:32:51 mailsv1 sshd[5710]: Failed password for agarcia from 209.160.24.63 port 1775 ssh2
Thu Feb 08 2023 08:42:11 mailsv1 sshd[3202]: Failed password for invalid user noone from 175.44.1.172 port 2394 ssh2
- Select Next to confirm the sample data that you want to use for your pipeline.
- On the Select a metrics destination page, select the name of the destination that you want to send metrics to.
- (Optional) If you selected Splunk Metrics store as your metrics destination, specify the name of the target metrics index where you want to send your metrics.
- On the Select a data destination page, select the name of the destination that you want to send logs to.
- (Optional) If you selected a Splunk platform destination, you can configure index routing:
- Select one of the following options in the expanded destinations panel:
| Option | Description |
| --- | --- |
| Default | The pipeline does not route events to a specific index. If the event metadata already specifies an index, then the event is sent to that index. Otherwise, the event is sent to the default index of the Splunk Cloud Platform deployment. |
| Specify index for events with no index | The pipeline only routes events to your specified index if the event metadata did not already specify an index. |
| Specify index for all events | The pipeline routes all events to your specified index. |
- If you selected Specify index for events with no index or Specify index for all events, then from the Index name drop-down list, select the name of the index that you want to send your data to. A sketch after these destination steps shows one way that this routing choice can appear in the pipeline's SPL2 statement.
If your desired index is not available in the drop-down list, then confirm that the index is configured to be available to the tenant and then refresh the connection between the tenant and the Splunk Cloud Platform deployment. For detailed instructions, see Make more indexes available to the tenant.
- Select Done to confirm the data destination.
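If you configured index routing, that choice is reflected in the SPL2 statement that the pipeline builder generates. As a rough sketch only, routing all events to a specific index is commonly expressed with an eval command; the index name email_security is hypothetical, and the exact SPL2 that the pipeline builder produces for your routing option can differ:
| eval index = "email_security"
For the Specify index for events with no index option, the generated SPL2 typically sets the index field only when it is not already present, for example with a coalesce-style expression such as coalesce(index, "email_security").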
After you complete the on-screen instructions, the pipeline builder displays the SPL2 statement for your pipeline.
- (Optional) To generate a preview of how your pipeline processes data based on the sample data that you provided, select the Preview Pipeline icon. Use the preview results to validate your pipeline configuration.
- Select the plus icon in the Actions section and then select Extract fields from _raw.
- In the Extract fields from _raw dialog box, do the following:
- In the Regular expression field, specify one or more named capture groups using RE2 syntax. The name of the capture group determines the name of the extracted field, and the matched values determine the values of the extracted field. You can select named capture groups from the Insert from library list or enter named capture groups directly in the field.
For example, to extract usernames from the sample events described previously, start by entering the phrase invalid user. Then, from the Insert from library list, select Username. The resulting regular expression looks like this:
invalid user (?P<Username>[a-zA-Z0-9._-]+)
- (Optional) By default, the regular expression matches are case sensitive. To make the matches case insensitive, uncheck the Match case check box.
- Use the Events preview pane to validate your regular expression. The events in this pane are based on the last time that you generated a pipeline preview, and the pane highlights the values that match your regular expression for extraction.
- When you are satisfied with the events highlighted in the Events preview pane, select Apply to perform the field extraction. A rex command is added to the SPL2 statement of your pipeline (see the sketch after these steps for an example of what the resulting statement can look like), and the new field appears in the Fields list on the Data tab. To include this field in your pipeline preview, select the check box next to the field name.
- To save your pipeline, do the following:
- Select Save pipeline.
- In the Name field, enter a name for your pipeline.
- (Optional) In the Description field, enter a description for your pipeline.
- Select Save.
- To apply this pipeline, do the following:
If you're sending data to a Splunk Cloud Platform deployment, be aware that the destination index is determined by a precedence order of configurations. See How does Ingest Processor know which index to send data to? for more information.
It can take a few minutes for the Ingest Processor to finish applying your pipeline. During this time, all applied pipelines enter the Pending status. Once the operation is complete, the Pending Apply status icon stops displaying beside the pipeline, and all affected pipelines transition from the Pending status back to the Healthy status. Refresh your browser to check whether the Pending Apply status icon no longer displays.
You now have a pipeline that extracts specific values from your data into event fields.
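For reference, the SPL2 statement produced by this procedure might look similar to the following sketch. This is an illustration based on the Username extraction from the sample login events, under the assumption that the pipeline uses the standard $source and $destination parameters; the exact statement that the pipeline builder generates for your pipeline can differ:
$pipeline = | from $source
| rex field=_raw /invalid user (?P<Username>[a-zA-Z0-9._-]+)/
| into $destination;
If you also configured index routing, an eval command like the one sketched in the destination steps would typically appear before the into command.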