Splunk Cloud Platform

Use Ingest Processors

Getting sample data for previewing data transformations

You can generate previews to see how your pipeline or source type configurations can change the incoming data. These previews are based on the sample data that you specify in the pipeline or source type.

For example, when editing a pipeline, you can provide Windows event logs as sample data and then generate previews that show how the processing commands in your pipeline transform Windows event logs. Similarly, when editing a source type, you can provide Cisco syslog output as sample data and then generate previews that show how the event breaking and merging definitions in your source type preprocess cisco_syslog data into distinct events. To specify sample data when editing a source type, select Edit sample data.

If you don't specify sample data in your pipeline, but you configure the pipeline to receive data from a source type that has sample data, then the sample data from the source type is used when you preview the pipeline.

To generate accurate previews, you must provide sample data that has the same format as the actual data that you want to process. In preview mode, if a string field contains more than 50,000 characters, that field is automatically truncated to 1,000 characters.
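The truncation rule described above can be sketched as follows. The 50,000-character threshold and the 1,000-character result come from this documentation; the function name and constants are illustrative, not part of the product.

```python
# Illustrative sketch of the preview truncation rule described above:
# string fields longer than 50,000 characters are cut down to 1,000.
PREVIEW_LIMIT = 50_000
TRUNCATED_LENGTH = 1_000

def truncate_for_preview(value: str) -> str:
    """Return the value unchanged unless it exceeds the preview limit."""
    if len(value) > PREVIEW_LIMIT:
        return value[:TRUNCATED_LENGTH]
    return value
```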

Supported formats for sample data

You can generate pipeline previews using either raw data or parsed data that has values stored in event fields. Source type previews support raw data only. Parsed data must be specified in CSV format, with the header containing the names of the event fields. If your sample events include the _time field, the values in that field must be ISO 8601 timestamps.

Splunk software uses the _time field for its internal processes.

For example, if you want to preview a pipeline that's designed to process HTTP Event Collector (HEC) events that contain fields named severity and category, then you need to provide parsed data that has values stored in the severity and category fields.

_raw,_time,severity,category
Hello World,2023-04-24T13:00:05.105+0000,INFO,system
Unexpected failure,2023-04-24T13:25:48.128+0000,ERROR,system
Shutting down,2023-04-24T13:30:57.306+0000,INFO,system
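If your existing data stores timestamps as epoch seconds, one way to produce a CSV in the shape shown above is to convert each timestamp to ISO 8601 before writing the row. This is a sketch using only the Python standard library; the event values and the conversion helper are illustrative, while the field names match the example above.

```python
import csv
import io
from datetime import datetime, timezone

def epoch_to_iso8601(epoch: float) -> str:
    """Format an epoch timestamp as an ISO 8601 string with milliseconds in UTC."""
    dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "+0000"

# Hypothetical parsed events; _raw, severity, and category mirror the example above.
events = [
    {"_raw": "Hello World", "_time": 1682341205.0, "severity": "INFO", "category": "system"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["_raw", "_time", "severity", "category"])
writer.writeheader()
for event in events:
    row = dict(event, _time=epoch_to_iso8601(event["_time"]))
    writer.writerow(row)
sample_csv = buffer.getvalue()
```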

As another example, if you want to process raw data from a universal forwarder where the start and end of each event is delimited by a line break, then you must provide plain text strings that are on separate lines.

Wed Feb 14 2023 23:16:57 mailsv1 sshd[4590]: Failed password for apache from 78.111.167.117 port 3801 ssh2
Wed Feb 14 2023 15:51:38 mailsv1 sshd[1991]: Failed password for grumpy from 76.169.7.252 port 1244 ssh2
Mon Feb 12 2023 09:31:03 mailsv1 sshd[5800]: Failed password for invalid user guest from 66.69.195.226 port 2903 ssh2
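The line-break delimiting described above can be sketched as follows: each line of the raw sample becomes one distinct event. The sample strings are copied from the example above, and the splitting logic is illustrative rather than the product's actual event-breaking implementation.

```python
# Raw sample data where a line break marks the end of each event,
# as in the universal forwarder example above.
raw_sample = """\
Wed Feb 14 2023 23:16:57 mailsv1 sshd[4590]: Failed password for apache from 78.111.167.117 port 3801 ssh2
Wed Feb 14 2023 15:51:38 mailsv1 sshd[1991]: Failed password for grumpy from 76.169.7.252 port 1244 ssh2
Mon Feb 12 2023 09:31:03 mailsv1 sshd[5800]: Failed password for invalid user guest from 66.69.195.226 port 2903 ssh2
"""

# Splitting on line breaks yields one event per line; empty lines are skipped.
events = [line for line in raw_sample.splitlines() if line]
```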

Methods for getting sample data

The following are a few methods that you can use to get sample data for generating previews:

  • Copy the sample data that is included with default source types. If the owner of the source type is system, then it is a default source type.
  • Copy the sample data that is included with pipeline templates.
  • Use Splunk Cloud Platform or Splunk Enterprise to search for relevant data, and then export the search results to a CSV file. For more information, see Export data using Splunk Web in the Splunk Cloud Platform Search Manual.
  • Use the Search Experience to find relevant data from the Splunk Cloud Platform deployment that's connected to your tenant, and then copy values from the _raw field to use as sample data. See Copy results from Search Experience for more information.
  • Create a snapshot of your incoming data based on your pipeline's partition and a specified time interval and number of events. See Capture a snapshot of incoming data for more information.

Copy results from Search Experience

To copy results from the Search Experience in a format that you can immediately use as sample data for previews, do the following:

  1. Navigate to the Search page.
  2. Select a dataset that you want to search, and then select Apply.
  3. In the SPL Editor, enter a search statement written in SPL2. For more information about writing SPL2 search statements, see the Splunk Cloud Services SPL2 Search Manual.
  4. Select the Run icon to run your search. The events returned by your search statement appear in the search results panel.
  5. In the search results panel, hover over the header for the _raw field to make the Options for "_raw" icon appear. Select that icon to open the Options menu, and then select Copy field values.

Capture a snapshot of incoming data

To create a snapshot of your incoming data to use as sample data for previews, do the following in the pipeline builder:

  1. Navigate to the Pipelines page and then select Ingest Processor pipeline.
  2. On the Get started page, select Blank pipeline and then Next.
  3. On the Define your pipeline's partition page, do the following:
    1. Select how you want to partition the incoming data that is sent to your pipeline. You can partition by source type, source, and host.
    2. Enter the conditions for your partition, including the operator and the value. Your pipeline receives and processes the incoming data that meets these conditions.
    3. Select Next to confirm the pipeline partition.
  4. On the Add sample data page, select Capture new snapshot. Then, complete the following fields:
    1. Enter a name for your snapshot of sample data.
    2. Specify the maximum time interval for your snapshot. This interval determines how long, in minutes, the snapshot captures incoming data.
    3. Specify the maximum number of events that you want your snapshot to capture.
    4. Ensure that your partition conditions reflect the data you want to sample for your pipeline.
  5. Select Capture.
Last modified on 01 August, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)

