Splunk® Data Stream Processor

Function Reference


On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see the Upgrade the Splunk Data Stream Processor topic.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Datagen (beta)

Datagen is included with the ML plugin but is still under active development. Datagen is not visible in the Streaming ML user interface. Limit your use of Datagen to experimentation.

Datagen is a Flink-native data source that generates various types of events. Use Datagen to generate synthetic data points such as timestamps, normally distributed variables, hash values, and ranged values. You can also instruct Datagen to stop after it generates a set number of synthetic data points.

You can define a template string that indicates which fields to generate and the type of each field. Datagen then generates events from that template. Each event contains a value field that holds the entire generated string, along with an individual field for each parameter in the template string. Datagen also offers an optional params argument that configures the fields defined in the template string.
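The relationship between the template string, the value field, and the individual fields can be pictured with a small Python sketch. This is an illustration of the behavior described above, not Datagen's implementation: the `render_event` helper and the `generated` dictionary of already-generated values are hypothetical names introduced for this example.

```python
import re

# Matches placeholders such as {number} or {timestamp} in a template string.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_event(template, generated):
    """Build one record from a template and a dict of generated values.

    The record holds one field per placeholder plus a "value" field that
    contains the whole template with every placeholder filled in.
    (Hypothetical helper: a sketch of the described behavior only.)
    """
    names = PLACEHOLDER.findall(template)
    record = {name: generated[name] for name in names}
    record["value"] = PLACEHOLDER.sub(
        lambda match: str(generated[match.group(1)]), template
    )
    return record


event = render_event("Look at this number: {number}", {"number": 10})
print(event)
```

In this sketch the record carries both the `number` field and the full string under `value`, which mirrors the two kinds of output fields described above.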

Function Input/Output Schema

Function Output
This function outputs collections of records with schema R.


| from datagen("{timestamp} {norm} {hash}", {}, 100);

Required arguments

format
Syntax: string
Description: Required template string used to generate synthetic events. Parameters are enclosed in {}. Datagen replaces each parameter with a value generated according to the configuration provided in the params argument.
Example: | from datagen("Look at this number: {number}", {"number.type": "value", "number.value": "10"}, 1);

Optional arguments

fieldgen
Syntax: type
Description: A scalar function that generates any of the field types described in this topic on non-Datagen sources.
Example: fieldgen(type, params)

interval
Syntax: integer
Description: Defines how frequently (in milliseconds) events are emitted.
Example: {"interval": 1000}

n
Syntax: long
Description: If defined, Datagen generates "n" events and then terminates.

p
Syntax: integer
Description: If defined, Datagen creates "p" parallel instances. Each parallel instance generates n events, if n is defined. Parallelism set for the entire job takes precedence over this setting.
Example: {"p": 4}

params
Syntax: map (string, any)
Description: Map that holds the replacement configuration for the parameters defined in format. Datagen replaces each parameter with a value generated according to this configuration.

seed
Syntax: integer
Description: If defined, this value is used as the seed for all random operations. Set this for deterministic behavior in tests.
Example: 42
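How n, p, and the seed interact can be sketched in a few lines of Python. This is a simulation under stated assumptions, not Datagen code: `run_instance` and `run_job` are hypothetical helpers, and deriving each instance's seed as `seed + index` is an assumption made only so that the sketch is repeatable.

```python
import random


def run_instance(n, seed):
    """Emit n Gaussian values from one instance.

    A fixed seed makes the sequence repeatable from run to run.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def run_job(n, p, seed):
    """Run p instances that each emit n events (p * n events in total).

    Assumption for this sketch: instance i uses seed + i. The source does
    not say how Datagen distributes the seed across parallel instances.
    """
    return [run_instance(n, seed + index) for index in range(p)]


first = run_job(n=100, p=4, seed=42)
second = run_job(n=100, p=4, seed=42)
total = sum(len(events) for events in first)

print(total)            # number of events across all instances
print(first == second)  # whether the two seeded runs match
```

With n set to 100 and p set to 4, the simulated job emits 400 events in total, and repeating the run with the same seed reproduces the same values.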


You do not need to define field.type for every field you want to generate. As shorthand, you can use the name of a type as the field name. The following two statements are equivalent:

| from datagen("{timestamp}");
| from datagen("{field}", {"field.type": "timestamp"});

You can also set additional configurations with the following shorthand notation. This shorthand applies to all of the generator fields listed in this topic:
| from datagen("{range}", {"range.max": 1024});
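The shorthand rules above amount to a simple lookup, sketched here in Python. The `resolve_field` helper is hypothetical, and the set of type names is collected from the field.type values that appear in this topic's examples. Datagen's actual resolution logic may differ.

```python
# Type names taken from the field.type values used in this topic's examples.
KNOWN_TYPES = {
    "hash", "ipv4", "ipv6", "integerid", "list",
    "norm", "range", "seqlist", "timestamp", "value",
}


def resolve_field(name, params):
    """Return (field_type, config) for one template parameter.

    Keys of the form "<name>.<option>" configure the field. If no
    "<name>.type" key is present, the name itself is treated as the type.
    (Hypothetical helper: a sketch of the shorthand rule only.)
    """
    prefix = name + "."
    config = {
        key[len(prefix):]: value
        for key, value in params.items()
        if key.startswith(prefix)
    }
    field_type = config.pop("type", None)
    if field_type is None:
        if name not in KNOWN_TYPES:
            raise ValueError(f"no type configured for field {name!r}")
        field_type = name
    return field_type, config


print(resolve_field("timestamp", {}))
print(resolve_field("field", {"field.type": "timestamp"}))
print(resolve_field("range", {"range.max": 1024}))
```

The first two calls resolve to the same type, which matches the equivalence of the two statements shown earlier. The third call shows a shorthand field that still accepts extra configuration.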

eps
Syntax: long
Description: Experimental feature that generates timestamps at a defined rate. If defined, timestamps are spread uniformly across each second. This is a simulated "eps" that works at the normal rate of event generation. Note that "eps=2" does not mean that only two events are generated per second in real time. It means that the output timestamps are spaced 500 ms apart.
Example: 2

hash
Syntax: integer
Description: Generates a random alphanumeric string of a chosen length.
Example: {"field.type": "hash", "field.length": 64}

ipv4
Syntax: string
Description: Replaces the parameter with an IPv4 value.
Example: {"field.type": "ipv4"}

ipv6
Syntax: string
Description: Replaces the parameter with an IPv6 value.
Example: {"field.type": "ipv6"}

integerid
Syntax: integer
Description: Assigns an incrementing integer value.
Example: {"field.type": "integerid", "field.start": 1000}

list
Syntax: string
Description: Outputs a random value from a provided comma-separated list of values.
Example: {"field.type": "list", "field.values": "debug,info,warning"}

norm
Syntax: string
Description: Generates a random Gaussian variable.
Example: {"field.type": "norm"}

range
Syntax: integer
Description: Replaces the parameter with an integer or float value within a provided range.
Example: {"field.type": "range"}

seqlist
Syntax: collection<string>
Description: Picks elements from the list in order and returns to index 0 once the list is exhausted.
Example: {"field.type": "seqlist", "field.values": "debug,info,warning"}

timestamp
Syntax: string
Description: Defines how to format the time, following "strptime"/"strftime" semantics. To output a Unix epoch with millisecond precision, use "%s".
Example: {"field.type": "timestamp", "field.format": "%b %d %H:%M:%S"}

value
Syntax: string
Description: Replaces the parameter with the chosen value.
Example: {"field.type": "value", "field.value": "10"}
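The sequential generators in this list are easy to confuse with their random counterparts, so the following Python sketch simulates three of them: seqlist, integerid, and the timestamp spacing produced by eps. The helper functions are hypothetical and only mirror the descriptions above. The step of 1 for integerid and the integer division used for the eps spacing are assumptions of this sketch.

```python
from itertools import count, cycle, islice


def seqlist(values):
    """Yield the comma-separated values in order, wrapping to index 0."""
    return cycle(values.split(","))


def integerid(start):
    """Yield incrementing integers from start (step of 1 is assumed)."""
    return count(start)


def eps_timestamps(start_ms, eps):
    """Yield epoch-millisecond timestamps spaced 1000 // eps ms apart.

    This spacing is independent of how fast events are really produced,
    which matches the "simulated eps" described above.
    """
    return count(start_ms, 1000 // eps)


levels = list(islice(seqlist("debug,info,warning"), 5))
ids = list(islice(integerid(1000), 3))
stamps = list(islice(eps_timestamps(0, 2), 4))

print(levels)  # ['debug', 'info', 'warning', 'debug', 'info']
print(ids)     # [1000, 1001, 1002]
print(stamps)  # [0, 500, 1000, 1500]
```

With eps set to 2, consecutive timestamps in the sketch differ by 500 ms no matter how quickly the loop runs, which is the distinction the eps description draws between simulated and real-time rates.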


You can set up parallel instances with the p parameter to increase overall throughput.

SPL2 example

The following example uses Datagen on a test set. The number 100 in the example tells Datagen how many events to create before stopping.

| from datagen("{timestamp} {norm} {hash}", {}, 100);
Last modified on 02 September, 2021

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5
