Splunk® Data Stream Processor

Function Reference


Datagen is a non-production function currently under development. Datagen does not appear in the Streaming ML user interface. Limit your use of Datagen to experimentation work only.

Datagen generates synthetic data such as timestamps, normally distributed variables, hash values, and ranged values. Users can combine Datagen primitives to craft a template that matches the expected format of the event.

Datagen uses a params argument to configure the fields named in a template string. Each generated event contains a value field that holds the entire rendered string, along with one field for each parameter in the template.
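
As a rough illustration of the output shape described above, the following Python sketch renders a template and builds one record. It is not Datagen's implementation: the function name render_event is hypothetical, only the "value" field type is emulated, and leaving unconfigured parameters unreplaced is an assumption.

```python
import re

def render_event(template, params):
    """Hypothetical emulation: fill {name} placeholders of type "value"."""
    record = {}

    def substitute(match):
        name = match.group(1)
        # Only the "value" type is emulated in this sketch.
        if params.get(name + ".type") == "value":
            record[name] = params[name + ".value"]
            return record[name]
        # Assumption: placeholders without configuration stay unchanged.
        return match.group(0)

    # "value" holds the entire rendered string; each parameter gets its own field.
    record["value"] = re.sub(r"\{(\w+)\}", substitute, template)
    return record

event = render_event(
    "Look at this number: {number}",
    {"number.type": "value", "number.value": "10"},
)
print(event)
```

The record carries both the full string and the individual field, which mirrors the behavior the paragraph above describes.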

Function Input/Output Schema

Function Output
This function outputs collections of records with schema R.


The required fields are in bold.

| from datagen("{timestamp} {norm} {hash}", {}, 100);

Required arguments

format
Syntax: string
Description: Template string used to generate events. Parameters are enclosed in {}. If a configuration for a parameter is provided in the "params" argument, Datagen replaces the parameter with the relevant value.
Example: | from datagen("Look at this number: {number}", {"number.type": "value", "number.value": "10"}, 1);

Optional arguments

fieldgen
Syntax: type
Description: A scalar function that applies any of the generators listed on this page to records from non-Datagen sources.
Example: fieldgen(type, params)

interval
Syntax: integer
Description: Defines how frequently (in milliseconds) events are emitted.
Example: {"interval": 1000}

n
Syntax: long
Description: If defined, Datagen generates "n" events and then terminates.

p
Syntax: integer
Description: If defined, Datagen creates "p" parallel instances. Each parallel instance generates "n" events, if n is defined. Parallelism set for the entire job takes precedence over this setting.
Example: {"p": 4}

seed
Syntax: integer
Description: If defined, this value is used as the seed for all random operations. Set it to get deterministic behavior in tests.
Example: 42

params
Syntax: map (string, any)
Description: Map holding the replacement configuration for the parameters defined in format. Each key in the map is matched against the parameters in the provided format, and Datagen replaces the matching substring (for example, {number}) with the relevant value.
Example: {"number.type": "value", "number.value": "10"}


You do not need to define field.type for every field you want to generate. You can use the type of a field as the parameter name as shorthand. The following two statements are equivalent:

| from datagen("{timestamp}");
| from datagen("{field}", {"field.type": "timestamp"});

You can also set additional configurations with the following shorthand notation. This shorthand is applicable to all Generator fields listed:
| from datagen("{range}", {"range.max": 1024});
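
One way to picture the shorthand is as a lookup that falls back to the parameter name when no explicit type is configured. The Python sketch below is only an emulation under that assumption: resolve_field and KNOWN_TYPES are hypothetical names, and the set of types is taken from the examples on this page.

```python
# Generator type names taken from the examples on this page.
KNOWN_TYPES = {"hash", "ipv4", "ipv6", "integerid", "list",
               "norm", "range", "seqlist", "timestamp", "value"}

def resolve_field(name, params):
    """Return (type, options) for a template parameter; hypothetical helper."""
    prefix = name + "."
    # Collect "<name>.<option>" entries and strip the "<name>." prefix.
    options = {key[len(prefix):]: val for key, val in params.items()
               if key.startswith(prefix)}
    # An explicit "<name>.type" wins; otherwise the name itself is the type.
    field_type = options.pop("type", name if name in KNOWN_TYPES else None)
    return field_type, options

print(resolve_field("timestamp", {}))
print(resolve_field("field", {"field.type": "timestamp"}))
print(resolve_field("range", {"range.max": 1024}))
```

The first two calls resolve to the same type, matching the equivalence stated above, and the third keeps max as an option of the range generator.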

eps
Syntax: long
Description: Experimental feature that generates timestamps at the defined rate. If defined, timestamps are spread uniformly within each second. This is a simulated "eps" (events per second) that works at the normal rate of event generation: "eps=2" does not mean that only two events are generated per second in real time, but that the output timestamps are spaced 500 ms apart.
Example: 2

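
The spacing arithmetic behind the simulated rate can be sketched in Python as follows. This is an illustration of the 1000 / eps relationship only, not Datagen's code; simulated_timestamps and the start value are hypothetical.

```python
def simulated_timestamps(start_ms, eps, count):
    """Return count timestamps (in ms) spaced 1000 / eps milliseconds apart."""
    step_ms = 1000 / eps
    return [start_ms + round(i * step_ms) for i in range(count)]

# With eps=2, consecutive timestamps differ by 500 ms regardless of how
# quickly the events are actually emitted.
stamps = simulated_timestamps(start_ms=1_600_000_000_000, eps=2, count=4)
print(stamps)
```
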
hash
Syntax: integer
Description: Generates a random alphanumeric string of the chosen length.
Example: {"field.type": "hash", "field.length": 64}

ipv4
Syntax: string
Description: Replaces the parameter with an IPv4 address.
Example: {"field.type": "ipv4"}

ipv6
Syntax: string
Description: Replaces the parameter with an IPv6 address.
Example: {"field.type": "ipv6"}

integerid
Syntax: integer
Description: Assigns an incremental integer value.
Example: {"field.type": "integerid", "field.start": 1000}

list
Syntax: string
Description: Outputs a random value from the provided comma-separated list of values.
Example: {"field.type": "list", "field.values": "debug,info,warning"}

norm
Syntax: string
Description: Generates a random Gaussian variable.
Example: {"field.type": "norm"}

range
Syntax: integer
Description: Replaces the parameter with an integer or float value within the provided range.
Example: {"field.type": "range"}

seqlist
Syntax: collection<string>
Description: Picks elements from the list sequentially and returns to index 0 once the list is exhausted.
Example: {"field.type": "seqlist", "field.values": "debug,info,warning"}

timestamp
Syntax: string
Description: Defines how to format the time, following "strptime"/"strftime" semantics. To output a Unix epoch with millisecond precision, use "%s".
Example: {"field.type": "timestamp", "field.format": "%b %d %H:%M:%S"}

value
Syntax: string
Description: Replaces the parameter with the chosen value.
Example: {"field.type": "value", "field.value": "10"}
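
To contrast the sequential and random generators, here is a small Python emulation of seqlist, integerid, and a seeded list. It is a sketch, not Datagen's implementation: the function names are hypothetical, a step of 1 for integerid is an assumption, and Python's random module stands in for whatever random source Datagen uses.

```python
import itertools
import random

def seqlist(values):
    """Yield the comma-separated values in order, wrapping back to index 0."""
    return itertools.cycle(values.split(","))

def integerid(start):
    """Yield start, start + 1, start + 2, ... (step of 1 is an assumption)."""
    return itertools.count(start)

def random_list(values, seed):
    """Yield random picks; a fixed seed makes the sequence repeatable."""
    rng = random.Random(seed)
    items = values.split(",")
    while True:
        yield rng.choice(items)

levels = list(itertools.islice(seqlist("debug,info,warning"), 5))
ids = list(itertools.islice(integerid(1000), 3))
picks_a = list(itertools.islice(random_list("debug,info,warning", 42), 5))
picks_b = list(itertools.islice(random_list("debug,info,warning", 42), 5))

print(levels)              # sequential picks wrap around after "warning"
print(ids)                 # incremental identifiers starting at 1000
print(picks_a == picks_b)  # same seed, same random picks
```

Fixing the seed is what makes the random picks reproducible across runs, which is the reason the seed argument is suggested for tests.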


You can set up parallel instances with the p parameter to increase overall throughput.

SPL2 example

The following example uses Datagen on a test set:

| from datagen("{timestamp} {norm} {hash}", {}, 100);
Last modified on 16 November, 2020

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0
