Splunk® Data Stream Processor

Function Reference

Datagen

Datagen is a non-production function currently under development. Datagen does not appear in the Streaming ML user interface. Limit your use of Datagen to experimentation work only.

Datagen generates synthetic data such as timestamps, normally distributed variables, hash values, and ranged values. Users can combine Datagen primitives to craft a template that matches the expected format of the event.

Datagen uses a params argument to configure the fields referenced in a template string. For each generated event, Datagen adds a field named value that contains the entire rendered string, along with an individual field for each parameter in the template.
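The template behavior described above can be approximated outside of DSP with a short Python sketch. This is an illustration only, not Datagen's implementation: the render_event helper is hypothetical, it models only the "value" generator type, and leaving unconfigured placeholders unchanged is an assumption.

```python
import re


def render_event(template, params):
    """Fill {name} placeholders from params and return the event as a dict.

    Hypothetical helper: only the "value" generator type is modeled here.
    """
    fields = {}

    def substitute(match):
        name = match.group(1)
        if params.get(name + ".type") == "value":
            fields[name] = params[name + ".value"]
            return fields[name]
        # Assumption: a placeholder without configuration is left unchanged.
        return match.group(0)

    rendered = re.sub(r"\{(\w+)\}", substitute, template)
    # "value" holds the whole rendered string, next to the individual fields.
    return {"value": rendered, **fields}


event = render_event(
    "Look at this number: {number}",
    {"number.type": "value", "number.value": "10"},
)
print(event)
```

The resulting dictionary carries both the full string (under value) and the individual number field, which mirrors the record shape the paragraph describes.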

Function Input/Output Schema

Function Output
collection<record<R>>
This function outputs collections of records with schema R.

Syntax

The required fields are in bold.

| from datagen("{timestamp} {norm} {hash}", {}, 100);

Required arguments

format
Syntax: string
Description: Template string used to generate events. Parameters are enclosed in {}. If a configuration for a parameter is provided in the "params" argument, Datagen replaces the parameter with the corresponding value.
Example: | from datagen("Look at this number: {number}", {"number.type": "value", "number.value": "10"}, 1);

Optional arguments

fieldgen
Syntax: type
Description: A scalar function that applies any of the Datagen generators to non-Datagen sources.
Example: fieldgen(type, params)
interval
Syntax: integer
Description: Defines how frequently (in milliseconds) events should be emitted.
Example: {"interval": 1000}
n
Syntax: long
Description: If defined, Datagen generates "n" events and terminates afterwards.
p
Syntax: integer
Description: If defined, Datagen creates "p" parallel instances. Each parallel instance generates "n" events, if "n" is defined. Parallelism set for the entire job takes precedence over this setting.
Example: {"p": 4}
seed
Syntax: integer
Description: If defined, this value is used as the seed for all random operations. Set it to get deterministic behavior in tests.
Example: 42
type
Syntax: map (string, any)
Description: Map holding the replacement configuration for the parameters defined in format. Each key in the map is matched against the parameters in the provided format, and Datagen replaces the matching substring with the relevant value.
Example: {"number.type": "value", "number.value": "10"}
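The effect of the seed, n, and p arguments can be illustrated with a small Python sketch. It is a stand-in under stated assumptions rather than Datagen's own logic: generate_hashes and total_events are hypothetical helpers, and Python's random module substitutes for whatever random source Datagen uses internally.

```python
import random
import string


def generate_hashes(count, length, seed=None):
    """Return `count` random alphanumeric strings; a fixed seed repeats the run."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(count)]


def total_events(n, p=1):
    """Each of the p parallel instances generates n events."""
    return n * p


first = generate_hashes(3, 8, seed=42)
second = generate_hashes(3, 8, seed=42)
print(first == second)        # the same seed reproduces the same sequence
print(total_events(100, 4))   # 4 instances with n=100 each
```

Fixing the seed is what makes generated data comparable across test runs. Without it, each run produces a different sequence.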

Generators

You do not need to define field.type for every field you want to generate. As shorthand, you can use the name of a generator type as the field name. The following two statements are equivalent:

| from datagen("{timestamp}");
| from datagen("{field}", {"field.type": "timestamp"});

You can also set additional configurations with the following shorthand notation. This shorthand is applicable to all Generator fields listed:
| from datagen("{range}", {"range.max": 1024});
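One way to picture the shorthand rule is as a lookup that falls back to the field name when no explicit type is configured. The Python sketch below is a hypothetical model, not Datagen's parser: resolve_field and GENERATOR_TYPES are illustrative names, and the set of types is copied from the Generators list on this page.

```python
# Generator type names as listed in the Generators section of this page.
GENERATOR_TYPES = {
    "eps", "hash", "ipv4", "ipv6", "integerid", "list",
    "norm", "range", "seqlist", "timestamp", "value",
}


def resolve_field(name, params):
    """Return (type, options) for a placeholder, applying the shorthand rule."""
    prefix = name + "."
    # Collect every "name.option" entry and strip the "name." prefix.
    options = {k[len(prefix):]: v for k, v in params.items() if k.startswith(prefix)}
    field_type = options.pop("type", None)
    # Shorthand: with no explicit type, a field named after a generator uses it.
    if field_type is None and name in GENERATOR_TYPES:
        field_type = name
    return field_type, options


print(resolve_field("timestamp", {}))
print(resolve_field("field", {"field.type": "timestamp"}))
print(resolve_field("range", {"range.max": 1024}))
```

The first two calls resolve to the same generator, matching the equivalence between the two statements above. The third shows an option being attached to a shorthand field.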

eps
Syntax: long
Description: Experimental feature that generates timestamps at a defined rate. If defined, timestamps are spread uniformly within each second. This is a simulated "eps" that works at the normal rate of event generation. Note that "eps=2" does not mean that only two events are generated per second in real time. It means that the output timestamps are spaced 500 ms apart.
Example: 2
hash
Syntax: integer
Description: Generates a random alphanumeric string of a chosen length.
Example: {"field.type": "hash", "field.length": 64}
ipv4
Syntax: string
Description: Replaces the parameter with an IPv4 address.
Example: {"field.type": "ipv4"}
ipv6
Syntax: string
Description: Replaces the parameter with an IPv6 address.
Example: {"field.type": "ipv6"}
integerid
Syntax: integer
Description: Assigns an incremental integer value.
Example: {"field.type": "integerid", "field.start": 1000}
list
Syntax: string
Description: Outputs a random value from a provided comma-separated list of values.
Example: {"field.type": "list", "field.values": "debug,info,warning"}
norm
Syntax: string
Description: Generates a random Gaussian variable.
Example: {"field.type": "norm"}
range
Syntax: integer
Description: Replaces the parameter with an integer or float value within the provided range.
Example: {"field.type": "range"}
seqlist
Syntax: collection<string>
Description: Picks elements from the list sequentially and returns to index 0 once the list is exhausted.
Example: {"field.type": "seqlist", "field.values": "debug,info,warning"}
timestamp
Syntax: string
Description: Defines how to format the time. The format follows "strptime"/"strftime" semantics. To output a Unix epoch with millisecond precision, use "%s".
Example: {"field.type": "timestamp", "field.format": "%b %d %H:%M:%S"}
value
Syntax: string
Description: Replaces the parameter with the chosen value.
Example: {"field.type": "value", "field.value": "10"}
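Some of the generators above are easier to understand from their output sequences. The following Python sketch imitates the described behavior of eps, seqlist, and integerid. It is not Datagen code: the function names are hypothetical, and the default start of 0 and increment of 1 for integerid are assumptions.

```python
from itertools import count, cycle, islice


def eps_offsets_ms(eps, n):
    """Millisecond offsets of n timestamps spread uniformly at `eps` per second."""
    step = 1000 / eps
    return [i * step for i in range(n)]


def seqlist(values):
    """Yield list elements in order, wrapping to index 0 when exhausted."""
    return cycle(values.split(","))


def integerid(start=0):
    """Yield incrementing integers (assumed step of 1) beginning at `start`."""
    return count(start)


print(eps_offsets_ms(2, 4))                             # offsets 500 ms apart
print(list(islice(seqlist("debug,info,warning"), 5)))   # wraps after "warning"
print(list(islice(integerid(1000), 3)))
```

With eps set to 2, consecutive offsets differ by 500 ms regardless of how quickly the events are actually produced, which is the distinction the eps description draws between simulated and real-time rates.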

Usage

You can set up parallel instances using the p parameter to increase overall throughput.

SPL2 example

The following example uses Datagen on a test set:

| from datagen("{timestamp} {norm} {hash}", {}, 100);
Last modified on 16 November, 2020

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0

