Splunk® Data Stream Processor

Function Reference



DSP 1.2.0 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Sentiment Analysis (beta)

Sentiment Analysis generates a label on unstructured text input using natural language processing (NLP). The Sentiment Analysis function classifies raw text as positive, negative, or neutral. Raw text input can include data streams such as messages, customer reviews, IT tickets, or computer logs. Sentiment Analysis is useful for downstream processes such as flagging negative customer reviews for contact by a customer service representative in a customer service application.

Sentiment Analysis predicts the sentiment of text samples in real time on the stream. You must provide labeled data (label = 1 or -1) to train a model. The model will predict the sentiment for each new example it encounters. In addition, if the data is labeled, the model will use the label provided to incrementally improve over time. If the data is unlabeled (label = 0), the model will not be updated.

The model predicts the probability that a new text sample has positive or negative sentiment, as a score between 0 and 1. You train the model with two classes of training data (positive and negative), labeled 1 and -1. You can specify that label=1 is positive and label=-1 is negative, or that label=1 is negative and label=-1 is positive. The model learns whichever notation you choose.

The result returned by the model is always the probability that the label of the new event is 1. Depending on your label semantics, this probability could mean that the sentiment is positive, or this probability could mean that the sentiment is negative:

  • If you specified that label=1 is positive, the probabilities returned by the model can be interpreted as the probability that the sentiment is positive.
  • If you specified that label=1 is negative, the probabilities returned by the model can be interpreted as the probability that the sentiment is negative.
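
Because the model always reports the probability that the label is 1, a downstream consumer may want to normalize the score to "probability of positive sentiment". The following is a minimal Python sketch, not DSP code. It assumes the two classes are complementary, so the probability of label -1 is one minus the reported score. The function name and the `one_means_positive` flag are hypothetical.

```python
def probability_positive(p_label_one: float, one_means_positive: bool) -> float:
    """Convert the model's P(label = 1) into P(sentiment is positive).

    Assumption: a two-class model, so P(label = -1) = 1 - P(label = 1).
    """
    if not 0.0 <= p_label_one <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # If label=1 was chosen to mean positive, the score is already P(positive).
    # Otherwise the score is P(negative), so take the complement.
    return p_label_one if one_means_positive else 1.0 - p_label_one
```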

A threshold should be applied to the Sentiment Analysis probability score to classify each sample as positive, negative, or neutral. For example, assuming the model is trained with label=1 as positive, the following thresholds can typically be applied for each class of labels:

  • p < 0.25 can be applied to label negative sentiment
  • 0.25 ≤ p ≤ 0.75 can be applied to label neutral sentiment
  • p > 0.75 can be applied to label positive sentiment
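
The thresholds above could be applied downstream with logic along these lines. This Python sketch assumes the model was trained with label=1 as positive. The function name and default cutoffs are illustrative, and treating scores of exactly 0.25 or 0.75 as neutral is an assumption, since the boundary cases are not specified.

```python
def classify_sentiment(p: float, low: float = 0.25, high: float = 0.75) -> str:
    """Map P(label = 1) to a sentiment class, assuming label=1 means positive."""
    if p < low:
        return "negative"
    if p > high:
        return "positive"
    # Assumption: scores on or between the two cutoffs count as neutral.
    return "neutral"
```

If the model was trained with label=1 as negative, swap the "negative" and "positive" return values.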

This function requires both "input" and "label" fields in the incoming data stream. Neither appears in the Streaming ML user interface because they are not configurable. For more information, see the Required arguments section.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs collections of records with schema S.

Syntax

analyze_sentiment
value="input"

Required arguments

input
Syntax: string
Description: The default input column, containing the free text to analyze with Sentiment Analysis, such as reviews or tweets. This argument does not appear in the Streaming ML user interface because it is not configurable. If the dataset contains a column titled "input", it is used by default. To use a different field, set the optional "value" argument.
label
Syntax: double
Description: The input dataset must contain a column titled "label". This argument is not configurable. If the dataset is labeled, each value in the label field must be 1 or -1, which corresponds to positive or negative sentiment according to the label semantics you choose. If the dataset is unlabeled, the label field must be 0.
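
As an illustration of the record shape these two arguments imply, the following Python sketch builds events with an "input" string and a "label" double. It assumes the convention label=1 is positive. The helper name `make_record` and the sentiment names are hypothetical and not part of DSP.

```python
# Assumed convention: label=1 means positive, label=-1 means negative.
LABEL_VALUES = {"positive": 1.0, "negative": -1.0}

def make_record(text, sentiment=None):
    """Build a record with the "input" and "label" fields the function requires.

    Unlabeled samples (sentiment=None) get label 0.0, so the model only
    predicts on them and is not updated.
    """
    if sentiment is None:
        label = 0.0
    else:
        label = LABEL_VALUES[sentiment]  # raises KeyError for unknown names
    return {"input": text, "label": label}
```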

Optional arguments

value
Syntax: string
Description: Set the "value" argument to apply Sentiment Analysis to a field other than "input". When set, "value" overrides "input".

Usage

For each observed data point, Sentiment Analysis computes and outputs a probability score between 0 and 1. The closer the score is to 1, the more likely the label of the data point is 1. Whether that indicates positive or negative sentiment depends on the label semantics used to train the model.

A threshold should be applied to the Sentiment Analysis probability score to classify each sample as positive, negative, or neutral. For example, assuming the model is trained with label=1 as negative, the following thresholds can typically be applied for each class of labels:

  • p < 0.25 can be applied to label positive sentiment
  • 0.25 ≤ p ≤ 0.75 can be applied to label neutral sentiment
  • p > 0.75 can be applied to label negative sentiment

Sentiment Analysis is useful to monitor the overall feeling and sentiment of text over time. Detecting sudden changes in overall sentiment may indicate an important shift in user behavior or customer satisfaction. It is possible to apply downstream operators, like Drift Detection or Anomaly Detection, on the output of the Sentiment Analysis model.

For example, you can identify shifts in average sentiment in a stream of user feedback by applying the Drift Detection model to the average sentiment score over a rolling window. You can also identify outliers that represent extremely negative or extremely positive reviews by applying an anomaly detection model, like Adaptive Thresholding, to the numeric stream of sentiment scores.
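
The rolling average mentioned above can be sketched in Python as follows. This only illustrates the windowed mean that could be fed to a drift detector. It is not the DSP Drift Detection function, and the class name `RollingMean` and the window size are hypothetical.

```python
from collections import deque

class RollingMean:
    """Mean of the most recent `window` sentiment scores."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self._values = deque(maxlen=window)
        self._total = 0.0

    def update(self, score: float) -> float:
        # When the window is full, the oldest score is about to be evicted,
        # so remove it from the running total first.
        if len(self._values) == self._values.maxlen:
            self._total -= self._values[0]
        self._values.append(score)
        self._total += score
        return self._total / len(self._values)
```

A sustained change in the value returned by `update` is the kind of signal a drift detector would look for.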

SPL2 example

The following example uses Sentiment Analysis on review text:

| from splunk_firehose() 
| eval json=cast(body, "string"), 
	'json-map'=from_json_object(json), 
	input=ucast(map_get('json-map', "reviewText"), "string", null),
	label1=ucast(map_get('json-map', "label"), "integer", null),
	label=cast(label1, "double"),
	key="" 
| analyze_sentiment value="input";
Last modified on 25 January, 2021

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

