Sentiment Analysis generates a label on unstructured text input using natural language processing. The Sentiment Analysis function classifies raw text as positive, negative, or neutral. Raw text input can include data streams such as messages, customer reviews, IT tickets, or computer logs. The function predicts the classified output label on observed text samples in real time on the stream. The user may provide optional sentiment labels, which can be used to improve the model incrementally as each new labelled example is ingested and observed over time.
Sentiment Analysis is useful for downstream processes, such as flagging negative customer reviews in a customer service application so that a representative can follow up.
Function Input/Output Schema
- Function Input
- This function takes in collections of records with schema R.
- Function Output
- This function outputs collections of records with schema S.
Required arguments are marked as required.
| analyze_sentiment value="input"
- value (required)
  - Syntax: string
  - Description: Name of the column that contains the free text, for example reviews or tweets.
  - Example: ucast(map_get('json-map', "reviewText"), "string", null)
- label (optional)
  - Syntax: double
  - Description: If a label is given (-1 for negative, 1 for positive), the labeled text is used to update the model. If the label is 0 or not present, the model infers the sentiment of the text without changing the model.
  - Example: cast(label1, "double")
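The label rule described above can be sketched in Python. This is only an illustration of the documented behavior, not the function's implementation: the name `label_mode` is hypothetical, and rejecting values other than -1, 0, and 1 is an assumption, because the source does not say how other values are handled.

```python
from typing import Optional


def label_mode(label: Optional[float]) -> str:
    """Return "update" when the record carries a usable label, else "infer".

    Hypothetical helper: mirrors the documented rule that -1 and 1 update
    the model, while 0 or a missing label triggers inference only.
    """
    if label is None or label == 0:
        return "infer"
    if label in (-1.0, 1.0):
        return "update"
    # Assumption: any other value is treated as invalid input.
    raise ValueError(f"unsupported label: {label!r}")


print(label_mode(1.0))   # labeled positive example
print(label_mode(None))  # no label present
```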
For each observed data point, Sentiment Analysis computes and outputs a probability score between 0 and 1. The closer the probability score is to 1, the more likely the sentiment is negative.
A threshold should be applied to the Sentiment Analysis probability score to classify each sample as positive, negative, or neutral. For example, for a probability score p, the following thresholds can typically be applied:
- p < 0.25 can be applied to label positive sentiment
- 0.25 ≤ p ≤ 0.75 can be applied to label neutral sentiment
- p > 0.75 can be applied to label negative sentiment
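As a minimal sketch, the example thresholds could be applied to a score like this. The function name `classify_sentiment` and its parameters are hypothetical, and assigning scores of exactly 0.25 or 0.75 to the neutral class is an assumption.

```python
def classify_sentiment(p: float, low: float = 0.25, high: float = 0.75) -> str:
    """Map a probability score to a sentiment class.

    Scores near 1 indicate negative sentiment, so the upper band is
    "negative" and the lower band is "positive".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {p}")
    if p < low:
        return "positive"
    if p > high:
        return "negative"
    # Assumption: the boundary values themselves count as neutral.
    return "neutral"


for score in (0.1, 0.5, 0.9):
    print(score, classify_sentiment(score))
```

The thresholds are passed as parameters so that they can be tuned for a particular data stream.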
Sentiment Analysis is useful for monitoring the overall sentiment of text over time. A sudden change in overall sentiment may indicate an important shift in user behavior or customer satisfaction. It is possible to apply downstream operators, like Drift Detection or Anomaly Detection, on the output of the Sentiment Analysis model.
For example, it is possible to identify shifts in average sentiment in a stream of user feedback by applying the Drift Detection model to the average sentiment score on a rolling window. It is also possible to identify outliers that represent extremely negative or extremely positive reviews by applying an anomaly detection model, like Adaptive Thresholding, to the numeric stream of sentiment scores.
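The rolling-window average mentioned above can be illustrated with a short Python sketch. It only computes the averaged score stream that a drift detector would consume; it does not implement Drift Detection or Adaptive Thresholding. The name `rolling_mean` and the window size are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(scores: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` scores for each new score."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    total = 0.0
    for score in scores:
        if len(recent) == window:
            # The oldest score is about to be evicted by append().
            total -= recent[0]
        recent.append(score)
        total += score
        yield total / len(recent)


# Sentiment scores drifting toward 1 (more negative) raise the average.
print(list(rolling_mean([0.25, 0.75, 0.5, 1.0], window=2)))
```

Until the window fills, the mean is taken over the scores seen so far.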
The following example uses Sentiment Analysis on review text:
| from splunk_firehose()
| eval json=cast(body, "string"),
    'json-map'=from_json_object(json),
    input=ucast(map_get('json-map', "reviewText"), "string", null),
    label1=ucast(map_get('json-map', "label"), "integer", null),
    label=cast(label1, "double"),
    key=""
| analyze_sentiment value="input";
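To make the eval step easier to follow, here is a rough Python analogue of the field preparation in the pipeline above: parse the record body as JSON, extract the review text, and convert the label to an integer and then to a double, falling back to null when a conversion fails. The helper name `prepare_record` is hypothetical, and the sketch does not reproduce the exact casting semantics of the SPL2 functions.

```python
import json
from typing import Optional, Tuple


def prepare_record(body: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (input, label) extracted from a JSON record body."""
    record = json.loads(body)
    if not isinstance(record, dict):
        return None, None

    # input: the free text to analyze, or None if absent or not a string.
    text = record.get("reviewText")
    input_text = text if isinstance(text, str) else None

    # label: integer first, then double; None when missing or not convertible.
    raw_label = record.get("label")
    try:
        label = float(int(raw_label)) if raw_label is not None else None
    except (TypeError, ValueError):
        label = None
    return input_text, label


print(prepare_record('{"reviewText": "Great product", "label": 1}'))
print(prepare_record('{"reviewText": "Arrived late"}'))
```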
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0