Sentiment Analysis (beta)
Sentiment Analysis generates a label on unstructured text input using natural language processing (NLP). The Sentiment Analysis function classifies raw text as positive, negative, or neutral. Raw text input can include data streams such as messages, customer reviews, IT tickets, or computer logs. Sentiment Analysis is useful for downstream processes such as flagging negative customer reviews for contact by a customer service representative in a customer service application.
Sentiment Analysis predicts the sentiment of text samples in real time on the stream. You must provide labeled data (label = 1 or -1) to train a model. The model will predict the sentiment for each new example it encounters. In addition, if the data is labeled, the model will use the label provided to incrementally improve over time. If the data is unlabeled (label = 0), the model will not be updated.
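The predict-then-update behavior described above can be illustrated with a minimal Python sketch. It is not the Sentiment Analysis implementation, which this page does not describe. It assumes a toy online logistic regression over whitespace-separated tokens, and the class name `OnlineSentimentSketch`, its methods, and the learning rate are all hypothetical. It only shows that every sample is scored, that samples labeled 1 or -1 update the model, and that samples labeled 0 do not.

```python
import math


class OnlineSentimentSketch:
    """Toy online logistic regression over bag-of-words tokens (illustrative only)."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate  # assumed value, not a product setting
        self.weights = {}                   # one weight per token seen so far
        self.bias = 0.0

    @staticmethod
    def _tokens(text):
        return text.lower().split()

    def predict(self, text):
        """Return the estimated probability that the label of `text` is 1."""
        z = self.bias + sum(self.weights.get(t, 0.0) for t in self._tokens(text))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def process(self, text, label):
        """Score the sample, then update the model only if label is 1 or -1."""
        p = self.predict(text)
        if label != 0:  # label = 0 means unlabeled: no update
            target = 1.0 if label == 1 else 0.0
            step = self.learning_rate * (target - p)
            self.bias += step
            for t in self._tokens(text):
                self.weights[t] = self.weights.get(t, 0.0) + step
        return p


model = OnlineSentimentSketch()
model.process("great product", 1)      # labeled: the model is updated
model.process("terrible service", -1)  # labeled: the model is updated
model.process("great service", 0)      # unlabeled: scored only
```

The score returned by `process` is computed before the update, so each labeled sample is predicted first and then used for training.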
The model predicts the probability that a new text sample has positive or negative sentiment, expressed as a probability score between 0 and 1. You train the model with two classes of training data (positive and negative) that are labeled 1 and -1. You can specify that label=1 is positive and label=-1 is negative, or that label=1 is negative and label=-1 is positive. The model learns the notation that you choose.
The result returned by the model is always the probability that the label of the new event is 1. Depending on your label semantics, this probability could mean that the sentiment is positive, or this probability could mean that the sentiment is negative:
- If you specified that label=1 is positive, the probabilities returned by the model can be interpreted as the probability that the sentiment is positive.
- If you specified that label=1 is negative, the probabilities returned by the model can be interpreted as the probability that the sentiment is negative.
A threshold should be applied to the Sentiment Analysis probability score p to classify each sample as positive, negative, or neutral. For example, assuming the model is trained with label=1 as positive, the following thresholds can typically be applied for each class of labels:
- p < 0.25 can be applied to label negative sentiment
- 0.25 ≤ p ≤ 0.75 can be applied to label neutral sentiment
- p > 0.75 can be applied to label positive sentiment
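The thresholding step can be sketched in Python as follows. The function name `classify_sentiment` and its parameters are hypothetical and are not part of the Sentiment Analysis function. The sketch assumes that scores exactly equal to 0.25 or 0.75 count as neutral, and it uses a `one_means_positive` flag to cover both label notations.

```python
def classify_sentiment(score, one_means_positive=True, low=0.25, high=0.75):
    """Map a probability score (probability that label is 1) to a sentiment class.

    Assumption: scores exactly equal to `low` or `high` are treated as neutral.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > high:
        # A high score means label 1 is likely.
        return "positive" if one_means_positive else "negative"
    if score < low:
        # A low score means label -1 is likely.
        return "negative" if one_means_positive else "positive"
    return "neutral"


print(classify_sentiment(0.9))                            # label=1 is positive
print(classify_sentiment(0.9, one_means_positive=False))  # label=1 is negative
```

The 0.25 and 0.75 defaults are the typical values given above. Depending on how many samples you want to treat as neutral, you might choose different cut-offs.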
This function requires both "input" and "label" fields in the incoming data stream. Neither appears in the Streaming ML user interface because they are not configurable. For more information, see the Required arguments section.
Function Input/Output Schema
- Function Input
collection<record<R>>
- This function takes in collections of records with schema R.
- Function Output
collection<record<S>>
- This function outputs collections of records with schema S.
Syntax
- analyze_sentiment
- value="input"
Required arguments
- input
- Syntax: string
- Description: Default input column containing the free text to analyze using Sentiment Analysis, such as text, reviews, and tweets. This argument does not appear in the Streaming ML user interface because it is not configurable. If the dataset contains a column titled "input", it is used by default. To override this field, use the optional "value" argument.
- label
- Syntax: double
- Description: The input dataset must contain a column titled "label". This argument is not configurable. If the dataset contains labels, the values of the label field should be "+1" or "-1". The "+1" or "-1" corresponds to positive or negative sentiment depending on the semantic notation you implement. If the dataset is unlabeled, the label field should be "0".
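As an illustration of the two required fields, the following Python sketch builds a record with an "input" string and a "label" double. The helper `make_record` and its parameters are hypothetical. It only shows one way that upstream data might be mapped to the +1, -1, and 0 label values described above, and it is not part of the product.

```python
def make_record(text, sentiment=None, one_means_positive=True):
    """Build a record with the "input" and "label" fields (illustrative helper).

    sentiment: "positive", "negative", or None for unlabeled data.
    """
    if sentiment is None:
        label = 0.0  # unlabeled: the model scores the text but is not updated
    elif sentiment in ("positive", "negative"):
        # Label is 1 when the sentiment matches whatever label=1 stands for.
        is_one = (sentiment == "positive") == one_means_positive
        label = 1.0 if is_one else -1.0
    else:
        raise ValueError("sentiment must be 'positive', 'negative', or None")
    return {"input": text, "label": label}


print(make_record("Works as advertised", "positive"))
print(make_record("No opinion recorded yet"))
```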
Optional arguments
- value
- Syntax: string
- Description: Set the "value" argument to apply Sentiment Analysis to a field other than "input". When used, the "value" argument overrides "input".
Usage
For each observed data point, Sentiment Analysis computes and outputs a probability score p between 0 and 1. The score is the probability that the label is 1, so its meaning depends on the label notation used in training. If the model is trained with label=1 as negative, the closer the probability score is to 1, the more likely the sentiment is negative.
A threshold should be applied to the Sentiment Analysis probability score to classify each sample as positive, negative, or neutral. For example, assuming the model is trained with label=1 as negative, the following thresholds can typically be applied for each class of labels:
- p < 0.25 can be applied to label positive sentiment
- 0.25 ≤ p ≤ 0.75 can be applied to label neutral sentiment
- p > 0.75 can be applied to label negative sentiment
Sentiment Analysis is useful to monitor the overall feeling and sentiment of text over time. Detecting sudden changes in overall sentiment may indicate an important shift in user behavior or customer satisfaction. It is possible to apply downstream operators, like Drift Detection or Anomaly Detection, on the output of the Sentiment Analysis model.
For example, it is possible to identify shifts in average sentiment in a stream of user feedback by applying the Drift Detection model to the average sentiment score on a rolling window. It is also possible to identify outliers that represent extremely negative or extremely positive reviews by applying an anomaly detection model, like Adaptive Thresholding, to the numeric stream of sentiment scores.
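The rolling-window idea can be sketched in Python as follows. This is not the Drift Detection or Adaptive Thresholding operator. It is a simplified stand-in that flags a window when its mean score moves too far from the mean of the first full window. The function names, the window size, and the `max_change` tolerance are assumptions chosen for illustration.

```python
from collections import deque


def rolling_means(scores, window):
    """Yield (index, mean) for each position where a full window is available."""
    buf = deque(maxlen=window)  # keeps only the most recent `window` scores
    for i, s in enumerate(scores):
        buf.append(s)
        if len(buf) == window:
            yield i, sum(buf) / window


def flag_shifts(scores, window=3, max_change=0.2):
    """Return (index, mean) pairs whose rolling mean differs from the
    first full window's mean by more than `max_change`."""
    baseline = None
    flagged = []
    for i, mean in rolling_means(scores, window):
        if baseline is None:
            baseline = mean  # the first full window defines the baseline
        elif abs(mean - baseline) > max_change:
            flagged.append((i, mean))
    return flagged


scores = [0.2, 0.2, 0.2, 0.2, 0.9, 0.9, 0.9]
for index, mean in flag_shifts(scores):
    print(index, round(mean, 3))
```

A fixed baseline keeps the sketch short. A dedicated drift detector would typically adapt its reference over time.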
SPL2 example
The following example uses Sentiment Analysis on review text:
| from splunk_firehose() | eval json=cast(body, "string"), 'json-map'=from_json_object(json), input=ucast(map_get('json-map', "reviewText"), "string", null), label1=ucast(map_get('json-map', "label"), "integer", null), label=cast(label1, "double"), key="" | analyze_sentiment value="input";
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5