Splunk® Enterprise

Search Reference

Splunk Enterprise version 5.0 reached its End of Life on December 1, 2017. Please see the migration information.
This documentation does not apply to the most recent version of Splunk.

cluster

You can use the cluster command to learn more about your data and to find common or rare events in your data. For example, if you are investigating an IT problem and you don't know specifically what to look for, use the cluster command to find anomalies. In this case, anomalous events are those that are not grouped into large clusters, that is, events that fall into clusters containing few events. Or, if you are searching for errors, use the cluster command to see approximately how many different types of errors there are and which types of errors are common in your data.

Synopsis

Cluster similar events together.

Syntax

cluster [slc-option]...

Optional arguments

slc-option
Syntax: t=<num> | delims=<string> | showcount=<bool> | countfield=<field> | labelfield=<field> | field=<field> | labelonly=<bool> | match=(termlist | termset | ngramset)
Description: Options for configuring simple log clusters (slc).

SLC options

t
Syntax: t=<num>
Description: Sets the cluster threshold, which controls the sensitivity of the clustering. This value needs to be a number greater than 0.0 and less than 1.0. The closer the threshold is to 1, the more similar events have to be for them to be considered in the same cluster. Default is 0.8.
delims
Syntax: delims=<string>
Description: Configures the set of delimiters used to tokenize the raw string. By default, every character except 0-9, A-Z, a-z, and '_' is a delimiter.
showcount
Syntax: showcount=<bool>
Description: Shows the size of each cluster. Default is true, unless labelonly is set to true. When showcount=false, each indexer clusters its own events before clustering on the search head.
countfield
Syntax: countfield=<field>
Description: Name of the field to write the cluster size to. The cluster size is the count of events in the cluster. Defaults to cluster_count.
labelfield
Syntax: labelfield=<field>
Description: Name of the field to write the cluster number to. Splunk counts each cluster and labels each with a number as it groups events into clusters. Defaults to cluster_label.
field
Syntax: field=<field>
Description: Name of the field to analyze in each event. Defaults to _raw.
labelonly
Syntax: labelonly=<bool>
Description: Controls whether to preserve incoming events and merely add the cluster fields to each event (labelonly=t) or output only the cluster fields as new events (labelonly=f). Defaults to false.
match
Syntax: match=(termlist | termset | ngramset)
Description: Specifies the method used to determine the similarity between events. termlist breaks down the field into words and requires the exact same ordering of terms. termset allows for an unordered set of terms. ngramset compares sets of trigrams (3-character substrings). ngramset is significantly slower on large field values and is most useful for short non-textual fields, like punct. Defaults to termlist.
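
As a rough illustration of the delims and match options, the following Python sketch shows how a field value might be broken into terms and turned into the three representations named above. It is not Splunk's implementation: the function names are hypothetical, the custom delims handling assumes a non-empty delimiter string, and taking trigrams over the raw characters of the value is an assumption.

```python
import re


def tokenize(text, delims=None):
    """Split text into terms.

    With delims=None, every character except 0-9, A-Z, a-z, and '_' is
    treated as a delimiter, mirroring the default described above.
    """
    if delims is None:
        return re.findall(r"[0-9A-Za-z_]+", text)
    # Assumption: delims is a non-empty string of single-character delimiters.
    pattern = "[" + re.escape(delims) + "]+"
    return [term for term in re.split(pattern, text) if term]


def termlist(text):
    """Ordered terms: the same words in a different order do not match."""
    return tuple(tokenize(text))


def termset(text):
    """Unordered terms: word order is ignored."""
    return frozenset(tokenize(text))


def ngramset(text, n=3):
    """Set of n-character substrings of the value (trigrams by default)."""
    return frozenset(text[i:i + n] for i in range(len(text) - n + 1))


# Same words, different order: equal as term sets, different as term lists.
same_set = termset("login failed") == termset("failed login")
same_list = termlist("login failed") == termlist("failed login")
```

In this sketch, same_set is true and same_list is false, which is the distinction between match=termset and match=termlist.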

Description

The cluster command groups events together based on how similar they are to each other. Unless you specify a different field, cluster uses the _raw field to break down the events into terms (match=termlist) and compute the similarity between events. Set a higher threshold value for t if you want the command to be more discriminating about which events are grouped together.
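
To make the effect of the threshold concrete, here is a minimal Python sketch of threshold-based grouping. The similarity measure Splunk uses is not specified in this topic, so the sketch substitutes Jaccard similarity over unordered term sets and compares each event with the first event of every existing cluster. The names terms, jaccard, and cluster_events are hypothetical, and the sample events are invented for illustration.

```python
import re


def terms(text):
    """Unordered set of terms, split on the default delimiters."""
    return frozenset(re.findall(r"[0-9A-Za-z_]+", text))


def jaccard(a, b):
    """Shared terms divided by all distinct terms (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_events(events, t=0.8):
    """Return a 1-based cluster label for each event.

    Assumption: an event joins the first cluster whose representative
    (the first event placed in that cluster) is at least t similar.
    """
    representatives = []
    labels = []
    for event in events:
        event_terms = terms(event)
        for index, rep in enumerate(representatives):
            if jaccard(event_terms, rep) >= t:
                labels.append(index + 1)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            representatives.append(event_terms)
            labels.append(len(representatives))
    return labels


events = [
    "ERROR disk full on host01 volume var",
    "ERROR disk full on host02 volume var",
    "WARN login failed for user bob",
]
loose = cluster_events(events, t=0.7)   # first two events share 6 of 8 terms
strict = cluster_events(events, t=0.8)  # 0.75 is below the threshold
```

With these sample events, the first two have a similarity of 0.75 under this stand-in measure, so they fall into one cluster at t=0.7 and into separate clusters at t=0.8.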

The cluster command appends two new fields to each event. You can specify what to name these fields with the countfield and labelfield parameters, which default to cluster_count and cluster_label. The cluster_count value is the number of events that are part of the cluster, or the cluster size. Each event in the cluster is assigned the cluster_label value of the cluster it belongs to. For example, if the search returns 10 clusters, then the clusters are labeled from 1 to 10.
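
The relationship between the two fields and the labelonly option can be sketched in Python as follows. This is an illustration only: the function add_cluster_fields is hypothetical, the cluster labels are taken as given, the count is always written (as if showcount=t), and using the first event of each cluster as its representative when labelonly is false is an assumption.

```python
from collections import Counter


def add_cluster_fields(events, labels, labelonly=False,
                       countfield="cluster_count",
                       labelfield="cluster_label"):
    """Attach the cluster label and cluster size to events.

    labelonly=True keeps every incoming event and adds the fields.
    labelonly=False emits one row per cluster (assumption: the first
    event seen in the cluster stands in for the whole cluster).
    """
    sizes = Counter(labels)
    rows = []
    seen = set()
    for event, label in zip(events, labels):
        if not labelonly and label in seen:
            continue  # this cluster already has its single output row
        seen.add(label)
        rows.append({"_raw": event, labelfield: label, countfield: sizes[label]})
    return rows


events = ["disk full on host01", "disk full on host02", "login failed"]
labels = [1, 1, 2]
summary = add_cluster_fields(events, labels)                     # 2 rows
annotated = add_cluster_fields(events, labels, labelonly=True)   # 3 rows
```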

Examples

Example 1

Quickly return a glimpse of anything that is going wrong in your Splunk instance.

index=_internal source=*splunkd.log* log_level!=info | cluster showcount=t | table cluster_count _raw | sort -cluster_count

This search takes advantage of what Splunk logs about itself in the _internal index. It returns all events where the log_level is DEBUG, WARN, ERROR, or FATAL, clusters them together, and sorts the clusters by the count of events in each cluster.

Example 2

Search for events that don't cluster into large groups.

... | cluster showcount=t | sort cluster_count

This returns clusters of events and uses the sort command to display them in ascending order based on the cluster size, which is the value of cluster_count. Because the events at the top of the results don't cluster into large groups, you can consider them rare or uncommon events.

Example 3

Cluster similar error events together and search for the most frequent type of error.

error | cluster t=0.9 showcount=t | sort - cluster_count | head 20

This searches your index for events that include the term "error" and clusters them together if they are similar. The sort command is used to display the events in descending order based on the cluster size, cluster_count, so that the largest clusters are shown first. The head command is then used to show the twenty largest clusters. Now that you've found the most common types of errors in your data, you can dig deeper to find the root causes of these errors.

Example 4

Use the cluster command to see an overview of your data. If you have a large volume of data, run the following search over a small time range, such as 15 minutes or 1 hour, or restrict it to a source type or index.

... | cluster labelonly=t showcount=t | sort - cluster_count, cluster_label, _time | dedup 5 cluster_label

This search helps you to learn more about your data by grouping events together based on their similarity and showing you a few events from each cluster. It uses labelonly=t to keep each event in the cluster and add a cluster_label to it. The sort command is used to show the results in descending order by cluster size (cluster_count), then by cluster_label, then by the timestamp of the event (_time). The dedup command is then used to show the first five events in each cluster, using the cluster_label to differentiate between clusters.
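
The last step of this search, dedup 5 cluster_label, keeps at most five events for each label in the order they arrive from the sort. A minimal Python sketch of that behavior, with a hypothetical helper named dedup and invented sample rows:

```python
def dedup(rows, field, keep=1):
    """Keep the first `keep` rows for each distinct value of `field`."""
    kept = []
    seen_counts = {}
    for row in rows:
        value = row[field]
        if seen_counts.get(value, 0) < keep:
            kept.append(row)
            seen_counts[value] = seen_counts.get(value, 0) + 1
    return kept


# Seven events in cluster 1 and three in cluster 2, already sorted.
rows = [{"cluster_label": 1, "n": i} for i in range(7)]
rows += [{"cluster_label": 2, "n": i} for i in range(3)]
sample = dedup(rows, "cluster_label", keep=5)
```

Here sample holds the first five rows of cluster 1 followed by all three rows of cluster 2, which is why the search shows only a handful of events per cluster.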

See also

anomalies, anomalousvalue, kmeans, outlier

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the cluster command.

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18, 6.0, 6.0.1, 6.0.2


Comments

something is wrong with this command, especially the cluster_count field is incorrect when labelonly=f.

index=_internal source=*splunkd.log* log_level!=info |search "/app/splunk/etc/apps/lea-loggrabber-splunk/bin/CKP_shmem_._authkeys" | cluster| sort cluster_count

returns:
05-09-2014 17:00:02.808 -0400 WARN Archiver - Unable to add entry: /app/splunk/etc/apps/lea-loggrabber-splunk/bin/CKP_shmem_._authkeys.C to archive: /app/splunk/var/run/splunkapp-prod02-1399669202.bundle.candidate due to: Permission denied

cluster_count = 1
cluster_label = 1
host = splunkapp-prod02
source = /app/splunk/var/log/splunk/splunkd.log
sourcetype = splunkd

However, if I change the search to labelonly=true:
you can see there are 20 events.

Kundeng
May 9, 2014
