Splunk® Enterprise

Search Reference


Splunk Enterprise version 7.0 is no longer supported as of October 23, 2019. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see How to upgrade Splunk Enterprise.

kmeans

Description

Partitions the events into k clusters, with each cluster defined by its mean value. Each event belongs to the cluster with the nearest mean value. Performs k-means clustering on the list of fields that you specify. If no fields are specified, performs the clustering on all numeric fields. Events in the same cluster are moved next to each other. You have the option to display the cluster number for each event.
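
To make the description concrete, here is a minimal sketch of the general k-means idea in Python. It is not Splunk's implementation. The function names, the use of squared Euclidean distance, and the interpretation of the tolerance as "largest centroid movement in one iteration" are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: sq_euclidean(point, centroids[i]))


def kmeans_sketch(points: List[Point],
                  initial_centroids: List[Point],
                  max_iters: int = 10000,
                  tol: float = 0.0) -> Tuple[List[int], List[List[float]]]:
    """Illustrative k-means loop (hypothetical helper, not Splunk code).

    Returns one cluster label per point plus the final centroids.
    """
    centroids = [list(c) for c in initial_centroids]
    # Assignment step: each point joins the cluster with the nearest mean.
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iters):
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # empty cluster keeps its centroid
        # Assumed convergence test: largest centroid movement <= tolerance.
        shift = max(sq_euclidean(o, n)
                    for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        labels = [nearest(p, centroids) for p in points]
        if shift <= tol:
            break
    return labels, centroids
```

Calling `kmeans_sketch([(1, 1), (1, 2), (9, 9), (10, 9)], [(0, 0), (10, 10)])` assigns the first two points to cluster 0 and the last two to cluster 1. The starting centroids are passed in explicitly so that the sketch stays deterministic.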

Syntax

kmeans [kmeans-options...] [field-list]

Required arguments

None.

Optional arguments

field-list
Syntax: <field> ...
Description: Specify a space-separated list of the exact fields to use for the clustering.
Default: If no fields are specified, uses all numeric fields. Skips events with non-numeric fields.
kmeans-options
Syntax: <reps> | <iters> | <t> | <k> | <cnumfield> | <distype> | <showcentroid>
Description: Options for the kmeans command.

kmeans options

reps
Syntax: reps=<int>
Description: Specify the number of times to repeat kmeans using random starting clusters.
Default: 10
iters
Syntax: maxiters=<int>
Description: Specify the maximum number of iterations allowed before failing to converge.
Default: 10000
t
Syntax: t=<num>
Description: Specify the algorithm convergence tolerance.
Default: 0
k
Syntax: k=<int> | <int>-<int>
Description: Specify a single integer or a range of integers. When given as a single number, k sets the number of clusters to use and produces events annotated with the cluster label. When given as a range, clustering is performed for each cluster count in the range and a summary of the results is produced. The summary reports the size of the clusters and a 'distortion' field that represents how well the data fits the selected clusters. Values must be greater than 1 and less than maxkvalue (see Limits section).
Default: k=2
cnumfield
Syntax: cfield=<field>
Description: Names the field that is added to the results to hold the cluster number for each event.
Default: CLUSTERNUM
distype
Syntax: dt= ( l1 | l1norm | cityblock | cb ) | ( l2 | l2norm | sq | sqeuclidean ) | ( cos | cosine )
Description: Specify the distance metric to use. The l1, l1norm, and cb distance metrics are synonyms for cityblock. The l2, l2norm, and sq distance metrics are synonyms for sqeuclidean or sqEuclidean. The cos distance metric is a synonym for cosine.
Default: sqeuclidean
showcentroid
Syntax: showcentroid= true | false
Description: Specify whether to include the cluster centroids in the search results (showcentroid=true) or not (showcentroid=false).
Default: true
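
The three families of distance metric named by the dt option can be sketched in Python as follows. This is an illustration under stated assumptions, not Splunk's code: the helper names are hypothetical, cosine distance is taken to be 1 minus cosine similarity (a common convention that the text above does not confirm), and the lookup is made case-insensitive so that both sqeuclidean and sqEuclidean resolve.

```python
import math
from typing import Callable, Dict, Sequence

Point = Sequence[float]


def cityblock(a: Point, b: Point) -> float:
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sqeuclidean(a: Point, b: Point) -> float:
    """Squared L2 distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Point, b: Point) -> float:
    """Assumed definition: 1 - cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Synonyms as listed for the dt option, mapped to a canonical name.
SYNONYMS: Dict[str, str] = {
    "l1": "cityblock", "l1norm": "cityblock",
    "cb": "cityblock", "cityblock": "cityblock",
    "l2": "sqeuclidean", "l2norm": "sqeuclidean",
    "sq": "sqeuclidean", "sqeuclidean": "sqeuclidean",
    "cos": "cosine", "cosine": "cosine",
}

METRICS: Dict[str, Callable[[Point, Point], float]] = {
    "cityblock": cityblock,
    "sqeuclidean": sqeuclidean,
    "cosine": cosine,
}


def resolve_metric(dt: str = "sqeuclidean") -> Callable[[Point, Point], float]:
    """Hypothetical helper: map a dt value to a distance function."""
    key = dt.lower()  # assumption: names are matched case-insensitively
    if key not in SYNONYMS:
        raise ValueError(f"unknown distance metric: {dt}")
    return METRICS[SYNONYMS[key]]
```

For the points (0, 0) and (3, 4), `resolve_metric("cb")` gives 7 and `resolve_metric("l2")` gives 25, which shows that the l2 family here is the squared distance rather than its square root.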

Usage

Limits

The number of clusters to collect the values into, k, is not permitted to exceed maxkvalue. The maxkvalue is specified in the limits.conf file, in the [kmeans] stanza. The maxkvalue default is 1000.

When a range is given for the k option, the total distance between the beginning and ending cluster counts is not permitted to exceed maxkrange. The maxkrange is specified in the limits.conf file, in the [kmeans] stanza. The maxkrange default is 100.

The above limits are designed to keep the computation from becoming unreasonably expensive.
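
The two limits above can be pictured as a validation step applied to the k option. The Python sketch below is illustrative only: `parse_k` is a hypothetical helper, not part of Splunk. It assumes that a value equal to maxkvalue and a range width equal to maxkrange are both still accepted ("not permitted to exceed"), and it uses the default values quoted above.

```python
from typing import List


def parse_k(spec: str, maxkvalue: int = 1000, maxkrange: int = 100) -> List[int]:
    """Hypothetical helper: expand a k spec such as "4" or "3-8".

    Returns the list of cluster counts that would be tried, or raises
    ValueError when the spec violates one of the documented limits.
    """
    parts = spec.split("-")
    if len(parts) == 1:
        low = high = int(parts[0])
    elif len(parts) == 2:
        low, high = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"malformed k spec: {spec!r}")

    if low > high:
        raise ValueError("range start must not exceed range end")
    if low <= 1:
        raise ValueError("k must be greater than 1")
    # Assumption: k equal to maxkvalue is allowed, anything larger is not.
    if high > maxkvalue:
        raise ValueError(f"k must not exceed maxkvalue ({maxkvalue})")
    # Assumption: the 'distance' of a range is end minus start.
    if high - low > maxkrange:
        raise ValueError(f"k range must not exceed maxkrange ({maxkrange})")
    return list(range(low, high + 1))
```

With the defaults, `parse_k("3-6")` returns `[3, 4, 5, 6]`, while `parse_k("2-200")` raises an error because the range spans 198 cluster counts.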

The total number of values that are clustered by the algorithm (typically the number of input results) is limited by the maxdatapoints parameter in the [kmeans] stanza of limits.conf. The maxdatapoints default is 100000000 (100 million). If this limit is exceeded at runtime, a warning message displays in Splunk Web. The maxdatapoints limit is designed to avoid exhausting memory.

Examples

Example 1: Group search results into 4 clusters based on the values of the "date_hour" and "date_minute" fields.

... | kmeans k=4 date_hour date_minute

Example 2: Group results into 2 clusters based on the values of all numerical fields.

... | kmeans

See also

anomalies, anomalousvalue, cluster, outlier

Last modified on 21 July, 2020

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.5, 8.0.10, 7.2.1, 7.0.1, 8.0.4, 8.0.9, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.2.0, 8.0.6, 8.0.7, 8.0.8

