
kmeans
Synopsis
Performs k-means clustering on selected fields.
Syntax
kmeans [kmeans-options]* [<field-list>]
Optional arguments
- field-list
- Syntax: <field>, ...
- Description: Specify the exact fields to use for clustering. If none are specified, uses all numerical fields. Skips events with non-numerical fields.
- kmeans-options
- Syntax: <reps>|<iters>|<tol>|<k>|<cnumfield>|<distype>
- Description: Options for the kmeans command.
kmeans options
- reps
- Syntax: reps=<int>
- Description: Specify the number of times to repeat kmeans using random starting clusters. Defaults to 10.
- iters
- Syntax: maxiters=<int>
- Description: Specify the maximum number of iterations allowed before failing to converge. Defaults to 10000.
- t
- Syntax: t=<num>
- Description: Specify the algorithm convergence tolerance. Defaults to 0.
- k
- Syntax: k=<int> | <int>-<int>
- Description: Specify as a scalar integer value or a range of integers. When provided as a single number, selects the number of clusters to use, and the output events are annotated with their cluster label. When expressed as a range, clustering is done for each cluster count in the range and a summary of the results is produced. The summary reports the size of each cluster and a 'distortion' field that represents how well the data fits the selected clusters. Values must be greater than 1 and less than maxkvalue (see Limits section). Defaults to 2.
- cnumfield
- Syntax: cfield=<field>
- Description: Names the field to annotate the results with the cluster number for each event. Defaults to CLUSTERNUM.
- distype
- Syntax: dt=l1 | l1norm | cityblock | cb | l2 | l2norm | sq | sqeuclidean | cos | cosine
- Description: Specify the distance metric to use. l1, l1norm, and cb are synonyms for cityblock. l2, l2norm, and sq are synonyms for sqeuclidean. cos is a synonym for cosine. Defaults to sqeuclidean.
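To make the dt= synonyms concrete, here is a minimal Python sketch of the three distance metrics. It is not Splunk's implementation: the function name `distance` and the `DT_SYNONYMS` table are hypothetical, and cosine distance is assumed to be 1 minus the cosine similarity, a common convention that this page does not confirm.

```python
import math
from typing import Sequence

# Hypothetical synonym table built from the dt= option description.
DT_SYNONYMS = {
    "l1": "cityblock", "l1norm": "cityblock", "cityblock": "cityblock", "cb": "cityblock",
    "l2": "sqeuclidean", "l2norm": "sqeuclidean", "sq": "sqeuclidean",
    "sqeuclidean": "sqeuclidean",
    "cos": "cosine", "cosine": "cosine",
}


def distance(a: Sequence[float], b: Sequence[float], dt: str = "sqeuclidean") -> float:
    """Distance between two equal-length numeric vectors (illustrative only)."""
    metric = DT_SYNONYMS[dt]  # raises KeyError for an unrecognized dt value
    if metric == "cityblock":
        # L1: sum of absolute coordinate differences.
        return sum(abs(x - y) for x, y in zip(a, b))
    if metric == "sqeuclidean":
        # Squared L2: sum of squared differences, no square root taken.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Assumed cosine distance: 1 - cos(angle between a and b).
    # Undefined for zero vectors (raises ZeroDivisionError here).
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms
```

For example, `distance([0, 0], [3, 4], "cb")` returns 7 and `distance([0, 0], [3, 4], "l2")` returns 25. Because sqeuclidean is the squared distance, no square root is taken.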
Description
Performs k-means clustering on the selected fields (or on all numerical fields if no fields are specified). Events in the same cluster are moved next to each other. The cluster number for each event can optionally be displayed.
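The following Python sketch illustrates the behavior described above under stated assumptions: standard Lloyd's-style k-means with squared Euclidean distance, random starting centroids on each of `reps` repetitions, at most `maxiters` iterations, and convergence when the largest squared centroid shift is at most `t`. The function names, the `seed` parameter, keeping the repetition with the lowest distortion, and 0-based cluster numbers are all hypothetical choices, not documented Splunk behavior.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance (the documented default metric).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: List[Point]) -> int:
    # Index of the centroid closest to p.
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def lloyd(points: List[Point], k: int, maxiters: int, t: float,
          rng: random.Random) -> Tuple[List[int], List[Point], float]:
    centroids: List[Point] = rng.sample(points, k)  # random starting clusters
    for _ in range(maxiters):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        new_centroids: List[Point] = []
        for c in range(k):  # update step: mean of each cluster's members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        shift = max(sq_dist(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= t:  # converged within tolerance t
            break
    labels = [nearest(p, centroids) for p in points]  # final assignment
    distortion = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, distortion


def is_number(value: Any) -> bool:
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def kmeans(events: List[Dict[str, Any]], fields: List[str], k: int = 2, reps: int = 10,
           maxiters: int = 10000, t: float = 0.0, cfield: str = "CLUSTERNUM",
           seed: int = 0) -> List[Dict[str, Any]]:
    # Skip events whose selected fields are missing or non-numerical.
    usable = [e for e in events if all(f in e and is_number(e[f]) for f in fields)]
    points = [tuple(float(e[f]) for f in fields) for e in usable]
    rng = random.Random(seed)
    best = None
    for _ in range(reps):
        result = lloyd(points, k, maxiters, t, rng)
        if best is None or result[2] < best[2]:  # assumed: keep lowest distortion
            best = result
    annotated = [{**e, cfield: lab} for e, lab in zip(usable, best[0])]
    # Stable sort so events in the same cluster sit next to each other.
    annotated.sort(key=lambda e: e[cfield])
    return annotated
```

For a k range such as k=2-5, the same `lloyd` routine could be run once per cluster count and the resulting distortions compared, which mirrors the summary that the k option describes.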
Limits
The number of clusters to collect the values into (k) is not permitted to exceed maxkvalue, which is specified in the [kmeans] stanza of limits.conf. This defaults to 1000.
When a range is given for the k option, the difference between the start and end cluster counts is not permitted to exceed maxkrange, which is specified in the [kmeans] stanza of limits.conf. This defaults to 100.
These limits keep the computation from becoming unreasonably expensive.
The total number of values that the algorithm clusters (typically the number of input results) is limited by the maxdatapoints parameter in the [kmeans] stanza of limits.conf. This defaults to 100000000 (100 million). If this limit is exceeded at runtime, a warning message displays in Splunk Web. The maxdatapoints limit is designed to avoid exhausting memory.
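The Limits checks can be sketched in Python as follows. This is a hypothetical helper, not Splunk code: `parse_k` and `check_kmeans_limits` are illustrative names, the defaults come from the values stated above, and maxkvalue is treated as an inclusive cap (the k option text says "less than", so the exact boundary is uncertain). Rejecting a reversed range is also an assumption. Exceeding maxdatapoints produces a warning rather than an error, matching the runtime behavior described above.

```python
from typing import List, Tuple


def parse_k(spec: str) -> Tuple[int, int]:
    """Parse "4" or "2-5" into an inclusive (start, end) pair of cluster counts."""
    if "-" in spec:
        first, last = spec.split("-", 1)
        return int(first), int(last)
    value = int(spec)
    return value, value


def check_kmeans_limits(spec: str, n_points: int, maxkvalue: int = 1000,
                        maxkrange: int = 100, maxdatapoints: int = 100_000_000) -> List[str]:
    """Raise ValueError for a disallowed k; return warnings for soft limits."""
    start, end = parse_k(spec)
    if start < 2:
        raise ValueError("k must be greater than 1")
    if end < start:  # assumption: a reversed range is rejected
        raise ValueError("k range must be written low-high")
    if end > maxkvalue:  # treated here as an inclusive cap
        raise ValueError(f"k={end} exceeds maxkvalue={maxkvalue}")
    if end - start > maxkrange:
        raise ValueError(f"k range {start}-{end} exceeds maxkrange={maxkrange}")
    warnings: List[str] = []
    if n_points > maxdatapoints:
        # The page describes a runtime warning, not a hard failure.
        warnings.append(f"{n_points} values exceed maxdatapoints={maxdatapoints}")
    return warnings
```

For example, `check_kmeans_limits("2-5", 1000)` returns an empty list, while `check_kmeans_limits("2-200", 10)` raises ValueError because the range spans 198 cluster counts.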
Examples
Example 1: Group search results into 4 clusters based on the values of the "date_hour" and "date_minute" fields.
... | kmeans k=4 date_hour date_minute
Example 2: Group results into 2 clusters based on the values of all numerical fields.
... | kmeans
See also
anomalies, anomalousvalue, cluster, outlier
This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15