Performs k-means clustering on selected fields.
kmeans [kmeans-options]* <field-list>
- Syntax: <field>, ...
- Description: Specify the exact fields to use for clustering. If none are specified, uses all numerical fields. Skips events with non-numerical fields.
- Syntax: <reps>|<iters>|<tol>|<k>|<cnumfield>|<distype>
- Description: Options for the kmeans command.
- Syntax: reps=<int>
- Description: Specify the number of times to repeat kmeans using random starting clusters. Defaults to 10.
- Syntax: maxiters=<int>
- Description: Specify the maximum number of iterations allowed before failing to converge. Defaults to 10000.
- Syntax: t=<num>
- Description: Specify the algorithm convergence tolerance. Defaults to 0.
- Syntax: k=<int> | <int>-<int>
- Description: Specify as a scalar integer value or a range of integers. When provided as a single number, selects the number of clusters to use and produces events annotated with the cluster label. When expressed as a range, clustering is done for each cluster count in the range and a summary of the results is produced. The summary reports the size of the clusters and a 'distortion' field, which represents how well the data fits the selected clusters. Values must be greater than 1 and less than maxkvalue (see the limits.conf settings below). Defaults to 2.
- Syntax: cfield=<field>
- Description: Names the field to annotate the results with the cluster number for each event. Defaults to CLUSTERNUM.
- Syntax: dt=l1 | l1norm | cityblock | cb | l2 | l2norm | sq | sqeuclidean | cos | cosine
- Description: Specify the distance metric to use. l1, l1norm, and cb are synonyms for cityblock. l2, l2norm, and sq are synonyms for sqeuclidean. cos is a synonym for cosine. Defaults to sqeuclidean.
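The three metric families accepted by dt can be illustrated with a short Python sketch. It uses the standard textbook definitions of city-block, squared Euclidean, and cosine distance. It is not Splunk's implementation, the function names are hypothetical, and cosine distance is assumed here to mean one minus the cosine similarity.

```python
import math

def l1_distance(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sq_euclidean_distance(a, b):
    """Squared Euclidean (L2) distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """Cosine distance, taken here as 1 - cosine similarity.

    Assumes neither vector is all zeros; a zero vector would divide by zero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

The choice of metric changes which events count as "close": city-block and squared Euclidean depend on the magnitude of the field values, while cosine distance depends only on the direction of the vector formed by those values.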
Performs k-means clustering on the selected fields (or on all numerical fields if no fields are specified). Events in the same cluster are moved next to each other. Optionally, the cluster number for each event is displayed.
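To make the k, reps, and maxiters options more concrete, the following Python sketch runs a plain k-means (Lloyd's algorithm) several times from random starting centers, keeps the run with the lowest distortion, and summarizes a range of cluster counts. It is an illustration under stated assumptions, not the command's actual implementation: all function names are hypothetical, distortion is assumed to be the sum of squared distances from each point to its cluster center, and the tolerance option t is not modeled (a run stops when assignments no longer change).

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, maxiters, rng):
    """One run of Lloyd's algorithm from k randomly chosen starting centers.

    Assumes maxiters >= 1 and k <= len(points).
    """
    centers = rng.sample(points, k)
    labels = None
    for _ in range(maxiters):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    distortion = sum(sq_dist(p, centers[lab])
                     for p, lab in zip(points, labels))
    return labels, distortion

def kmeans(points, k, reps=10, maxiters=10000, seed=0):
    """Repeat kmeans_once `reps` times and keep the lowest-distortion run."""
    rng = random.Random(seed)
    best = None
    for _ in range(reps):
        labels, distortion = kmeans_once(points, k, maxiters, rng)
        if best is None or distortion < best[1]:
            best = (labels, distortion)
    return best

def summarize_range(points, k_min, k_max):
    """For each k in the range, report cluster sizes and distortion."""
    summary = {}
    for k in range(k_min, k_max + 1):
        labels, distortion = kmeans(points, k)
        summary[k] = {"sizes": sorted(Counter(labels).values()),
                      "distortion": distortion}
    return summary
```

Calling `kmeans(points, 4)` corresponds loosely to a single value of k, where each point gets a cluster label, while `summarize_range(points, 2, 5)` corresponds loosely to a range such as k=2-5, where only per-k summary statistics are returned.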
The number of clusters to collect the values into -- k -- is not permitted to exceed maxkvalue, specified in limits.conf in the [kmeans] stanza. This defaults to 1000.
When a range is given for the k option, the difference between the beginning and ending cluster counts is not permitted to exceed maxkrange, specified in limits.conf in the [kmeans] stanza. This defaults to 100.
The above limits are designed to avoid the computation work becoming unreasonably expensive.
The total number of values which are clustered by the algorithm (typically the number of input results) is limited by the maxdatapoints parameter in the [kmeans] stanza of limits.conf. If this limit is exceeded at runtime, a warning message displays in Splunk Web. This defaults to 100000000 or 100 million. This maxdatapoints limit is designed to avoid exhausting memory.
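Taken together, the three limits described above would correspond to a stanza along the following lines in limits.conf. The values shown are the defaults stated on this page. The `key = value` layout follows the usual .conf convention and is an assumption here rather than something copied from a shipped file.

```
[kmeans]
maxkvalue = 1000
maxkrange = 100
maxdatapoints = 100000000
```

With these defaults, a range such as k=2-50 spans 48 cluster counts and stays within maxkrange, while a range such as k=2-200 would exceed it.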
Example 1: Group search results into 4 clusters based on the values of the "date_hour" and "date_minute" fields.
... | kmeans k=4 date_hour date_minute
Example 2: Group results into 2 clusters based on the values of all numerical fields.
... | kmeans
This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15