collect
Description
Adds the results of a search to a summary index that you specify. You must create the summary index before you invoke the collect command.
You do not need to know how to use collect to create and use a summary index, but it can help. For an overview of summary indexing, see Use summary indexing for increased search efficiency in the Knowledge Manager Manual.
This command is considered risky because, if used incorrectly, it can pose a security risk or potentially lose data when it runs. As a result, this command triggers SPL safeguards. See SPL safeguards for risky commands in Securing the Splunk Platform.
Syntax
collect index=<string> [<arg-options>...]
Required arguments
- index
- Syntax: index=<string>
- Description: Name of the summary index where the events are added. The index must exist before the events are added. The index is not created automatically.
Optional arguments
- arg-options
- Syntax: addinfo=<bool> | addtime=<bool> | file=<string> | spool=<bool> | marker=<string> | output_format=[raw | hec] | testmode=<bool> | run_in_preview=<bool> | host=<string> | source=<string> | sourcetype=<string>
- Description: Optional arguments for the collect command. See the arg-options section for the descriptions for each option.
arg-options
- addinfo
- Syntax: addinfo=<bool>
- Description: Use this option to specify whether to prefix search time and time-range information fields on to each summary index event. If set to true, adds fields to each event in the following format:
info_min_time=<search_earliest_time>, info_max_time=<search_latest_time>, info_search_time=<search_exec_time>
- Default: True when summary events are destined for an events index or when output_format=raw. False when summary events are destined for a metrics index.
- addtime
- Syntax: addtime=<bool>
- Description: Use this option to specify whether to prefix a time field on to each event. Some commands return results that do not have a _raw field, such as the stats, chart, and timechart commands. If you specify addtime=false, the Splunk software uses its generic date detection against fields in whatever order they happen to be in the summary rows. If you specify addtime=true, the Splunk software uses the search time range info_min_time (which is added by the sistats command) or _time. Splunk software adds the time field based on the first field that it finds: info_min_time, _time, or now().
- This option is not valid when output_format=hec.
- Default: True when summary events are destined for an events index. False when summary events are destined for a metrics index.
- file
- Syntax: file=<string>
- Description: The file name where you want the events to be written. You can use a timestamp or a random number for the file name by specifying either file=$timestamp$ or file=$random$.
- Usage: ".stash" needs to be added at the end of the file name when used with "index=". Otherwise, the data is added to the main index.
- Default: <random-number>_events.stash
- host
- Syntax: host=<string>
- Description: The name of the host that you want to specify for the events.
- This option is not valid when output_format=hec.
- marker
- Syntax: marker=<string>
- Description: A string, usually of key-value pairs, to append to each event written out. Each key-value pair must be separated by a comma and a space.
- If the value contains spaces or commas, it must be escape quoted. For example, if the key-value pair is search_name=vpn starts and stops, you must change it to search_name=\"vpn starts and stops\".
- This option is not valid when output_format=hec.
- output_format
- Syntax: output_format=[raw | hec]
- Description: Specifies the output format for the summary indexing. If set to raw, uses the traditional non-structured log style summary indexing stash output format.
- If set to hec, it generates HTTP Event Collector (HEC) JSON formatted output:
- All fields are automatically indexed when the stash file is indexed.
- The file that is written to the var/spool/splunk path ends in .stash_hec instead of .stash.
- Allows the source, sourcetype, and host from the original data to be used directly in the summary index. Does not re-map these fields to the extract_host/extracted_sourcetype/... path.
- The index and splunk_server fields in the original data are ignored.
- You cannot use the addtime, host, marker, source, or sourcetype options when output_format=hec.
- Default: raw
- run_in_preview
- Syntax: run_in_preview=<bool>
- Description: Controls whether the collect command is enabled during preview generation. Generally, you do not want to insert preview results into the summary index, so leave run_in_preview=false. In some cases, such as when a custom search command is used as part of the search, you might want to turn this on to ensure correct summary indexable previews are generated.
- Default: false
- spool
- Syntax: spool=<bool>
- Description: If set to true, the summary indexing file is written to the Splunk spool directory, where it is indexed automatically. If set to false, the file is written to the $SPLUNK_HOME/var/run/splunk directory. The file remains in this directory unless some form of further automation or administration is done. If you have Splunk Enterprise, you can set spool=false to troubleshoot summary indexing by dumping the output file to a location on disk where it will not be ingested as data.
- Default: true
- source
- Syntax: source=<string>
- Description: The name of the source that you want to specify for the events.
- This option is not valid when output_format=hec.
- sourcetype
- Syntax: sourcetype=<string>
- Description: The name of the source type that you want to specify for the events. If you specify a source type other than stash, the ingested summary data will count against your license usage.
- This option is not valid when output_format=hec.
- Default: stash
- testmode
- Syntax: testmode=<bool>
- Description: Toggle between testing and real mode. In testing mode the results are not written into the new index but the search results are modified to appear as they would if sent to the index.
- Default: false
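As a minimal sketch of how these options combine, the following search previews what would be sent to a summary index without writing anything. The index names web and mysummary and the status field are placeholders for illustration, not names taken from this page:

```spl
index=web
| stats count by status
| collect index=mysummary testmode=true
```

Because testmode=true, the results are only modified to appear as they would if indexed. Removing the option, or setting it to false (the default), lets the same search write to the summary index.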
Usage
The events are written to a file whose name format is: random-num_events.stash, unless overwritten, in a directory that your Splunk deployment is monitoring. If the events contain a _raw field, then this field is saved. If the events do not have a _raw field, one is created by concatenating all the fields into a comma-separated list of key=value pairs.
The collect command also works with real-time searches that have a time range of All time.
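To inspect the stash file itself rather than the indexed result, the file and spool options can be combined. This is a sketch under assumptions: the index names, the status field, and the file name web_status_debug.stash are hypothetical, and reading the file requires file system access, as on Splunk Enterprise:

```spl
index=web
| stats count by status
| collect index=mysummary spool=false file=web_status_debug.stash
```

With spool=false, the file is written to the $SPLUNK_HOME/var/run/splunk directory and is not ingested. Because stats results have no _raw field, the file should show the key=value form of _raw described above.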
Events without timestamps
If you apply the collect command to events that do not have timestamps, the command designates a time for all of the events using the earliest (or minimum) time of the search range. For example, if you use the collect command over the past four hours (range: -4h to +0h), the command assigns a timestamp that is four hours prior to the time that the search was launched. The timestamp is applied to all of the events without a timestamp.
If you use the collect command with a time range of All time and the events do not have timestamps, the current system time is used for the timestamps.
For more information on summary indexing of data without timestamps, see Use summary indexing for increased reporting efficiency in the Knowledge Manager Manual.
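The four-hour case might look like the following search. The index names and the status field are placeholders chosen for illustration:

```spl
index=web earliest=-4h latest=now
| stats count by status
| collect index=mysummary
```

The stats command returns rows without a _time field, so under the rule described in this section, each summary event should receive a timestamp four hours before the time the search was launched.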
Copying events to a different index
You can use the collect command to copy search results to another index.
Construct a search that returns the data you want to copy, and pipe the results to the collect command. For example:
index=foo | ... | collect index=bar
This search writes the results into the bar index. The sourcetype is changed to stash.
You can specify a sourcetype with the collect command. However, specifying a sourcetype counts against your license, as if you indexed the data again.
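If the copied events should keep their original source, sourcetype, and host, the output_format=hec option described under arg-options is one possible approach. A minimal sketch, reusing the foo and bar index names from the example above:

```spl
index=foo
| collect index=bar output_format=hec
```

With this format, you cannot also use the addtime, host, marker, source, or sourcetype options.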
Examples
1. Put "download" events into an index named "downloadcount"
eventtypetag="download" | collect index=downloadcount
2. Collect statistics on VPN connects and disconnects
You want to collect hourly statistics on VPN connects and disconnects by country.
index=mysummary
| geoip REMOTE_IP
| eval country_source=if(REMOTE_IP_country_code="US","domestic","foreign")
| bin _time span=1h
| stats count by _time,vpn_action,country_source
| addinfo
| collect index=mysummary marker="summary_type=vpn, summary_span=3600,
summary_method=bin, search_name=\"vpn starts and stops\""
The addinfo command ensures that the search results contain fields that specify when the search was run to populate these particular index values.
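A later report could then read these summary events back. The following sketch assumes that the key-value pairs written by the marker option are extracted as fields at search time, which this page does not state explicitly:

```spl
index=mysummary summary_type=vpn
| timechart span=1h sum(count) by vpn_action
```

Filtering on summary_type=vpn restricts the report to events written by this particular collect search, and sum(count) adds up the hourly counts that were stored.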
This documentation applies to the following versions of Splunk Cloud Platform™: 8.2.2201, 8.2.2112