Configure streams to use content extraction
You can use content extraction rules to capture and index a subset of data from a protocol string field. You can also use content extraction to generate MD5 or SHA-512 hashes or hexadecimal numbers of non-numeric fields.
Content Extraction Rules
Content extraction rules use regular expressions to extract sections of data from a parent field. This lets you capture only the specific pieces of data that you require for analysis, without indexing extraneous data.
Some string fields, such as src_content, dest_content, and cookies, contain long strings of data. In many cases, the entire string of data is not needed. Using content extraction rules, you can limit data capture to specific pieces of data, such as a name, ID, or account number.
You can specify a capturing group match, which outputs either the first value that matches the regular expression, or the "list" of all values that match the regular expression. Each content extraction rule creates a new field that captures only that data specified by the rule. The original field is not modified.
You can also hash the extracted values using SHA-512.
You can create four types of content extractions:
- Regex
- MD5 hash
- Hexadecimal
- SHA-512 hash
About extracting fields as hashes or hexadecimal encoding
You can use content extraction to generate an MD5 or SHA-512 hash or hexadecimal encoding of any non-numeric field for any protocol. Hashes are useful for masking sensitive data in search results, such as user names, passwords, and other important account information. Hexadecimal encoding is useful for representing arbitrary binary data that can interfere with the Splunk search UI.
You can also use file hashing to detect if a specific file is being transmitted over your network without storing the entire contents of the file, which might be quite large. For example, you might store the hash of a file (such as a sensitive document or piece of malware), then compare that hash to the hash of email attachments you capture to see if it matches.
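The comparison described above can be sketched in Python as follows. This is an illustrative sketch only, not part of Splunk Stream: the function names `file_md5` and `matches_known_file` are hypothetical, and it assumes the captured hash was produced as a full, unsalted MD5 hex digest so that the two values are directly comparable.

```python
import hashlib

def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks
    so that large files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_known_file(captured_hash, known_hashes):
    """Return True if a captured hash equals any stored hash.
    Assumes known_hashes holds lowercase hex digests."""
    return captured_hash.lower() in known_hashes
```

For example, you could compute `file_md5` once for a sensitive document, store the result, and then pass each hash value found in your search results to `matches_known_file`.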
Each field that you extract as a hash or hexadecimal generates a new field. The original field does not need to be enabled. This lets you view a secure fingerprint of the field value in search results without exposing the original field value.
An MD5 hash is 32 characters long by default. For additional security, you can truncate MD5 hash values to any number of characters and specify an offset from the first character on the left. The default hash length for SHA-512 is 64 characters.
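To illustrate how a hash length and offset might interact, here is a minimal Python sketch. It assumes that the offset and length act as a simple slice over the hexadecimal digest string; the source does not confirm that Stream applies them exactly this way, and the function name `truncated_md5` is hypothetical.

```python
import hashlib

def truncated_md5(value, length=32, offset=0):
    """Return part of the MD5 hex digest of a string.

    Assumption: the offset is counted from the first character on the
    left, and the length is the number of characters kept after it.
    """
    full = hashlib.md5(value.encode("utf-8")).hexdigest()  # always 32 hex characters
    return full[offset:offset + length]

print(truncated_md5("hello"))                      # 5d41402abc4b2a76b9719d911017c592
print(truncated_md5("hello", length=8, offset=4))  # 402abc4b
```

A truncated hash exposes less of the digest, but shorter values are also more likely to collide, so choose the length with that trade-off in mind.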
Create Content Extraction Rules in Splunk Web
You can create the following extraction types:
- Regex
- MD5 hash
- SHA-512 hash
- Hexadecimal
Create a content extraction rule that uses Regex
- In the Configure Streams UI, click on the stream that contains the field from which you want to extract content.
- Click the Actions menu for the particular field, then select Extract New Field.
Note that the original field does not need to be enabled. If you want to also index the original field, consider using a field transformation. For more information, see field transformations in the Knowledge Manager Manual.
- Enter a Name for the content extraction rule, such as "readable_cookie."
- Enter a Description for the content extraction rule. For example, "Separate the cookies into name/value pairs in a readable format."
- Select "Regex" as the Extraction Type.
- In the Extraction Rule field, enter the regular expression for the content that you want to extract. For example, you can use the following regular expression to extract a name and value pair from the cookie field:
(.*?)\=(.*?;)
Note: Stream uses Boost Perl Regular Expression syntax.
- For Match:
- Select First to return the first value that matches the regex.
- Select All to return the "list" of all values that match the regex.
- In the Extraction Format text box, enter the format for the extraction. For example, enter $1, $2 to return the first and second capturing groups of the match.
- (Optional) Select Yes or No for Hash Extraction. Select Yes if you want to hash the field using SHA-512.
- (Optional) Specify the Hash Length. The hash length limit is 64 characters.
- (Optional) Specify the Hash Offset. The offset can be between 0 and 63.
- (Optional) Specify the Hash Salt. The salt can contain alphanumeric characters only, up to a maximum of 30 characters. You must be running the Splunk App for Stream and Splunk Stream forwarders version 7.3.0 or later to use this feature.
- Click Save.
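The difference between the First and All match options can be sketched in Python using the example cookie rule from the steps above. This is only an approximation: Stream uses Boost Perl Regular Expression syntax, while the sketch uses Python's `re` module, which behaves the same way for this simple pattern. The helper names `apply_format` and `extract` and the sample cookie string are hypothetical.

```python
import re

# The example extraction rule from the steps above.
COOKIE_RULE = re.compile(r"(.*?)\=(.*?;)")

def apply_format(match, extraction_format):
    """Replace $1, $2, ... in the format string with the captured groups."""
    return re.sub(r"\$(\d+)",
                  lambda ref: match.group(int(ref.group(1))),
                  extraction_format)

def extract(value, extraction_format="$1, $2", match_all=False):
    """Return the first formatted match, or a list of all formatted matches."""
    if match_all:
        return [apply_format(m, extraction_format)
                for m in COOKIE_RULE.finditer(value)]
    first = COOKIE_RULE.search(value)
    return apply_format(first, extraction_format) if first else None

cookie = "sessionid=abc123; theme=dark;"
print(extract(cookie))                  # sessionid, abc123;
print(extract(cookie, match_all=True))  # ['sessionid, abc123;', ' theme, dark;']
```

Note that the second match keeps the leading space before "theme", because the first group `(.*?)` captures everything from the end of the previous match up to the equals sign.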
The <MaxEventQueueSize> option in streamfwd.conf determines the maximum number of events that Splunk_TA_stream can queue for delivery to Splunk indexers. By default, <MaxEventQueueSize> supports 10,000 events. To increase or decrease the maximum event queue size, modify the value of <MaxEventQueueSize> in streamfwd.conf. See Configure streamfwd.conf in the Splunk Stream Installation and Configuration Manual.
Extract a field as an MD5 Hash
To use a custom-defined salt, you must be running the Splunk App for Stream and Splunk Stream forwarders version 7.3.0 or later.
- Click the Actions menu for the particular field that you want to hash, then select Extract New Field.
- Enter a name for the MD5 hash field that you want to generate. For example, "src_content_MD5_hash."
- Enter a description for the new MD5 hash field. For example, "MD5 hash of src_content_field."
- In the Extraction Type menu, select MD5 hash.
- In the Hash Length field, specify the number of characters to use for the hash. Leave blank for the default of 32 characters.
- In the Hash Offset field, specify the number of characters to offset from the first character on the left. Leave blank for the default of 0.
- Specify the Hash Salt. The salt can contain alphanumeric characters only, up to a maximum of 30 characters. You must be running the Splunk App for Stream and Splunk Stream forwarders version 7.3.0 or later to use this feature.
- Click Save.
The new field appears in the list of Fields on the events page for the particular stream.
MD5 hash content extraction is pre-configured for specific fields in HTTP and SMTP protocol streams.
Extract a field as a SHA-512 Hash
To use the SHA-512 algorithm or a custom-defined salt, you must be running the Splunk App for Stream and Splunk Stream forwarders version 7.3.0 or later.
- Click the Actions menu for the particular field that you want to hash, then select Extract New Field.
- Enter a name for the SHA-512 hash field that you want to generate. For example, "src_content_SHA512_hash."
- Enter a description for the new SHA-512 hash field. For example, "SHA-512 hash of src_content_field."
- In the Extraction Type menu, select SHA-512 hash.
- In the Hash Length field, specify the number of characters to use for the hash. Leave blank for the default of 64 characters.
- In the Hash Offset field, specify the number of characters to offset from the first character on the left. Leave blank for the default of 0.
- Specify the Hash Salt. The salt can contain alphanumeric characters only, up to a maximum of 30 characters. To use this feature, you must be running the Splunk App for Stream and Splunk Stream forwarders version 7.3.0 or later.
- Click Save. The new field appears in the list of Fields on the events page for the particular stream.
Extract a field as a hexadecimal
- Click the Actions menu for the particular field that you want to extract, then select Extract New Field.
- Enter a name for the field that you want to generate. For example, "src_content_hex."
- Enter a description for the new field. For example, "hexadecimal value of src_content_field."
- In the Extraction Type menu, select Hexadecimal.
- Click Save. The new field appears in the list of Fields on the events page for the particular stream.
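As a rough illustration of why hexadecimal encoding helps with arbitrary binary data, the following Python sketch encodes bytes that include non-printable values. The exact output format that Stream produces (letter case, separators) is not stated in this topic, so the lowercase, unseparated form shown here is an assumption, and `to_hex` is a hypothetical helper name.

```python
def to_hex(raw: bytes) -> str:
    """Encode bytes as a lowercase hexadecimal string, two characters per byte."""
    return raw.hex()

# A payload that starts with two non-printable bytes.
payload = b"\x00\xffGET"
encoded = to_hex(payload)
print(encoded)  # 00ff474554

# Hexadecimal encoding is reversible, unlike a hash.
assert bytes.fromhex(encoded) == payload
```

Because the encoding is reversible, it makes binary data safe to display but does not mask it. Use a hash extraction instead when the goal is to hide the original value.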
Use file extraction
Splunk Stream 7.1.0 and later supports file extraction from metadata streams. File extraction lets you capture files from network traffic, such as emails, email attachments, images, PDFs, and so on. You can identify extracted files in Splunk search results, and use workflow actions to download those files to your local machine.
File extraction supports the HTTP and SMTP protocols only.
File extraction prerequisites
Before you can use file extraction to capture files with metadata streams, you must map your Splunk Stream deployment to a remote file server. The app uses the file server to store files extracted by Stream forwarder based on the metadata stream definition. See Configure file extraction in the Splunk Stream Installation and Configuration Manual.
Splunk Stream lets you capture network event data for a variety of network protocols. Make sure to consider your privacy and security obligations when selecting and using a remote file server for Splunk Stream data.
File extraction is not supported on Splunk Cloud.
Configure file extraction for a metadata stream
- Create or clone a new HTTP or SMTP metadata stream. See Configure metadata streams.
- In the Fields tab, enable one or both of the following fields:
file_extracted_req
file_extracted_resp
For example, if you only need HTTP uploads, enable the file_extracted_req field.
- Click Save.
The metadata stream now extracts files from request and response data. Extracted files appear in search results.
Search for extracted files
To search for extracted files:
- In the Splunk Search and Reporting app, in the Search bar, enter the following event type:
eventtype="stream_extractedfilesaved"
- In your search results, look for the extracted_file{} multi-value field.
All extracted files in the network event appear in this field.
Download extracted files
To download extracted files that you identify in search results:
- Expand the Event tab.
- Find the extracted_file{} field in the expanded tab.
- Click Actions > Downloaded Extracted File.
The extracted file downloads to your local machine.
This documentation applies to the following versions of Splunk Stream™: 8.0.1, 8.0.2, 8.1.0, 8.1.1, 8.1.3