Splunk® Enterprise

Splunk Analytics for Hadoop

Performance best practices

When a search runs against your raw HDFS data, the data passes through index-time processing. (Index-time extractions run at search time and cannot be turned off.)

To process this data more efficiently, optimize your index-time settings, particularly timestamping and aggregation. You can configure the following settings for your data source in props.conf to improve performance:

  • DATETIME_CONFIG
  • MAX_TIMESTAMP_LOOKAHEAD
  • TIME_PREFIX
  • TIME_FORMAT
  • SHOULD_LINEMERGE
  • ANNOTATE_PUNCT

For example, for single-line, non-timestamped data, the following settings can improve throughput roughly fourfold:

[source::MyDataSource]
ANNOTATE_PUNCT   = false
SHOULD_LINEMERGE = false
DATETIME_CONFIG  = NONE

Note: If you need to use timestamping, we strongly recommend that you use TIME_PREFIX and TIME_FORMAT to improve processing.
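
As a minimal sketch of the note above, a stanza for timestamped, single-line data might look like the following. The stanza name is hypothetical, and the TIME_FORMAT value (the one used in the timing table in this topic) assumes that each event begins with a timestamp such as "Wed, 07 Aug 2019 14:05:00 UTC". Adjust both to match your data.

```ini
# props.conf (hypothetical source name; adjust to your data source)
[source::MyTimestampedSource]
# Treat every line as its own event (no line merging).
SHOULD_LINEMERGE = false
# Assumption: the timestamp starts at the beginning of each event.
TIME_PREFIX = ^
# Assumption: timestamps look like "Wed, 07 Aug 2019 14:05:00 UTC".
TIME_FORMAT = %a, %d %b %Y %H:%M:%S %Z
# Look no further than 30 characters past TIME_PREFIX for the timestamp.
MAX_TIMESTAMP_LOOKAHEAD = 30
```

In the timings in this topic, TIME_FORMAT accounts for most of the gain (105 seconds down to 51 with MAX_TIMESTAMP_LOOKAHEAD and SHOULD_LINEMERGE already set), while adding TIME_PREFIX = ^ makes little measurable difference.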

The table below shows examples of possible timestamping and breaking options and how long (in seconds) each combination can take to process a file with 10 million single-line events:

Timestamping and breaking options           Time (seconds)

Default configuration                       190

MAX_TIMESTAMP_LOOKAHEAD = 30                179

MAX_TIMESTAMP_LOOKAHEAD = 30                105
SHOULD_LINEMERGE = false

MAX_TIMESTAMP_LOOKAHEAD = 30                107
SHOULD_LINEMERGE = false
TIME_PREFIX = ^

MAX_TIMESTAMP_LOOKAHEAD = 30                51
SHOULD_LINEMERGE = false
TIME_FORMAT = %a, %d %b %Y %H:%M:%S %Z

MAX_TIMESTAMP_LOOKAHEAD = 30                53
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %a, %d %b %Y %H:%M:%S %Z

MAX_TIMESTAMP_LOOKAHEAD = 30                44
SHOULD_LINEMERGE = false
TIME_FORMAT = %a, %d %b %Y %H:%M:%S %Z
ANNOTATE_PUNCT = false

SHOULD_LINEMERGE = false                    109

SHOULD_LINEMERGE = false                    99
TIME_PREFIX = ^

SHOULD_LINEMERGE = false                    54
TIME_FORMAT = %a, %d %b %Y %H:%M:%S %Z

SHOULD_LINEMERGE = false                    54
TIME_PREFIX = ^
TIME_FORMAT = %a, %d %b %Y %H:%M:%S %Z

MAX_TIMESTAMP_LOOKAHEAD = 30                49
SHOULD_LINEMERGE = false
DATETIME_CONFIG = NONE

SHOULD_LINEMERGE = false                    50
DATETIME_CONFIG = CURRENT

MAX_TIMESTAMP_LOOKAHEAD = 30                35
SHOULD_LINEMERGE = false
DATETIME_CONFIG = NONE
ANNOTATE_PUNCT = false

Disable streaming to speed up searches

If you want data only from MapReduce jobs, without previews, you can disable the Streaming feature of Splunk Analytics for Hadoop to speed up searches.

By default, Splunk Analytics for Hadoop uses Mix Mode, which combines Streaming (Splunk only) and Reporting (Hadoop MapReduce jobs) modes. If you do not require a preview, you can disable the streaming part of Splunk Analytics for Hadoop.

To enable or disable streaming, use the combination of settings for the mode you want:

  • Mix Mode: vix.mode = report and vix.splunk.search.mixedmode = 1
  • Report Mode only: vix.mode = report and vix.splunk.search.mixedmode = 0
  • Streaming Mode only: vix.mode = stream
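
For illustration, Report Mode only might be configured as follows. This sketch assumes that the vix.* settings belong in the provider stanza in indexes.conf; the provider name MyHadoopProvider is hypothetical.

```ini
# indexes.conf (assumed location; the provider name is hypothetical)
[provider:MyHadoopProvider]
# Report Mode only: run Hadoop MapReduce jobs without streaming previews.
vix.mode = report
vix.splunk.search.mixedmode = 0
```

To return to the default Mix Mode in this sketch, set vix.splunk.search.mixedmode back to 1.
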
Last modified on 07 August, 2019

This documentation applies to the following versions of Splunk® Enterprise: 6.5.0, 6.5.1, 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4

