Performance best practices
When Hunk searches your raw HDFS data, the data passes through index-time processing. (In Hunk, index-time extractions run at search time and cannot be turned off.)
To process this data more efficiently, optimize your index-time settings, particularly timestamping and line aggregation (event breaking). You can configure the following settings for your data source in props.conf to improve performance:
DATETIME_CONFIG
MAX_TIMESTAMP_LOOKAHEAD
TIME_PREFIX
TIME_FORMAT
SHOULD_LINEMERGE
ANNOTATE_PUNCT
For example, for single-line data without timestamps, the following settings can improve throughput roughly fourfold:
[source::MyDataSource]
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
DATETIME_CONFIG = NONE
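If you manage props.conf with scripts, a small check can confirm that a stanza carries these three settings. The sketch below is a hypothetical helper, not part of Hunk or Splunk, and it assumes the file parses as a standard INI file with Python's configparser. That holds for simple stanzas like the one above but may not hold for every props.conf feature.

```python
import configparser

# Settings from the example stanza for single-line, non-timestamped data.
RECOMMENDED = {
    "ANNOTATE_PUNCT": "false",
    "SHOULD_LINEMERGE": "false",
    "DATETIME_CONFIG": "NONE",
}

def missing_settings(props_text: str, stanza: str) -> dict:
    """Return recommended settings that are absent or different in a stanza.

    Hypothetical helper: assumes props.conf can be parsed as plain INI.
    Values are compared case-insensitively.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep setting names as written (e.g. SHOULD_LINEMERGE)
    parser.read_string(props_text)
    current = parser[stanza] if parser.has_section(stanza) else {}
    return {
        key: value
        for key, value in RECOMMENDED.items()
        if (current.get(key) or "").lower() != value.lower()
    }

example = """
[source::MyDataSource]
ANNOTATE_PUNCT = false
SHOULD_LINEMERGE = false
"""
print(missing_settings(example, "source::MyDataSource"))
# {'DATETIME_CONFIG': 'NONE'}
```

An empty result means the stanza already contains all three settings with the recommended values.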
Note: If you need to use timestamping, we strongly recommend that you use TIME_PREFIX and TIME_FORMAT to improve processing.
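To see why an explicit prefix and format help, consider the work a timestamp parser must do. The Python sketch below is an illustrative analogy, not Splunk's implementation. A prefix regex (analogous to TIME_PREFIX) lets the parser jump straight to the timestamp, a single format string (analogous to TIME_FORMAT) means it tries one format instead of guessing among many, and a lookahead limit (analogous to MAX_TIMESTAMP_LOOKAHEAD) bounds how many characters it reads. The prefix pattern, format, and sample event are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Hypothetical values, analogous to the props.conf settings of the same names.
TIME_PREFIX = re.compile(r"time=")      # text that precedes the timestamp
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"       # strptime-style format of the timestamp
MAX_TIMESTAMP_LOOKAHEAD = 19            # characters to read after the prefix

def extract_timestamp(event: str):
    """Return the event's timestamp as a datetime, or None if not found.

    Illustrative sketch: locate the prefix, read at most
    MAX_TIMESTAMP_LOOKAHEAD characters after it, and parse them
    with one known format.
    """
    match = TIME_PREFIX.search(event)
    if match is None:
        return None
    start = match.end()
    candidate = event[start:start + MAX_TIMESTAMP_LOOKAHEAD]
    try:
        return datetime.strptime(candidate, TIME_FORMAT)
    except ValueError:
        return None

print(extract_timestamp("level=INFO time=2014-03-05T12:30:45 user=alice"))
# 2014-03-05 12:30:45
```

Without a known prefix and format, a parser would have to scan more of each event and test several candidate formats, which is the extra work these settings avoid.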
The table below shows example combinations of timestamping and breaking options and how long (in seconds) each combination can take to process a file with 10 million single-line events:
Timestamping and breaking options | Time (seconds) |
---|---|
Default configuration | 190 |
| 179 |
| 105 |
| 107 |
| 51 |
| 53 |
| 44 |
| 109 |
| 99 |
| 54 |
| 54 |
| 49 |
| 50 |
| 35 |
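Because every run processed the same 10 million events, each timing converts directly into throughput and into a speedup relative to the default configuration. The short Python calculation below uses the default row (190 seconds) and the fastest row (35 seconds) from the table; the helper names are illustrative.

```python
EVENTS = 10_000_000     # events in the test file
DEFAULT_SECONDS = 190   # default configuration from the table

def throughput(seconds: float) -> float:
    """Events processed per second for a run over EVENTS events."""
    return EVENTS / seconds

def speedup(seconds: float) -> float:
    """How many times faster a run is than the default configuration."""
    return DEFAULT_SECONDS / seconds

for seconds in (190, 35):
    print(f"{seconds} s: {throughput(seconds):,.0f} events/s, {speedup(seconds):.2f}x")
```

At 35 seconds, the fastest combination runs about 5.4 times faster than the default, at roughly 286,000 events per second.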
This documentation applies to the following versions of Hunk® (Legacy): 6.0, 6.0.1, 6.0.2, 6.0.3, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.2, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11