Troubleshoot HTTP Event Collector
You can troubleshoot HTTP Event Collector (HEC) by viewing error logs. You can also set up logging using configuration files, investigate instance performance with dashboards included in the Monitoring Console, and detect other scaling problems.
Logging
HTTP Event Collector saves usage data about itself to log files. You can search these usage metrics using Splunk Cloud Platform or Splunk Enterprise to explore usage trends system-wide, per token, per source type, and more, as well as to evaluate HEC performance. Metrics are logged whenever HEC is active. HEC is disabled by default, so it does not log data until you enable it.
You can also view HEC error logs in the splunkd.log log file on Splunk Enterprise. See Enable debug logging in the Troubleshooting Manual for how to enable debugging on your Splunk Enterprise instance.
Log file location and management
Splunk Enterprise writes HTTP Event Collector metrics to the $SPLUNK_HOME/var/log/introspection/splunk/http_event_collector_metrics.log file.
The Splunk platform creates a new http_event_collector_metrics.log file when you log out of and back in to Splunk Cloud Platform, or when you start your Splunk Enterprise instance. Any existing file with that name is renamed.
You configure the logging frequency of HTTP Event Collector metrics in the limits.conf configuration file. 60 seconds is the default frequency. HEC continues logging system-level metrics even when there is no data input activity. When there is no activity, you can expect about 200 kilobytes (KB) of metrics log data to be produced every 24 hours. The maximum size of a metrics log file is 25 megabytes (MB). If a log file reaches that limit, the Splunk platform renames the log file and creates a new file. Up to five metrics log files can be stored at a time.
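The figures above can be turned into a rough capacity estimate. The following sketch only restates the defaults quoted in this topic; it assumes decimal units (1 MB = 1000 KB), and the variable names are illustrative rather than Splunk settings.

```python
# Defaults quoted in this topic (names are illustrative, not Splunk settings).
REPORT_INTERVAL_SECONDS = 60
IDLE_KB_PER_DAY = 200
MAX_FILE_MB = 25
MAX_FILES = 5

SECONDS_PER_DAY = 24 * 60 * 60

# One metrics report is written per interval.
reports_per_day = SECONDS_PER_DAY // REPORT_INTERVAL_SECONDS

# Upper bound on disk used by rotated metrics logs.
max_disk_mb = MAX_FILE_MB * MAX_FILES

# Days for an idle instance to fill one file, assuming 1 MB = 1000 KB.
idle_days_per_file = (MAX_FILE_MB * 1000) / IDLE_KB_PER_DAY

print(reports_per_day, max_disk_mb, idle_days_per_file)
```

On an idle instance a single file therefore lasts on the order of months. An instance with many active tokens fills files faster, because per-token entries are added to each report.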
The props.conf configuration file defines parameters for reading and indexing the metrics log file.
Searching HTTP Event Collector metrics data
The Splunk platform puts HEC metrics data into the _introspection index. To search the accumulated HEC metrics with the Splunk platform, use the following search command:
index="_introspection" token
Metrics log data format
The Splunk platform records HEC metrics data to the log in JSON format. This means that the log is both human-readable and consistent with other Splunk Cloud Platform or Splunk Enterprise log formats. A single entry consists of both input summary metrics (series = http_event_collector) and per-token metrics (series = http_event_collector_token), as shown in the following example:
{
  "datetime":"09-01-2016 19:21:19.014 -0700",
  "log_level":"INFO",
  "component":"HttpEventCollector",
  "data":{
    "series":"http_event_collector",
    "transport":"http",
    "format":"json",
    "total_bytes_received":0,
    "total_bytes_indexed":0,
    "num_of_requests":0,
    "num_of_events":0,
    "num_of_errors":0,
    "num_of_parser_errors":0,
    "num_of_auth_failures":0,
    "num_of_requests_to_disabled_token":0,
    "num_of_requests_to_incorrect_url":0,
    "num_of_requests_in_mint_format":0,
    "num_of_ack_requests":0,
    "num_of_requests_acked":0,
    "num_of_requests_waiting_ack":0
  }
}
{
  "datetime":"08-22-2016 12:38:04.854 -0700",
  "log_level":"INFO",
  "component":"HttpEventCollector",
  "data":{
    "token_name":"test",
    "series":"http_event_collector_token",
    "transport":"http",
    "format":"json",
    "total_bytes_received":57000,
    "total_bytes_indexed":44000,
    "num_of_requests":1000,
    "num_of_events":1000,
    "num_of_errors":0,
    "num_of_parser_errors":0,
    "num_of_requests_to_disabled_token":0,
    "num_of_requests_in_mint_format":0
  }
}
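If you want to inspect the metrics log outside of a Splunk search, the entries can be read with a general-purpose JSON parser. The following Python sketch assumes the log text holds JSON objects back to back, as in the example above. The helper names `iter_json_objects` and `split_by_series` are hypothetical and are not part of any Splunk tool, and the sample text is a shortened version of the example entries.

```python
import json


def iter_json_objects(text):
    """Yield each JSON object from text that holds objects back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def split_by_series(entries):
    """Separate summary entries from per-token entries using data.series."""
    summary, per_token = [], []
    for entry in entries:
        series = entry.get("data", {}).get("series")
        if series == "http_event_collector":
            summary.append(entry)
        elif series == "http_event_collector_token":
            per_token.append(entry)
    return summary, per_token


# Shortened sample based on the example entries in this topic.
SAMPLE = (
    '{"datetime":"09-01-2016 19:21:19.014 -0700","log_level":"INFO",'
    '"component":"HttpEventCollector",'
    '"data":{"series":"http_event_collector","num_of_requests":0}} '
    '{"datetime":"08-22-2016 12:38:04.854 -0700","log_level":"INFO",'
    '"component":"HttpEventCollector",'
    '"data":{"token_name":"test","series":"http_event_collector_token",'
    '"num_of_requests":1000}}'
)

summary, per_token = split_by_series(iter_json_objects(SAMPLE))
for entry in per_token:
    print(entry["data"]["token_name"], entry["data"]["num_of_requests"])
```

Using `raw_decode` rather than splitting on newlines means the sketch works whether each entry sits on one line or is spread across several.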
HEC summary metrics
The Splunk platform accumulates system-wide summary metrics even if there is no input activity. These metrics are identified by "series":"http_event_collector".
See the following table for a description of the fields for HEC summary metrics:
Field | Description | Value |
---|---|---|
component | HTTP Event Collector metrics data identifier. | HttpEventCollector |
data:format | HTTP Event Collector data format. | json |
data:num_of_auth_failures | Total number of authentication failures due to invalid token. | unsigned integer |
data:num_of_errors | Total number of per-token errors. | unsigned integer |
data:num_of_events | Total number of per-token events received by the HTTP Event Collector endpoint. | unsigned integer |
data:num_of_parser_errors | Total number of per-token parser errors due to incorrectly formatted event data. | unsigned integer |
data:num_of_requests | Total number of valid per-token individual HTTP or HTTPS requests received by an HTTP Event Collector endpoint. Each request can have one or more data events. | unsigned integer |
data:num_of_ack_requests | Total number of HEC request indexer status queries received. | unsigned integer |
data:num_of_requests_acked | Total number of HEC requests that Splunk successfully indexed and acknowledged. | unsigned integer |
data:num_of_requests_waiting_ack | Total number of HEC requests received with indexer acknowledgements enabled. | unsigned integer |
data:num_of_requests_to_incorrect_url | Total number of requests to an incorrect URL. | unsigned integer |
data:num_of_requests_in_mint_format | Total number of requests from Splunk MINT. | unsigned integer |
data:num_of_requests_to_disabled_token | Total number of per-token requests to a disabled token. | unsigned integer |
data:series | Metrics data type. | http_event_collector |
data:total_bytes_indexed | Total amount of per-token data sent to the indexer. | unsigned integer |
data:total_bytes_received | Total amount of per-token data received by calling the receive/token endpoint. | unsigned integer |
data:transport | Data transport protocol for HTTP Event Collector data. | http |
datetime | Date and time associated with the data. Takes the following format: MM-DD-YYYY HH:MM:SS.SSS +/-GMTDELTA | string |
log_level | Log severity level. | INFO |
Per-token metrics
In contrast to the system-wide summary metrics, the Splunk platform accumulates per-token metrics only when HEC is active. These metrics are identified by "series":"http_event_collector_token".
The [http_input] stanza in the limits.conf configuration file defines the logging interval and maximum number of tokens logged for these metrics.
See the following table for a description of the fields for per-token metrics:
Field | Description | Value |
---|---|---|
component | HTTP Event Collector metrics data identifier. | HttpEventCollector |
data:format | HTTP Event Collector data format. Always JSON format for metrics logging. | json |
data:num_of_errors | Number of errors. | unsigned integer |
data:num_of_events | Number of events received by the HTTP Event Collector endpoint. | unsigned integer |
data:num_of_parser_errors | Number of parser errors due to incorrectly formatted event data. | unsigned integer |
data:num_of_requests | Number of valid individual HTTP or HTTPS requests received by an HTTP Event Collector endpoint. Each request can have one or more data events. | unsigned integer |
data:num_of_requests_in_mint_format | Total number of requests from Splunk MINT. | unsigned integer |
data:num_of_requests_to_disabled_token | Number of requests to a disabled token. | unsigned integer |
data:series | Metrics data type. | http_event_collector_token |
data:token_name | Token name. | string |
data:total_bytes_indexed | Total amount of data sent to the indexer. | unsigned integer |
data:total_bytes_received | Total amount of data received by calling the receive/token endpoint. | unsigned integer |
data:transport | Data transport protocol for HTTP Event Collector data. | http |
datetime | Date and time associated with the data. Takes the following format: MM-DD-YYYY HH:MM:SS.SSS +/-GMTDELTA | string |
log_level | Log severity level. | INFO |
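The per-token fields in the table above can be combined into a few derived indicators. The following Python sketch is only a rough illustration. The function name `token_health` and the three ratios are assumptions made for this example rather than metrics that HEC reports, and the error rate is approximate because `num_of_requests` counts valid requests only.

```python
def token_health(data):
    """Derive rough health indicators from one per-token "data" object."""
    requests = data.get("num_of_requests", 0)
    events = data.get("num_of_events", 0)
    errors = data.get("num_of_errors", 0)
    received = data.get("total_bytes_received", 0)
    indexed = data.get("total_bytes_indexed", 0)
    return {
        "token_name": data.get("token_name"),
        # Guard every division so idle tokens do not raise ZeroDivisionError.
        "error_rate": errors / requests if requests else 0.0,
        "events_per_request": events / requests if requests else 0.0,
        "indexed_ratio": indexed / received if received else 0.0,
    }


# Values taken from the per-token example entry in this topic.
example = {
    "token_name": "test",
    "total_bytes_received": 57000,
    "total_bytes_indexed": 44000,
    "num_of_requests": 1000,
    "num_of_events": 1000,
    "num_of_errors": 0,
}
health = token_health(example)
print(health)
```

An `events_per_request` value close to 1, as in this example, suggests that the client is not batching events.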
Logging with configuration files
The limits.conf and props.conf files control metrics data logging and indexing behavior.
limits.conf
The [http_input] stanza in the $SPLUNK_HOME/etc/system/default/limits.conf file controls HTTP Event Collector metrics data logging.
For information about all HTTP Event Collector-related parameters, including those not related to metrics, see the [http_input] stanza documentation on limits.conf in the Splunk Enterprise Admin Manual.
The [http_input] stanza in limits.conf takes the following metrics-related parameters:
Parameter | Default value | Description |
---|---|---|
max_number_of_tokens | 10000 | An unsigned integer that represents the maximum number of tokens reported by HTTP Event Collector metrics. |
metrics_report_interval | 60 | An unsigned integer that represents the number of seconds in an HTTP Event Collector metrics report interval. |
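To show the shape of the stanza, the following Python sketch reads the two parameters from a sample `[http_input]` stanza. It uses the standard-library `configparser` module, which is not the Splunk configuration parser and handles only simple `key = value` files. The sample text and the function name `read_http_input` are assumptions made for this example.

```python
import configparser

# Illustrative stanza using the default values from the table above.
SAMPLE_LIMITS_CONF = """
[http_input]
max_number_of_tokens = 10000
metrics_report_interval = 60
"""


def read_http_input(text):
    """Return the metrics-related [http_input] settings as integers."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["http_input"]
    return {
        # Fall back to the documented defaults when a key is absent.
        "max_number_of_tokens": section.getint(
            "max_number_of_tokens", fallback=10000),
        "metrics_report_interval": section.getint(
            "metrics_report_interval", fallback=60),
    }


settings = read_http_input(SAMPLE_LIMITS_CONF)
print(settings)
```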
props.conf
The [http_event_collector_metrics] stanza in the $SPLUNK_HOME/etc/system/default/props.conf file controls reading and indexing the HTTP Event Collector log files.
See the following example:
[source::.../http_event_collector_metrics.log(.\d+)?]
sourcetype = http_event_collector_metrics
...
[http_event_collector_metrics]
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = datetime
TIME_FORMAT = %m-%d-%Y %H:%M:%S.%l %z
INDEXED_EXTRACTIONS = json
KV_MODE = none
JSON_TRIM_BRACES_IN_ARRAY_NAMES = true
The [http_event_collector_metrics] stanza in props.conf takes the following parameters:
Parameter | Default | Description |
---|---|---|
SHOULD_LINEMERGE | false | Whether to combine multiple lines of the log into a single event. Setting to true allows several lines to be merged into one event. Setting to false treats each line as a separate event. |
TIMESTAMP_FIELDS | datetime | Log entry time field name. |
TIME_FORMAT | %m-%d-%Y %H:%M:%S.%l %z | Log entry time field format. |
INDEXED_EXTRACTIONS | json | Metrics log format. Always in JSON format for metrics logging. |
KV_MODE | none | Key-value data indicator. Setting to none means no key-value data. Always none for metrics logging. |
JSON_TRIM_BRACES_IN_ARRAY_NAMES | true | Whether to trim brace characters from JSON array names. |
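If you process the metrics log outside of the Splunk platform, you need an equivalent of the TIME_FORMAT value for the `datetime` field. The following Python sketch parses the timestamp from the example entry. Python's `strptime` has no `%l` directive, so `%f` (fractional seconds) is used here as an assumed equivalent. The function name `parse_hec_datetime` is hypothetical.

```python
from datetime import datetime

# Python counterpart of TIME_FORMAT = %m-%d-%Y %H:%M:%S.%l %z
# (%f stands in for the Splunk %l subsecond directive; this is an assumption).
PYTHON_TIME_FORMAT = "%m-%d-%Y %H:%M:%S.%f %z"


def parse_hec_datetime(value):
    """Parse a metrics log datetime string into a timezone-aware datetime."""
    return datetime.strptime(value, PYTHON_TIME_FORMAT)


ts = parse_hec_datetime("09-01-2016 19:21:19.014 -0700")
print(ts.isoformat())
```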
Possible error codes
The following status codes have particular meaning for all HTTP Event Collector endpoints:
Status code | HTTP status code ID | HTTP status code | Status message |
---|---|---|---|
0 | 200 | OK | Success |
1 | 403 | Forbidden | Token disabled |
2 | 401 | Unauthorized | Token is required |
3 | 401 | Unauthorized | Invalid authorization |
4 | 403 | Forbidden | Invalid token |
5 | 400 | Bad Request | No data |
6 | 400 | Bad Request | Invalid data format |
7 | 400 | Bad Request | Incorrect index |
8 | 500 | Internal Error | Internal server error |
9 | 503 | Service Unavailable | Server is busy |
10 | 400 | Bad Request | Data channel is missing |
11 | 400 | Bad Request | Invalid data channel |
12 | 400 | Bad Request | Event field is required |
13 | 400 | Bad Request | Event field cannot be blank |
14 | 400 | Bad Request | ACK is disabled |
15 | 400 | Bad Request | Error in handling indexed fields |
16 | 400 | Bad Request | Query string authorization is not enabled |
To ensure data is successfully ingested into the Splunk platform, configure your clients with the ability to act on response codes returned by the HEC endpoint. If the client can't take an action based on the resulting response code, data loss might occur.
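One way a client might act on these codes is to sort each response into a small set of actions. The following Python sketch is an illustration, not a prescribed client design. It assumes the response body is a JSON object with a numeric `code` field matching the status code column above. The function name `decide_action`, the action labels, and the choice to treat only codes 8 and 9 as retryable are all assumptions.

```python
import json

# Grouping of the status codes from the table above (an assumed policy).
RETRYABLE_HEC_CODES = {8, 9}            # internal server error, server is busy
CREDENTIAL_HEC_CODES = {1, 2, 3, 4}     # token or authorization problems


def decide_action(http_status, body_text):
    """Map an HEC response to "ok", "retry", "fix_credentials", or "fix_request"."""
    try:
        hec_code = json.loads(body_text).get("code")
    except (TypeError, ValueError, AttributeError):
        # Body missing, not JSON, or not an object: rely on the HTTP status.
        hec_code = None

    if http_status == 200 and hec_code in (0, None):
        return "ok"
    if hec_code in RETRYABLE_HEC_CODES or http_status in (500, 503):
        # Keep the data and send it again later, ideally with backoff.
        return "retry"
    if hec_code in CREDENTIAL_HEC_CODES or http_status in (401, 403):
        return "fix_credentials"
    # Remaining 400-class responses indicate a problem with the request itself.
    return "fix_request"


print(decide_action(503, '{"text":"Server is busy","code":9}'))
```

A client that buffers events until it sees "ok", and resends on "retry", avoids the data loss described above. Retrying a "fix_request" response without changing the payload is unlikely to help.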
Investigate instance performance with the Monitoring Console
The Monitoring Console provides pre-built dashboards for HEC that you can use to investigate your instance performance. See the following topics for more information:
- For Splunk Cloud Platform, see the Monitor your Splunk Cloud Platform Deployment chapter in the Splunk Cloud Platform Admin Manual.
- For Splunk Enterprise, see the About the Monitoring Console chapter in the Monitoring Splunk Enterprise manual.
For details about the dashboard that monitors HTTP Event Collector, see Indexing: Inputs: HTTP Event Collector in the Monitoring Splunk Enterprise manual.
Detect scaling problems
If you are experiencing performance slowdowns or want to speed up your HTTP Event Collector deployment, review the following factors, each of which can affect performance.
HTTP and HTTPS
Sending data over HTTP results in a significant performance improvement compared to sending data over HTTPS.
Batching
If you batch multiple events into single requests, it can speed up data transmission. Because the request metadata applies to all events in the request, less data is sent overall. For more information about how event data is packaged, see Format events for HTTP Event Collector.
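As an illustration of batching, the following Python sketch builds one request body from several events. It assumes the JSON event format in which each event is wrapped in an object with an `event` key and the objects are placed back to back rather than inside a JSON array. The function name `build_batch` is hypothetical. See Format events for HTTP Event Collector for the authoritative format.

```python
import json


def build_batch(events):
    """Serialize events into a single request body, one JSON object per event."""
    parts = []
    for event in events:
        # Compact separators keep the payload small.
        parts.append(json.dumps({"event": event}, separators=(",", ":")))
    # Objects are concatenated directly; they are not wrapped in an array.
    return "".join(parts)


body = build_batch(["user logged in", {"action": "purchase", "amount": 3}])
print(body)
```

Sending this body in one request replaces two separate requests, so the HTTP headers and connection overhead are paid once.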
HTTP Keep-alive
Setting keep-alive on your connection can improve performance. As long as the client sending the data supports HTTP 1.1 and is set up to support HTTP persistent connection, you can optimize performance with keep-alive.
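The following Python sketch shows one way to reuse a single persistent connection for several requests with the standard-library `http.client` module, which speaks HTTP 1.1. The endpoint path, the `Authorization: Splunk <token>` header, and the function name `send_bodies` are assumptions not stated in this topic. Adjust them to match your deployment.

```python
import http.client


def send_bodies(host, port, token, bodies, use_tls=True,
                path="/services/collector/event"):
    """POST each body over one reused connection and return the HTTP statuses."""
    conn_cls = (http.client.HTTPSConnection if use_tls
                else http.client.HTTPConnection)
    conn = conn_cls(host, port, timeout=10)
    headers = {
        "Authorization": "Splunk " + token,
        "Connection": "keep-alive",
    }
    statuses = []
    try:
        for body in bodies:
            conn.request("POST", path, body=body.encode("utf-8"),
                         headers=headers)
            response = conn.getresponse()
            # Drain the response so the same socket can carry the next request.
            response.read()
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

The connection is opened on the first request and closed only after the last one, so the TCP and TLS handshakes are not repeated for every request.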
Persistent queues
Persistent queuing can slow down performance because it writes data from an input queue to disk. For more information, see Use persistent queues to help prevent data loss.
This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.3.0, 9.3.1, 8.1.0, 8.1.10, 8.1.11, 8.1.12