Troubleshoot HTTP Event Collector

You can troubleshoot HTTP Event Collector (HEC) by viewing error logs. You can also set up logging using configuration files, investigate instance performance with dashboards included in the Monitoring Console, and detect other scaling problems.

Logging

HTTP Event Collector saves usage data about itself to log files. You can search these usage metrics using Splunk Cloud Platform or Splunk Enterprise to explore usage trends system-wide, per token, per source type, and more, as well as to evaluate HEC performance. Metrics are logged whenever HEC is active. HEC is disabled by default, so it does not log data until you enable it.

You can also view HEC error logs in the splunkd.log log file on Splunk Enterprise. See Enable debug logging in the Troubleshooting Manual for how to enable debugging on your Splunk Enterprise instance.

Log file location and management

Splunk Enterprise writes HTTP Event Collector metrics to the $SPLUNK_HOME/var/log/introspection/http_event_collector_metrics.log file.

The Splunk platform creates a new http_event_collector_metrics.log file when you log out of and back in to Splunk Cloud Platform, or when you start your Splunk Enterprise instance. Any existing file with that name is renamed.

You configure the logging frequency of HTTP Event Collector metrics in the limits.conf configuration file. 60 seconds is the default frequency. HEC continues logging system-level metrics even when there is no data input activity. When there is no activity, you can expect about 200 kilobytes (KB) of metrics log data to be produced every 24 hours. The maximum size of a metrics log file is 25 megabytes (MB). If a log file reaches that limit, the Splunk platform renames the log file and creates a new file. Up to five metrics log files can be stored at a time.
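
The figures above imply rough upper bounds on disk use. The following sketch works through that arithmetic. It assumes decimal units (1 MB = 1,000 KB) and an instance with no input activity; the function names are illustrative only.

```python
# Figures taken from the paragraph above; decimal units are an assumption.
MAX_FILE_MB = 25        # maximum size of one metrics log file
MAX_FILES = 5           # metrics log files stored at a time
IDLE_KB_PER_DAY = 200   # approximate idle output per 24 hours


def max_retained_mb() -> int:
    """Upper bound on disk space used by the stored metrics log files."""
    return MAX_FILE_MB * MAX_FILES


def idle_days_until_size_limit() -> float:
    """Days an idle instance would need to fill one log file."""
    return MAX_FILE_MB * 1000 / IDLE_KB_PER_DAY


print(max_retained_mb())             # 125
print(idle_days_until_size_limit())  # 125.0
```

Because a new file is also created on restart, an idle instance is likely to roll the file over long before it reaches the size limit.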

The props.conf configuration file defines parameters for reading and indexing the metrics log file.

Searching HTTP Event Collector metrics data

The Splunk platform puts HEC metrics data into the _introspection index. To search the accumulated HEC metrics with the Splunk platform, use the following search command:

index="_introspection" token

Metrics log data format

The Splunk platform records HEC metrics data to the log in JSON format. This means that the log is both human-readable and consistent with other Splunk Cloud Platform or Splunk Enterprise log formats. A single entry consists of both input summary metrics (series = http_event_collector) and per-token metrics (series = http_event_collector_token), as shown in the following example:

{  
   "datetime":"09-01-2016 19:21:19.014 -0700",
   "log_level":"INFO",
   "component":"HttpEventCollector",
   "data":{  
      "series":"http_event_collector",
      "transport":"http",
      "format":"json",
      "total_bytes_received":0,
      "total_bytes_indexed":0,
      "num_of_requests":0,
      "num_of_events":0,
      "num_of_errors":0,
      "num_of_parser_errors":0,
      "num_of_auth_failures":0,
      "num_of_requests_to_disabled_token":0,
      "num_of_requests_to_incorrect_url":0,
      "num_of_requests_in_mint_format":0,
      "num_of_ack_requests":0,
      "num_of_requests_acked":0,
      "num_of_requests_waiting_ack":0
   }
}

{  
   "datetime":"08-22-2016 12:38:04.854 -0700",
   "log_level":"INFO",
   "component":"HttpEventCollector",
   "data":{  
      "token_name":"test",
      "series":"http_event_collector_token",
      "transport":"http",
      "format":"json",
      "total_bytes_received":57000,
      "total_bytes_indexed":44000,
      "num_of_requests":1000,
      "num_of_events":1000,
      "num_of_errors":0,
      "num_of_parser_errors":0,
      "num_of_requests_to_disabled_token":0,
      "num_of_requests_in_mint_format":0
   }
}
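
If you process the metrics log outside of the Splunk platform, the series field is enough to separate the two kinds of entries. The following is a minimal sketch, assuming each entry is stored as a single-line JSON object; the sample lines are abbreviated versions of the entries above, and the function name is hypothetical.

```python
import json
from collections import defaultdict

# Abbreviated copies of the two example entries, one JSON object per line.
SAMPLE_LINES = [
    '{"datetime":"09-01-2016 19:21:19.014 -0700","log_level":"INFO",'
    '"component":"HttpEventCollector","data":{"series":"http_event_collector",'
    '"num_of_requests":0,"num_of_errors":0}}',
    '{"datetime":"08-22-2016 12:38:04.854 -0700","log_level":"INFO",'
    '"component":"HttpEventCollector","data":{"token_name":"test",'
    '"series":"http_event_collector_token","num_of_requests":1000,'
    '"num_of_errors":0}}',
]


def group_by_series(lines):
    """Parse metrics entries and group their data objects by series."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        entry = json.loads(line)
        grouped[entry["data"]["series"]].append(entry["data"])
    return dict(grouped)


grouped = group_by_series(SAMPLE_LINES)
for data in grouped["http_event_collector_token"]:
    print(data["token_name"], data["num_of_requests"])
```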

HEC summary metrics

The Splunk platform accumulates system-wide summary metrics even if there is no input activity. These metrics are identified by "series":"http_event_collector".

See the following table for a description of the fields for HEC summary metrics:

Field	Description	Value
component	HTTP Event Collector metrics data identifier.	HttpEventCollector
data:format	HTTP Event Collector data format.	json
data:num_of_auth_failures	Total number of authentication failures due to invalid token.	unsigned integer
data:num_of_errors	Total number of per-token errors, which include bad data format, no authorization, bad authorization, and connectivity problems.	unsigned integer
data:num_of_events	Total number of per-token events received by the HTTP Event Collector endpoint.	unsigned integer
data:num_of_parser_errors	Total number of per-token parser errors due to incorrectly formatted event data.	unsigned integer
data:num_of_requests	Total number of valid per-token individual HTTP or HTTPS requests received by an HTTP Event Collector endpoint. Each request can have one or more data events.	unsigned integer
data:num_of_ack_requests	Total number of HEC request indexer status queries received.	unsigned integer
data:num_of_requests_acked	Total number of HEC requests that Splunk successfully indexed and acknowledged.	unsigned integer
data:num_of_requests_waiting_ack	Total number of HEC requests received with indexer acknowledgements enabled.	unsigned integer
data:num_of_requests_to_incorrect_url	Total number of requests to an incorrect URL.	unsigned integer
data:num_of_requests_in_mint_format	Total number of requests from Splunk MINT.	unsigned integer
data:num_of_requests_to_disabled_token	Total number of per-token requests to a disabled token.	unsigned integer
data:series	Metrics data type.	http_event_collector
data:total_bytes_indexed	Total amount of per-token data sent to the indexer.	unsigned integer
data:total_bytes_received	Total amount of per-token data received by calling the `receive/token` endpoint.	unsigned integer
data:transport	Data transport protocol for HTTP Event Collector data.	http
datetime	Date and time associated with the data. Takes the following format: `MM-DD-YYYY HH:MM:SS.SSS +/-GMTDELTA`	string
log_level	Log severity level.	INFO

Per-token metrics

In contrast to the system-wide summary metrics, the Splunk platform accumulates per-token metrics only when HEC is active. These metrics are identified by "series":"http_event_collector_token".

The [http_input] stanza in the limits.conf configuration file defines the logging interval and maximum number of tokens logged for these metrics.

See the following table for a description of the fields for per-token metrics:

Field	Description	Value
component	HTTP Event Collector metrics data identifier.	HttpEventCollector
data:format	HTTP Event Collector data format. Always JSON format for metrics logging.	json
data:num_of_errors	Number of errors, which include bad data format, no authorization, bad authorization, and connectivity problems.	unsigned integer
data:num_of_events	Number of events received by the HTTP Event Collector endpoint.	unsigned integer
data:num_of_parser_errors	Number of parser errors due to incorrectly formatted event data.	unsigned integer
data:num_of_requests	Number of valid individual HTTP or HTTPS requests received by an HTTP Event Collector endpoint. Each request can have one or more data events.	unsigned integer
data:num_of_requests_in_mint_format	Total number of requests from Splunk MINT.	unsigned integer
data:num_of_requests_to_disabled_token	Number of requests to a disabled token.	unsigned integer
data:series	Metrics data type.	http_event_collector_token
data:token_name	Token name.	string
data:total_bytes_indexed	Total amount of data sent to the indexer.	unsigned integer
data:total_bytes_received	Total amount of data received by calling the `receive/token` endpoint.	unsigned integer
data:transport	Data transport protocol for HTTP Event Collector data.	http
datetime	Date and time associated with the data. Takes the following format: `MM-DD-YYYY HH:MM:SS.SSS +/-GMTDELTA`	string
log_level	Log severity level.	INFO

Logging with configuration files

The limits.conf and props.conf files control metrics data logging and indexing behavior.

limits.conf

The [http_input] stanza in the $SPLUNK_HOME/etc/system/default/limits.conf file controls HTTP Event Collector metrics data logging.

For information about all HTTP Event Collector-related parameters, including those not related to metrics, see the [http_input] stanza documentation on limits.conf in the Splunk Enterprise Admin Manual.

For metrics logging, the limits.conf file takes the following parameters:

Parameter	Default value	Description
max_number_of_tokens	10000	An unsigned integer that represents the maximum number of tokens reported by HTTP Event Collector metrics.
metrics_report_interval	60	An unsigned integer that represents the number of seconds in an HTTP Event Collector metrics report interval.
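
As an illustration, an override of these two parameters might look like the following fragment. It assumes the usual Splunk Enterprise practice of placing overrides in a local copy, such as $SPLUNK_HOME/etc/system/local/limits.conf, rather than editing the default file. The values shown are examples only, not recommendations.

```
[http_input]
# Report metrics every 30 seconds instead of the default 60.
metrics_report_interval = 30
# Report on at most 500 tokens instead of the default 10000.
max_number_of_tokens = 500
```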

props.conf

The [http_event_collector_metrics] stanza in the $SPLUNK_HOME/etc/system/default/props.conf file controls reading and indexing the HTTP Event Collector log files.

See the following example:

[source::.../http_event_collector_metrics.log(.\d+)?]
sourcetype = http_event_collector_metrics

...

[http_event_collector_metrics]
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = datetime
TIME_FORMAT = %m-%d-%Y %H:%M:%S.%l %z
INDEXED_EXTRACTIONS = json
KV_MODE = none
JSON_TRIM_BRACES_IN_ARRAY_NAMES = true

The props.conf file takes the following parameters:

Parameter	Default	Description
SHOULD_LINEMERGE	false	Specifies whether multiple lines are combined into a single event. Setting to true merges several lines into one event. Setting to false treats each line as a separate event.
TIMESTAMP_FIELDS	datetime	Log entry time field name.
TIME_FORMAT	%m-%d-%Y %H:%M:%S.%l %z	Log entry time field format.
INDEXED_EXTRACTIONS	json	Metrics log format. Always in JSON format for metrics logging.
KV_MODE	none	Key-value data indicator. Setting to none means no key-value data. Always none for metrics logging.
JSON_TRIM_BRACES_IN_ARRAY_NAMES	true	Whether to trim brace characters from JSON array names.
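
To read the datetime field in your own tooling, you need an equivalent of the TIME_FORMAT value. The following sketch parses the timestamp from the earlier example entry with Python's standard library. Using %f for the subsecond part is an assumption about how the props.conf format maps to Python, and the function name is hypothetical.

```python
from datetime import datetime

# Python counterpart of the props.conf TIME_FORMAT value. The %f directive
# stands in for the subsecond part; that mapping is an assumption.
PY_TIME_FORMAT = "%m-%d-%Y %H:%M:%S.%f %z"


def parse_metrics_time(value: str) -> datetime:
    """Parse the datetime string of an HEC metrics log entry."""
    return datetime.strptime(value, PY_TIME_FORMAT)


ts = parse_metrics_time("09-01-2016 19:21:19.014 -0700")
print(ts.isoformat())
```

The result is a timezone-aware datetime, so entries logged by instances in different time zones can be compared directly.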

Possible error codes

The following status codes have particular meaning for all HTTP Event Collector endpoints:

Status code	HTTP status code ID	HTTP status code	Status message
0	200	OK	`Success`
1	403	Forbidden	`Token disabled`
2	401	Unauthorized	`Token is required`
3	401	Unauthorized	`Invalid authorization`
4	403	Forbidden	`Invalid token`
5	400	Bad Request	`No data`
6	400	Bad Request	`Invalid data format`
7	400	Bad Request	`Incorrect index`
8	500	Internal Error	`Internal server error`
9	503	Service Unavailable	`Server is busy`
10	400	Bad Request	`Data channel is missing`
11	400	Bad Request	`Invalid data channel`
12	400	Bad Request	`Event field is required`
13	400	Bad Request	`Event field cannot be blank`
14	400	Bad Request	`ACK is disabled`
15	400	Bad Request	`Error in handling indexed fields`
16	400	Bad Request	`Query string authorization is not enabled`
17	200	OK	`HEC is healthy`
18	503	Service Unavailable	`HEC is unhealthy, queues are full`
19	503	Service Unavailable	`HEC is unhealthy, ack service unavailable`
20	503	Service Unavailable	`HEC is unhealthy, queues are full, ack service unavailable`
21	400	Bad Request	`Invalid token`
22	400	Bad Request	`Token disabled`
23	503	Service Unavailable	`Server is shutting down`
24	200	OK	`HEC queue is approaching its capacity limit`
25	200	OK	`HEC ACK is approaching its capacity limit`
26	429	Too Many Requests	`HEC queue is at capacity and cannot process any more requests`
27	429	Too Many Requests	`HEC ACK channel is at capacity and cannot process any more requests`

To ensure data is successfully ingested into the Splunk platform, configure your clients with the ability to act on response codes returned by the HEC endpoint. If the client can't take an action based on the resulting response code, data loss might occur.
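
One way a client might act on these codes is sketched below. The grouping into "done", "retry", and "fix the request" is an assumed policy based on the HTTP status codes in the table, not behavior prescribed by HEC, and the names Action, choose_action, and backoff_delay are hypothetical.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"        # data accepted, nothing more to do
    RETRY = "retry"      # likely transient, resend after a delay
    FIX_REQUEST = "fix"  # client-side problem, resending unchanged will fail


def choose_action(http_status: int) -> Action:
    """Map an HTTP status code returned by HEC to an assumed client action."""
    if http_status == 200:
        return Action.DONE
    if http_status in (429, 500, 503):
        return Action.RETRY
    if http_status in (400, 401, 403):
        return Action.FIX_REQUEST
    # Codes not listed in the table are treated as retryable (an assumption).
    return Action.RETRY


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff in seconds for the given zero-based retry attempt."""
    return min(cap, base * 2 ** attempt)


print(choose_action(503), backoff_delay(3))
```

A client that buffers events until choose_action returns Action.DONE, and that logs the request when it returns Action.FIX_REQUEST, avoids silently dropping data.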

Investigate instance performance with the Monitoring Console

The Monitoring Console provides pre-built dashboards for HEC that you can use to investigate your instance performance. See the following topics for more information:

For Splunk Cloud Platform, see the Monitor your Splunk Cloud Platform Deployment chapter in the Splunk Cloud Platform Admin Manual.
For Splunk Enterprise, see the About the Monitoring Console chapter in the Monitoring Splunk Enterprise manual.

For details about the dashboard that monitors HTTP Event Collector, see Indexing: Inputs: HTTP Event Collector in the Monitoring Splunk Enterprise manual.

Detect scaling problems

If you are experiencing performance slowdowns or want to speed up your HTTP Event Collector deployment, consider the following factors, which can affect performance.

HTTP and HTTPS

Sending data over HTTP results in a significant performance improvement compared to sending data over HTTPS.

Batching

Batching multiple events into a single request can speed up data transmission. Because the request metadata applies to all events in the request, less data is sent overall. For more information about how event data is packaged, see Format events for HTTP Event Collector.
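
The following sketch shows one way to build a batched request body. It assumes that the endpoint accepts JSON event objects placed back to back in the body, each with an "event" key, rather than wrapped in a JSON array. Confirm the exact packaging in Format events for HTTP Event Collector before relying on it. The function name is hypothetical.

```python
import json


def build_batch(events):
    """Serialize events into one request body of back-to-back JSON objects."""
    parts = []
    for event in events:
        # Compact separators keep the body as small as possible.
        parts.append(json.dumps({"event": event}, separators=(",", ":")))
    return "".join(parts)


body = build_batch(["login ok", "login failed"])
print(body)
```

Sending this body in one request costs one set of HTTP headers and one authorization check, instead of one per event.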

HTTP Keep-alive

Setting keep-alive on your connection can improve performance. As long as the client sending the data supports HTTP 1.1 and is set up to support HTTP persistent connection, you can optimize performance with keep-alive.
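
As a sketch of a persistent connection, the following example sends several request bodies over one connection object from Python's standard http.client module, which speaks HTTP 1.1 and keeps the connection open between requests unless the server closes it. The host name, port, endpoint path, and authorization header format are assumptions for illustration; substitute the values for your deployment.

```python
import http.client

# Hypothetical values: replace with your own host, port, and endpoint path.
HEC_HOST = "splunk.example.com"
HEC_PORT = 8088
HEC_PATH = "/services/collector/event"


def open_connection():
    """Create one connection object; no network traffic happens until a request."""
    return http.client.HTTPSConnection(HEC_HOST, HEC_PORT, timeout=10)


def send_bodies(conn, token, bodies):
    """Send each body over the same connection and return the status codes."""
    headers = {"Authorization": f"Splunk {token}"}  # assumed header format
    statuses = []
    for body in bodies:
        conn.request("POST", HEC_PATH, body=body, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the response so the connection can be reused
        statuses.append(response.status)
    return statuses
```

A caller would create the connection once with open_connection(), pass it to send_bodies() for every batch, and close it only when finished, so that the TCP and TLS handshakes are not repeated for each request.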

Persistent queues

Persistent queuing slows down performance because it writes the data in an input queue to disk. For more information, see Use persistent queues to help prevent data loss.
