KPI reference for the Content Pack for Splunk Observability Cloud

The Content Pack for Splunk Observability Cloud contains Application Performance Monitoring services, Infrastructure Monitoring services, Real User Monitoring services, and Synthetic Checks services. Each service has KPIs that help you monitor the health of your applications. All parent and child services report up to the overall Splunk Observability Cloud service at the top level.

Application Performance Monitoring services and KPIs

There are 3 Application Performance Monitoring services: Application Error Rate, Application Rate (Throughput), and Application Duration. Each of these rolls up to a top-level Application Performance Monitoring service. Each service has KPIs that help you monitor the health of your applications.

Service	KPI	Description
Application Duration	SplunkAPM Duration	Shows the combined application response time (duration) metrics from all ingested APM vendor metrics. Thresholds: Low = 0-10 ms, Medium = 10-15 ms, High > 15 ms
Application Error Rate	Splunk APM Error Rate	Shows the combined percentage of requests with errors over total requests for the application from all Splunk APM Services reported metrics. Thresholds: Normal <= 0.01 < Low < 25 < Medium < 50 < High < 90 < Critical
Application Rate (Throughput)	SplunkAPM Rate (Throughput)	Shows the combined application requests per second (Rate) from all ingested Splunk APM service metrics. Thresholds: No predefined threshold. For steps to configure KPI thresholds, see Configure KPI thresholds in ITSI in the Service Insights manual.
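
As an illustration of how the error rate thresholds in the table can be read, the following is a minimal Python sketch. The function names are hypothetical and are not part of the content pack. The sketch assumes the error rate is expressed as a percentage and that each boundary value (25, 50, 90) belongs to the higher severity level, which the table does not state explicitly.

```python
def error_rate_percent(error_count: int, request_count: int) -> float:
    """Percentage of requests with errors over total requests."""
    if request_count == 0:
        # Assumption: no traffic is treated as a 0% error rate.
        return 0.0
    return 100.0 * error_count / request_count


def error_rate_severity(rate: float) -> str:
    """Map an error rate (in percent) to a severity level.

    Follows: Normal <= 0.01 < Low < 25 < Medium < 50 < High < 90 < Critical.
    Assumption: a value exactly on a boundary falls into the higher level.
    """
    if rate <= 0.01:
        return "Normal"
    if rate < 25:
        return "Low"
    if rate < 50:
        return "Medium"
    if rate < 90:
        return "High"
    return "Critical"


if __name__ == "__main__":
    rate = error_rate_percent(error_count=30, request_count=100)
    print(rate, error_rate_severity(rate))
```

In ITSI, the thresholds themselves are configured on the KPI. The sketch only shows how a single value maps to a level under the stated assumptions.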

Infrastructure Monitoring services and KPIs

There are 13 Infrastructure Monitoring services. This includes a service per entity type, for example OS Hosts, GCP Compute Engine, GCP Cloud Functions, and Azure Functions. Each of these rolls up to a corresponding parent service, which rolls up to a top-level Infrastructure Monitoring service. Each service has KPIs that help you monitor the health of your applications.
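
The roll-up structure described above can be pictured as a small tree. The following Python sketch is illustrative only: the grouping of entity-type services under parent services is inferred from the table headings in this topic, and the parent names ("AWS", "Azure", "GCP", "My Data Center Hosts") and the function name are assumptions, not identifiers taken from the content pack.

```python
# Parent service -> entity-type services.
# Assumption: grouping and parent names are inferred from this topic's
# table headings, not exported from ITSI.
INFRASTRUCTURE_SERVICES = {
    "AWS": ["AWS EC2", "AWS Lambda"],
    "Azure": ["Azure Functions", "Azure VM"],
    "GCP": ["GCP Cloud Functions", "GCP Compute Engine"],
    "My Data Center Hosts": ["OS Hosts", "Kubernetes Pods", "Docker Containers"],
}
TOP_LEVEL = "Infrastructure Monitoring"


def rollup_path(service: str) -> list[str]:
    """Return the chain from a service up to the top-level service."""
    if service == TOP_LEVEL:
        return [service]
    if service in INFRASTRUCTURE_SERVICES:
        return [service, TOP_LEVEL]
    for parent, children in INFRASTRUCTURE_SERVICES.items():
        if service in children:
            return [service, parent, TOP_LEVEL]
    raise KeyError(f"unknown service: {service}")


if __name__ == "__main__":
    print(" -> ".join(rollup_path("AWS Lambda")))
```

Under this inferred grouping, the four parent services plus nine entity-type services total 13, which is consistent with the count given above.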

AWS services and KPIs

Service	KPI	Description
AWS EC2	AWS EC2 CPU Utilization	The percentage of allocated EC2 compute units that are currently in use on the instance. Thresholds: High = 90, Medium = 80, Low = 50
	AWS EC2 Disk Read Bytes	The number of bytes read from all instance store volumes available to the instance.
	AWS EC2 Disk Read Operations	The completed read operations from all instance store volumes available to the instance in a specified period of time.
	AWS EC2 Disk Write Bytes	The number of bytes written to all instance store volumes available to the instance.
	AWS EC2 Disk Write Operations	The completed write operations to all instance store volumes available to the instance in a specified period of time.
	AWS EC2 Network In	The number of bytes received by the instance on all network interfaces. This metric identifies the volume of incoming network traffic to a single instance. Thresholds: High = 0, Normal = 10
	AWS EC2 Network Out	The number of bytes sent out by the instance on all network interfaces. This metric identifies the volume of outgoing network traffic from a single instance. Thresholds: High = 0, Normal = 10
	AWS EC2 Network Packets In	The number of packets received by the instance on all network interfaces. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. Thresholds: High = 0, Normal = 10
	AWS EC2 Network Packets Out	The number of packets sent out by the instance on all network interfaces. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. Thresholds: High = 0, Normal = 10
AWS Lambda	AWS Lambda Duration	The amount of time that your function code spends processing an event. The billed duration for an invocation is the value of duration rounded up to the nearest millisecond. Thresholds: High = 800,000, Medium = 400,000, Low = 200,000
	AWS Lambda Errors	The number of invocations that result in a function error. Function errors include exceptions thrown by your code and exceptions thrown by the Lambda runtime. Thresholds: High = 1, Normal = 0
	AWS Lambda Invocations	The number of times your function code is executed, including successful executions and executions that result in a function error. Invocations aren't recorded if the invocation request is throttled or otherwise resulted in an invocation error.
	AWS Lambda Throttles	The number of invocation requests that are throttled. When all function instances are processing requests and no concurrency is available to scale up, Lambda rejects additional requests with TooManyRequestsException. Throttled requests and other invocation errors don't count as Invocations or Errors.
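
To make the AWS Lambda KPI definitions in the table concrete, here is a small Python sketch. It is an illustration under stated assumptions, not how AWS or the content pack computes these values: the outcome labels ("success", "function_error", "throttled") and both function names are hypothetical.

```python
import math
from collections import Counter


def billed_duration_ms(duration_ms: float) -> int:
    """Billed duration: the duration rounded up to the nearest millisecond."""
    return math.ceil(duration_ms)


def lambda_kpis(outcomes: list[str]) -> dict[str, int]:
    """Tally Lambda KPIs from a list of hypothetical request outcomes.

    Invocations include successful executions and function errors.
    Throttled requests are counted only as Throttles.
    """
    counts = Counter(outcomes)
    return {
        "Invocations": counts["success"] + counts["function_error"],
        "Errors": counts["function_error"],
        "Throttles": counts["throttled"],
    }


if __name__ == "__main__":
    print(billed_duration_ms(12.3))
    print(lambda_kpis(["success", "function_error", "throttled", "success"]))
```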

Azure services and KPIs

Service	KPI	Description
Azure Functions	Azure Functions 5xx Errors	The count of 5xx errors from the Azure Function handler. Thresholds: High = 1, Normal = 0
	Azure Functions Bytes Received	The rate at which the function process is receiving bytes to I/O operations. Thresholds: High = 0, Normal = 100
	Azure Functions Bytes Sent	The rate at which the function process is sending bytes to I/O operations. Thresholds: High = 0, Normal = 100
	Azure Functions Executions	The number of times your function has executed. This value correlates to the number of times a function runs in your app.
	Azure Functions Memory	The current amount of memory used by an Azure function.
	Azure Functions Usage Cost	The function cost in terms of function execution units, which are a combination of execution time and memory usage.
Azure VM	Azure VM CPU Percentage	The percentage of allocated compute units that are currently in use by the virtual machine. Thresholds: High = 90, Medium = 80, Low = 50
	Azure VM Disk Read Bytes	The bytes read from disk during the monitoring period.
	Azure VM Disk Write Bytes	The bytes written to disk during the monitoring period.
	Azure VM Network In	The number of bytes received on all network interfaces by the virtual machine. Thresholds: High = 0, Normal = 10
	Azure VM Network Out	The number of bytes sent out on all network interfaces by the virtual machine. Thresholds: High = 0, Normal = 10

GCP services and KPIs

Service	KPI	Description
GCP Cloud Functions	GCP Cloud Functions Active	The number of active function instances.
	GCP Cloud Functions Execution Time	The distribution of function execution times in nanoseconds. Thresholds: High = 800,000, Medium = 400,000, Low = 200,000
	GCP Cloud Functions Executions	The count of function executions.
	GCP Cloud Functions Memory	The distribution of maximum function memory usage during execution, in bytes.
	GCP Cloud Functions Network Egress	The outgoing network traffic of the function, in bytes.
GCP Compute Engine	GCP Compute Engine CPU Utilization	The fractional utilization of allocated CPU on this instance. Thresholds: High = 90, Medium = 80, Low = 50
	GCP Compute Engine Network Bytes In	The count of bytes received from the network. Thresholds: High = 0, Normal = 10
	GCP Compute Engine Network Bytes Out	The count of bytes sent over the network. Thresholds: High = 0, Normal = 10
	GCP Compute Engine Disk Read Bytes	The count of bytes read on disk.
	GCP Compute Engine Disk Write Bytes	The count of bytes written on disk.

My Data Center Hosts services and KPIs

Service	KPI	Description
OS Hosts	CPU Utilization	CPU utilization as a percentage. Thresholds: High = 90, Medium = 80, Low = 50
	Memory Free	Memory free in megabytes.
	Disk Write IOps	Number of write IOps by a given instance.
	Disk Read IOps	Number of read IOps by a given instance.
	Network Rx	Network octets received.
	Network Tx	Network octets sent.
Kubernetes Pods	CPU Utilization	Pod CPU utilization as a percentage. Thresholds: High = 90, Medium = 80, Low = 50
	Memory Utilization	Pod memory utilization as a percentage.
	Network Rx Errors	Pod network receive errors.
	Network Tx Errors	Pod network transmit errors.
Docker Containers	CPU Utilization	Container CPU utilization as a percentage. Thresholds: High = 90, Medium = 80, Low = 50
	Disk Read	Disk Read
	Disk Write	Disk Write
	Memory Free	Memory free in bytes.

Splunk RUM services and KPIs

There are 6 Splunk Real User Monitoring (RUM)-related services. This includes two services each for RUM Browser, RUM Mobile, and RUM Synthetics.

Service	KPI	Description
RUM Browser Metrics	Front-end requests	Front-end requests by browser.
	Front-end errors	Front-end errors by browser.
	Document load latency (P75)	Document load latency in ms by browser (P75).
	Endpoint Requests	Browser Endpoint Requests by browser.
	Endpoint Latency (P75)	Browser Endpoint Latency in ms by browser (P75).
RUM Browser Web Vitals	Largest Contentful Paint in ms (P75)	Largest Contentful Paint in ms by browser (P75).
	Cumulative Layout Shift (P75)	Cumulative Layout Shift by browser (P75).
	First Input Delay in ms (P75)	First Input Delay in ms by browser (P75).
RUM Mobile Metrics	Front-end requests	Front-end requests by app.
	Document load latency (P75)	Document load latency in ms by app (P75).
	Endpoint Requests	Mobile Endpoint Requests by App.
	Endpoint Latency (P75)	Mobile Endpoint Latency in ms by App (P75).
RUM Mobile App Metrics	App crash count	Count of app crashes by app.
	App error	Count of app errors by app.
	App Cold Start in ms (P75)	App Cold Start times in ms (P75).
	App Cold Starts	Count of app cold starts.
RUM Synthetic Metrics	Front-end requests	Front-end requests by Splunk Synthetics.
	Front-end errors	Splunk Synthetics Front-end errors by browser.
	Document load latency (P75)	Splunk Synthetics Document load latency in ms by browser (P75).
	Endpoint Requests	Splunk Synthetics Endpoint Requests by browser.
	Endpoint Latency (P75)	Splunk Synthetics Endpoint Latency in ms by browser (P75).
RUM Synthetic Web Vitals	Largest Contentful Paint in ms (P75)	Splunk Synthetics Largest Contentful Paint in ms by browser (P75).
	Cumulative Layout Shift (P75)	Splunk Synthetics Cumulative Layout Shift by browser (P75).
	First Input Delay in ms (P75)	Splunk Synthetics First Input Delay in ms by browser (P75).
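
Many of the RUM KPIs above are reported at P75, the 75th percentile: the value at or below which 75% of samples fall. The following Python sketch shows one way to compute it. It uses the nearest-rank method, which is an assumption made for illustration. This topic does not state which percentile method Splunk RUM uses, and the function name is hypothetical.

```python
import math


def p75_nearest_rank(values: list[float]) -> float:
    """Return the 75th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # Nearest rank: the smallest 1-based rank covering at least 75% of samples.
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical document load latencies in ms.
    latencies_ms = [120, 80, 200, 95, 150, 110, 300, 90]
    print(p75_nearest_rank(latencies_ms))  # prints 150
```

A P75 value is less sensitive to a few very slow page loads than a maximum, while still reflecting the experience of slower sessions better than a median.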

Synthetic Tests services and KPIs

There are 3 Synthetic Tests services. This includes a service per entity type or synthetic test, as well as a top-level synthetic test. For each of these services, there are KPIs used to help you monitor the health of your web applications.

Note that these KPIs don't have predefined thresholds configured. For steps to configure KPI thresholds, see Configure KPI thresholds in ITSI in the Service Insights manual.

Service	KPI	Description
API Tests	Connect Time	How long it took the API test to respond to the initial connection.
	DNS Time	How long it took DNS to respond to the API request.
	Duration Time	The total duration of the API request.
	Receive Time	How long it took to receive the response from the API endpoint.
HTTP Tests	DNS Time	How long it took DNS to respond to the request for the URL.
	Duration Time	Total time it took to respond to the URL.
Browser Tests	DOM Complete Time	The total time for completely building the Document Object Model.
	DOM Interactive Time	The time until the Document Object Model is interactive for the user.
	DOM Load Time	The time until the Document Object Model has been loaded.
	Duration Time	The total time it takes to load the given browser test.
	First CPU Idle Time	The time when the browser CPU becomes idle after loading.
	First Contentful Paint Time	The time when the first web site visual content from the Document Object Model has loaded.
	First Meaningful Paint Time	The time it takes for the first part of a web page's readable or viewable content to be rendered for the visitor.
	First Paint Time	The interval between navigation to a web page and when the browser first renders pixels from that page to the screen.
	First Request Connect Time	The time between a user opening a web page and connecting to the server serving that page.
	First Request DNS Time	The time between a user opening a web page and DNS resolution.
	First Request Receive Time	The amount of time it takes for a web browser to receive its first byte of data after a DNS lookup has completed.
	First Request Send Time	The amount of time it takes for a web browser to send the initial request to a web server.
	First Request TLS Time	The amount of time it takes for a web browser to establish a secure connection with a web server using the Transport Layer Security (TLS) protocol.
	First Request Wait Time	The amount of time it takes for a web browser to receive the first bytes of data from a web server after the DNS lookup has completed and the TLS handshake has been established.
	Fully Loaded Time	Time until there is 1.5 seconds of network inactivity after onload, waiting up to a maximum of 5 seconds.
	Lighthouse Score	Scoring from Google Chrome Lighthouse tool. For information about that tool, see Chrome Lighthouse developer documentation.
	Onload Time	Time between a user opening a web page and that page being fully loaded.
