Monitor Amazon Web Services

The Infrastructure Monitoring Amazon Web Services (AWS) integration imports metrics and metadata from AWS CloudWatch and the following AWS services, as well as other applications.

Metrics are data points identified by a name, and metadata is information that helps you identify aspects of a metric, such as its source. AWS metrics and metadata help you monitor and troubleshoot the AWS services you’re using. They also help you monitor applications, such as Kubernetes clusters, that use those AWS services.

To learn more about logs and AWS, see Introduction to Splunk Log Observer.

Import AWS CloudWatch data and metadata

AWS provides a CloudWatch agent that lets you import (or download) metrics, logs, and metadata. To import these metrics into Infrastructure Monitoring, add the namespace you use for the AWS CloudWatch agent as a custom namespace in your AWS integration, as described in the section Specify and limit the data and metadata to import.

During this import, Infrastructure Monitoring gives the metrics special names so you can identify them as coming from AWS:

  • AWS metadata becomes dimensions and custom properties.

  • AWS tags are key-value pairs, so Infrastructure Monitoring converts them to custom properties.

To learn more, see Metadata available per service, or refer to the AWS documentation site.

CloudWatch rollups and Infrastructure Monitoring MTS

AWS CloudWatch uses rollups to summarize metrics, and it refers to them as “statistics”. To learn more about rollups, see Rollups in data resolution and rollups in charts.

Because AWS CloudWatch rollups don’t map directly to Infrastructure Monitoring rollups, you can’t directly access AWS CloudWatch rollups using the rollup selection menu in the Chart Builder. Instead, Infrastructure Monitoring captures the rollups as individual MTS that have the dimension stat.

| AWS statistic | Infrastructure Monitoring dimension | Definition |
|---|---|---|
| Average | stat:mean | Mean value of metric over the sampling period |
| Maximum | stat:upper | Maximum value of metric over the sampling period |
| Minimum | stat:lower | Minimum value of metric over the sampling period |
| Data Samples | stat:count | Number of samples over the sampling period |
| Sum | stat:sum | Sum of all values that occurred over the sampling period |

To use an AWS CloudWatch metric in a plot, always specify the following:

  • The AWS CloudWatch metric name

  • A filter for the stat dimension value that’s appropriate for the metric you’ve chosen

For example, for the EC2 metric NetworkPacketsIn, the only meaningful AWS statistics are Minimum, Maximum, and Average. To plot NetworkPacketsIn with the rollup you want, filter on the stat dimension using the value that corresponds to the AWS statistic (rollup):

  • lower: Rollup that corresponds to the AWS rollup Minimum

  • upper: Rollup that corresponds to the AWS rollup Maximum

  • mean: Rollup that corresponds to the AWS rollup Average
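As a rough illustration of the mapping described above, the following Python sketch translates an AWS statistic name from the table into the matching stat dimension value. The function names and the shape of the returned dictionary are hypothetical, chosen only for this example; they are not part of any Splunk API.

```python
# Mapping copied from the statistics table in this section.
AWS_STATISTIC_TO_STAT = {
    "Average": "mean",
    "Maximum": "upper",
    "Minimum": "lower",
    "Data Samples": "count",
    "Sum": "sum",
}


def stat_filter(aws_statistic):
    """Return the (dimension, value) pair to filter on for an AWS statistic."""
    try:
        return ("stat", AWS_STATISTIC_TO_STAT[aws_statistic])
    except KeyError:
        raise ValueError(f"Unknown AWS statistic: {aws_statistic!r}") from None


def plot_spec(metric_name, aws_statistic):
    """Hypothetical helper: pair a metric name with its stat filter."""
    dimension, value = stat_filter(aws_statistic)
    return {"metric": metric_name, "filter": f"{dimension}:{value}"}


print(plot_spec("NetworkPacketsIn", "Average"))
```

Keeping the mapping in one dictionary makes it harder to plot a CloudWatch metric without naming a rollup, which is the situation the “Rollup: Multiple” label warns about.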

Note

The “Rollup: Multiple” label in a plot for a CloudWatch metric indicates that you haven’t specified the rollup you want. To avoid confusion, specify the rollup as soon as possible.

Infrastructure Monitoring uses a 60-second sampling period for metrics it imports from AWS.

To learn more, see the AWS developer documentation for AWS CloudWatch.

Import data and metadata from other applications

Infrastructure Monitoring also imports metrics, metadata, and logs for some of your applications that use AWS services. The following table lists these applications.

| Get data in | Monitor | Description |
|---|---|---|
| Collect Kubernetes data | Monitor Kubernetes (classic version) | Import metrics and logs from Kubernetes clusters running in EC2 instances or EKS. |
| | Monitor hosts | Import metrics and logs from Linux and Windows hosts running in EC2 instances. |
| Instrument back-end applications to send spans to Splunk APM | Introduction to Splunk APM | Import application metrics and spans running in hosts, Kubernetes clusters, or Lambda functions. |

Specify and limit the data and metadata to import

By default, Observability Cloud imports metrics from all built-in AWS namespaces, corresponding to these AWS services. Optionally, you can add custom namespaces.

To limit the amount of AWS data to import, reduce the number of namespaces to pull data from.

  • Specify a subset of built-in namespaces to import data from. In the UI, go to Select built-in services to collect data from, then choose the specific namespaces you want to work with. You can specify multiple built-in services.

  • Specify the custom namespaces to import data from. In the UI, go to Select custom services to collect data from, type the name of the custom namespace, then press Enter. Using this procedure, you can specify multiple custom namespaces. Note that data from built-in services is imported as well.

  • To discard data from built-in namespaces and only import metrics from custom namespaces, use the field syncCustomNamespacesOnly via the API. See how to do this in our developer portal.
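The following Python sketch shows what the custom-namespace settings might look like as part of an integration request body. Only syncCustomNamespacesOnly is named in the text above. The customCloudWatchNamespaces field name, its comma-separated format, the helper function, and the MyApp/Orders namespace are assumptions made for illustration, so check the developer portal before relying on any of them.

```python
import json


def build_namespace_settings(custom_namespaces, custom_only=True):
    """Return integration fields that control custom-namespace syncing."""
    return {
        # Named in the text above: when True, built-in namespaces are discarded.
        "syncCustomNamespacesOnly": custom_only,
        # ASSUMPTION: this field name and the comma-separated string format
        # are illustrative; confirm both against the API reference.
        "customCloudWatchNamespaces": ",".join(custom_namespaces),
    }


# "CWAgent" is the CloudWatch agent namespace; "MyApp/Orders" is made up.
settings = build_namespace_settings(["CWAgent", "MyApp/Orders"])
print(json.dumps(settings, indent=2))
```

The sketch only builds the settings. Sending them requires an authenticated API request, which is not shown here.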

You can also limit the amount of AWS data that the integration imports by changing the rate at which Infrastructure Monitoring polls AWS CloudWatch.

Next, you can specify filters to limit the data you want to import:

  • For built-in services for which we sync metadata, you can filter the data based on AWS tags, metric names, or both. Filters don’t affect tag syncing.

  • For services without metadata (including custom namespaces), you can only filter by metric names.

Note

You must be an administrator of your AWS account to specify namespaces and set filters.

Example: Specify namespaces and filters

The following example demonstrates how to specify the following:

  • Namespace: Only import data from Amazon ElasticSearch Service and EC2.

  • Data filters: Only import data from EC2 if it matches a filter.

  • Tag filters: Exclude data from resources that have the AWS tag version:canary.

To create these specifications, follow these steps:

  1. From the list of namespaces, select Amazon ElasticSearch Service and EC2.

  2. To limit the data Infrastructure Monitoring imports from EC2, select data filters from the list.

  3. Select the filters you want from the following options:

    • Use Import some if you want a filter that only imports data.

    • Use Exclude some if you want a filter that only excludes data.

  4. To use AWS tags to limit the data Infrastructure Monitoring imports, filter by tag. For this example, specify a filter that excludes data from resources that have the AWS tag version:canary.

Infrastructure Monitoring adds the prefix aws_tag_ to the names of tags imported from AWS, which indicates their origin. For example, the AWS tag version:canary appears in Infrastructure Monitoring as aws_tag_version:canary. When you filter an AWS integration by tag, enter the name of the tag as it appears in AWS.

You can also choose specific metrics to include or exclude. For example, consider the following conditions.

Infrastructure Monitoring only includes metricA and metricB, and only for resources specified by the tags:

  • For a resource that has the tag env:prod or env:beta, metricA and metricB are included.

  • For a resource that doesn’t have the tags env:prod or env:beta, no metrics are included.

  • No other metrics are included.

Infrastructure Monitoring supports wildcards in filters. For example, if you want to import data for a resource that has specific tags, regardless of the tag values, specify this filter:

In this example, metricA and metricB are included for resources that have the env tag set to any value. No other metrics are included.
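To make the include rules above concrete, here is a minimal Python sketch of the decision they describe: a metric is imported only if it is in the allowed set and the resource carries a matching tag. The function name and arguments are hypothetical, and the wildcard matching uses Python’s fnmatch patterns as a stand-in, which is an assumption rather than the integration’s actual filter syntax.

```python
from fnmatch import fnmatchcase


def is_imported(metric, tags, allowed_metrics, tag_key, tag_patterns):
    """Return True if a metric from a resource with `tags` passes the filter.

    `tag_patterns` may contain shell-style wildcards such as "*".
    """
    if metric not in allowed_metrics:
        return False  # no other metrics are included
    value = tags.get(tag_key)
    if value is None:
        return False  # the resource does not have the tag at all
    return any(fnmatchcase(value, pattern) for pattern in tag_patterns)


ALLOWED = {"metricA", "metricB"}

# Explicit values: only env:prod or env:beta resources qualify.
print(is_imported("metricA", {"env": "prod"}, ALLOWED, "env", ["prod", "beta"]))
print(is_imported("metricA", {"env": "dev"}, ALLOWED, "env", ["prod", "beta"]))

# Wildcard: any resource that has the env tag qualifies, whatever its value.
print(is_imported("metricB", {"env": "dev"}, ALLOWED, "env", ["*"]))
```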

When you remove a namespace, Infrastructure Monitoring no longer includes metrics from that namespace.

Note

You can specify more complex filtering options for a namespace by using the Infrastructure Monitoring API. In this case, the UI displays a message indicating that the filter is defined programmatically. To see which metrics and tags are included or excluded for that namespace, click View filter code.

Example: Filter AWS data using tags

You can filter AWS data using AWS tags only if Observability Cloud syncs tags for those AWS namespaces. For example, if you use Detailed Monitoring for EC2 instances in AWS, Infrastructure Monitoring imports the following dimensions:

  • AutoScalingGroupName

  • ImageId

  • InstanceId

  • InstanceType

You can use the following AWS metadata to filter metrics:

| Custom property | Form | Description |
|---|---|---|
| aws_account_id | key-value pair | AWS account ID for the instance, volume or load balancer. Use this property to differentiate between metrics you import. |
| aws_tag_<TAGNAME> | key and optional value | AWS custom tag name for the instance, volume or load balancer. A metric may have more than one associated custom tag name. |

Use aws_account_id to differentiate between metrics you import from multiple AWS accounts. Infrastructure Monitoring adds aws_account_id as a dimension of the MTS for the metric.

For supported AWS services, Infrastructure Monitoring imports AWS tags and adds them as custom properties to the MTS for the metric. For example, an AWS tag named Production appears in Infrastructure Monitoring as aws_tag_Production.

Unsupported characters

Be careful when choosing tag names: Splunk Observability Cloud only allows alphanumeric characters, underscores, and minus signs. Unsupported characters, including ., :, /, =, +, @, and spaces, are replaced by the underscore character.
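To preview how an AWS tag name might look after import, the following Python sketch applies the two rules described in this document: add the aws_tag_ prefix and replace unsupported characters with underscores. The function name is hypothetical. The sketch also assumes that “alphanumeric” means ASCII letters and digits and that each unsupported character becomes exactly one underscore, neither of which is stated explicitly above.

```python
import re

# Anything other than ASCII letters, digits, underscore, or minus.
# ASSUMPTION: "alphanumeric" is taken to mean ASCII only.
_UNSUPPORTED = re.compile(r"[^A-Za-z0-9_-]")


def to_property_name(aws_tag_name):
    """Approximate the custom property name for an imported AWS tag."""
    # ASSUMPTION: one underscore per unsupported character, no collapsing.
    return "aws_tag_" + _UNSUPPORTED.sub("_", aws_tag_name)


for name in ["version", "cost center/team", "k8s.io/role"]:
    print(name, "->", to_property_name(name))
```

Running a check like this before choosing tag names helps you spot tags that would collide after substitution, such as a.b and a:b, which both map to a_b.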

Monitor AWS services and identify problems

Visit the Infrastructure page to monitor the health of the AWS services you’re using. It provides a key metric for each service. You can also drill down into specific instances of an AWS service. For example, start by viewing the key metrics for your EC2 service, and then filter for a specific instance ID to analyze the EC2 instance with that ID.

Follow these steps to find and troubleshoot AWS services from the Infrastructure page:

  1. Select Navigation menu > Infrastructure, then click the Amazon Web Services category.

  2. Select the specific service you want to analyze. For example, click EBS to view information about your storage volumes. If you see the message No Data Found, you first need to configure the integration for the service.

  3. Compare instances of the services to investigate their relative health. Select a metric from the Color by drop-down list. In the heat map, colors indicate the health of each instance based on the selected metric. For example, consider an AWS EBS heat map for the total number of I/O operations in a time period (Total IOPS). The heat map displays high Total IOPS in lighter colors, which indicates that the instances are healthy. In comparison, the heat map displays low Total IOPS in a darker color, which indicates that the instances have an I/O-related problem.

    If the heat map only uses green and red, then green indicates a healthy instance and red indicates a problem.

    To apply visually-accessible color palettes to heat maps, select <USER-ID> > App Preferences, then select your desired color accessibility from the Color Accessibility menu.

  4. Investigate correlations between instances and their health by grouping the instances based on a dimension, custom property, or tag. To group instances, select the metadata name from the Group by drop-down list.

    Note

    In the DynamoDB navigator, when you view the heatmap and group the instances by aws_account_id, some entries might report back as “n/a” because properties are omitted when the query is not specific enough. To work around this issue, filter by Operation, then group by aws_account_id.

  5. Outliers are another indication of instance health. An outlier is a metric value that is significantly outside the mean or median value of all other metric values. To find the outliers in metrics coming from AWS services, use the Find Outliers setting and specify the Scope and Strategy:

    You can select one of two Strategies to find outliers, as described in the following table.

    | Strategy | Description |
    |---|---|
    | Deviation from Mean | Instances shown in red are ones that exceed the mean value of the metric by at least three standard deviations. |
    | Deviation from Median | Instances shown in red are ones that exceed the median absolute deviation value by at least three absolute deviations. This setting does not weigh extreme outliers as heavily as the standard deviation does. |

  6. To drill down to a specific instance you want to investigate, hover over the heatmap to find the specific instance ID, then click the cell to see the information for that ID. For every instance, Infrastructure Monitoring provides a default dashboard.

The default dashboard helps you analyze all the available metadata about the cloud service the instance is running in, the instance itself, and any custom tags associated with the instance. The default dashboard provides metric time series (MTS) for key metrics.
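The two outlier strategies from step 5 can be sketched in Python as follows. This is an illustration, not the product’s implementation: the function names are hypothetical, the population standard deviation is used, and “Deviation from Median” is interpreted as flagging values at least three median absolute deviations (MAD) above the median.

```python
from statistics import mean, median, pstdev


def outliers_by_mean(values, k=3.0):
    """Names whose value exceeds the mean by at least k standard deviations."""
    center = mean(values.values())
    spread = pstdev(values.values())
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return sorted(n for n, v in values.items() if v - center >= k * spread)


def outliers_by_median(values, k=3.0):
    """Names whose value exceeds the median by at least k MADs (assumed reading)."""
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        return []
    return sorted(n for n, v in values.items() if v - center >= k * mad)


# Made-up sample: 19 instances between 10 and 14, plus one at 100.
sample = {f"i-{i:02d}": 10 + (i % 5) for i in range(19)}
sample["i-19"] = 100

print(outliers_by_mean(sample))
print(outliers_by_median(sample))
```

Because the median and MAD barely move when a single extreme value is present, the median-based threshold stays close to the bulk of the data, which is why that strategy weighs extreme outliers less heavily.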

Use default dashboards to monitor AWS services

Observability Cloud provides default dashboards for supported AWS services. Default dashboards are available in dashboard groups based on the AWS service a dashboard represents data for.

To find default dashboards for AWS services, select Navigation menu > Dashboards and search for the AWS service you want to view dashboards for.

Explore built-in content

To see all of the navigators provided for data collected in your organization, go to the Infrastructure page. To see all the pre-built dashboards for data collected in your organization, select Dashboards > Built-in.

Amazon EC2 instances are powered by their respective public cloud service as well as the Splunk Distribution of OpenTelemetry Collector. You need both for all the charts to display data in the built-in dashboards.

  • If you have only the public cloud service and the Smart Agent configured, some charts in the built-in dashboards for Amazon EC2 instances display no data.

  • If you have only the public cloud service configured, you can see all the cards representing the services where data comes from, but some charts in the built-in dashboards for Amazon EC2 instances display no data.

  • If you have only the Smart Agent configured, the Amazon EC2 instance navigator isn’t available.

Costs for AWS monitoring

Splunk Observability Cloud costs

Your subscription plan determines how you’ll be charged for sending AWS metrics to Observability Cloud. See more in Infrastructure Monitoring subscription usage (Host and metric plans).

  • In MTS-based subscription plans, all metrics are custom, and you’re therefore charged for them.

  • In host-based subscription plans, most AWS metrics are categorized as bundled, and are part of your plan.

Bundled metrics include all metrics from supported namespaces as well as metrics from the following services:

  • CWAgent

  • Glue

  • MediaLive

  • System/Linux

  • WAF

For a complete list of Observability Cloud metrics, see Metric categories.

AWS costs

Observability Cloud retrieves AWS metrics with two methods:

  1. Streaming data with Metric Streams.

  2. Polling CloudWatch APIs:

    • First, the list of metrics is retrieved with ListMetrics.

    • Next, data points are fetched with GetMetricData. Note that the GetMetricStatistics API is deprecated. For details, see the GetMetricStatistics API deprecation notice.

Learn more at Connect AWS to Splunk Observability Cloud.

AWS pricing

AWS pricing is based on the number of metrics requested, not the number of requests. Therefore, the cost of obtaining CloudWatch metrics for a service depends on three factors: how often you pull data, the number of metrics for the service, and the number of cloud resources.

Generally speaking, Metric Streams costs the same as polling if the integration is synced every 5 minutes, and is cheaper (up to 5 times) when synced every minute.

However, when using Metric Streams you can’t control costs, whereas with polling you can configure how often the APIs are called. See how to limit the metrics to collect, the resources, or the collection frequency.

Example: Cost scenarios using polling APIs

Let’s imagine a user with the following configuration:

  • 100,000 SQS queues

  • 9 available CloudWatch metrics per queue

First, you need to retrieve your list of metrics using the ListMetrics API at a cost of USD 0.01 per 1,000 API calls:

| Scenario | Number of API calls per day | Cost/day |
|---|---|---|
| Metrics are listed every 15 minutes, and a list contains up to 500 items | 1440 (number of minutes in a day) / 15 (pull interval) * 100k / 500 (items) = 19,200 | USD 0.192 |
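The ListMetrics arithmetic above can be reproduced with a short Python sketch. The function name and parameters are hypothetical, and the price constant is simply the USD 0.01 per 1,000 API calls quoted in this example, not a live AWS price.

```python
import math

MINUTES_PER_DAY = 1440
USD_PER_1000_CALLS = 0.01  # rate quoted in this example


def list_metrics_cost_per_day(items, interval_minutes, page_size=500):
    """Return (API calls per day, USD per day) for listing metrics."""
    listings_per_day = MINUTES_PER_DAY // interval_minutes
    calls_per_listing = math.ceil(items / page_size)
    calls = listings_per_day * calls_per_listing
    return calls, calls / 1000 * USD_PER_1000_CALLS


calls, cost = list_metrics_cost_per_day(items=100_000, interval_minutes=15)
print(f"{calls} calls/day, about USD {cost:.3f}/day")
```

Listing is a small cost because it scales with the number of pages, not with the number of data points retrieved.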

Next, you retrieve the data using the GetMetricData API at a cost of USD 0.01 per 1,000 metrics requested:

| Scenario | Number of requested metrics per day | Cost/day |
|---|---|---|
| The user wants to retrieve all metrics every 1 minute | 1440 (number of minutes in a day) * 9 (number of metrics) * 100k (number of SQS resources) = 1.296B | USD 12,960 |
| The user wants to retrieve all metrics every 5 minutes | 1440 (number of minutes in a day) / 5 (pull interval) * 9 (number of metrics) * 100k (number of SQS resources) = 259.2M | USD 2,592 |
| The user wants to retrieve ONLY 4 metrics for 1,000 queues (because they’re the production instances) every 10 minutes | 1440 (number of minutes in a day) / 10 (pull interval) * 4 (number of metrics) * 1000 (number of SQS resources) = 576k | USD 5.76 |
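The three GetMetricData scenarios follow the same formula, so a small Python sketch can recompute them or estimate a different configuration. The function name and parameters are hypothetical, and the rate is the USD 0.01 per 1,000 requested metrics used in this example.

```python
MINUTES_IN_A_DAY = 1440
USD_PER_1000_METRICS = 0.01  # rate quoted in this example


def get_metric_data_cost_per_day(resources, metrics_per_resource, interval_minutes):
    """Return (metrics requested per day, USD per day) for polling."""
    polls_per_day = MINUTES_IN_A_DAY // interval_minutes
    requested = polls_per_day * metrics_per_resource * resources
    return requested, requested / 1000 * USD_PER_1000_METRICS


scenarios = [
    ("all metrics, every 1 minute", 100_000, 9, 1),
    ("all metrics, every 5 minutes", 100_000, 9, 5),
    ("4 metrics, 1,000 queues, every 10 minutes", 1_000, 4, 10),
]
for label, resources, metrics, interval in scenarios:
    requested, usd = get_metric_data_cost_per_day(resources, metrics, interval)
    print(f"{label}: {requested:,} metrics/day, about USD {usd:,.2f}/day")
```

Because the cost is a product of three factors, reducing any one of them (fewer resources, fewer metrics, or a longer polling interval) lowers the bill proportionally.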