
Onboarding part 2: Design your architecture and get data in

After completing Onboarding part 1: Configure your user and team administration, you are ready for the second part of the onboarding phase. In this part, you get familiar with important concepts, gather requirements, and begin integrating Splunk Observability Cloud into your existing environment. To design your architecture and get data in, complete the following tasks:

  1. Get familiar with OpenTelemetry concepts

  2. Create an architecture prototype

  3. Analyze your required network communication

  4. Analyze how to collect metrics from cloud providers

  5. Configure and implement host and Kubernetes metrics

  6. Collect data from third-party metrics providers

  7. Bring data in for use in Splunk APM

  8. Set up Log Observer Connect for the Splunk Platform

  9. Review the default dashboards and detectors

Note

Work closely with your Splunk Sales Engineer or Splunk Customer Success Manager throughout your onboarding process. They can help you fine tune your Splunk Observability Cloud journey and provide best practices, training, and workshop advice.

Get familiar with OpenTelemetry concepts

Spend some time understanding the concepts of the OpenTelemetry Collector. Pay special attention to configuring receivers, processors, exporters, and connectors, since most Collector configurations include each of these pipeline components.

See https://opentelemetry.io/docs/concepts/.
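
To make these concepts concrete, the following is a minimal, hypothetical Collector configuration that shows how the four pipeline component types fit together. The realm, access token, and the choice of the count connector are illustrative assumptions, not a recommended production setup.

```yaml
# Conceptual sketch only: receivers bring data in, processors transform it,
# exporters send it out, and connectors join two pipelines.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:                  # batches telemetry before export

exporters:
  signalfx:               # sends metrics to Splunk Observability Cloud
    access_token: ${SPLUNK_ACCESS_TOKEN}   # placeholder, read from an environment variable
    realm: us0                             # placeholder realm

connectors:
  count:                  # counts spans and re-emits the counts as metrics (availability depends on your distribution)

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [count]         # a connector acts as an exporter in one pipeline...
    metrics:
      receivers: [otlp, count]   # ...and as a receiver in another
      processors: [batch]
      exporters: [signalfx]
```

In a real deployment the traces pipeline would also export to a trace backend; the sketch keeps only what is needed to show each component type once.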

Create an architecture prototype

Create a prototype architecture solution for Splunk Observability Cloud in your organization. Complete the following tasks to create a prototype:

  1. Get familiar with setting up and connecting applications to Splunk Observability Cloud. Set up an initial OpenTelemetry Collector on a commonly used platform, such as a virtual machine instance or a Kubernetes cluster.

    See Set up Infrastructure Monitoring and Get started with the Splunk Distribution of the OpenTelemetry Collector for more information.

  2. In most cases, you also need to connect Splunk Observability Cloud to your cloud provider. To ingest data from cloud providers, such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP), you need to set up cloud integrations.

    See Supported integrations in Splunk Observability Cloud for supported integrations.

  3. Determine the OTel deployment mode you want to use: host (agent) or data forwarding (gateway). Host (agent) mode is the default mode.

    See Collector deployment modes for more information.

  4. When deploying OpenTelemetry in a large organization, it's critical to define a standardized naming convention for tagging and a governance process to ensure the convention is adhered to. Standardized naming also makes it easier to find metrics and identify usage. See Naming conventions for metrics and dimensions and Naming conventions for tagging with OpenTelemetry and Splunk.

    There are a few cases where incorrect naming affects in-product usage data:

    • If your organization uses host-based Splunk Observability Cloud licensing, your OpenTelemetry naming convention must use the OpenTelemetry host semantic convention to track usage and telemetry correctly. See the OpenTelemetry semantic conventions for hosts.

    • You must use the Kubernetes attributes processor for Kubernetes pods to ensure standard naming and accurate usage counting for host-based organizations. See Kubernetes attributes processor.

    See Naming conventions for metrics and dimensions.

  5. Select at least 1 application or service to collect metrics from as part of your prototype. This helps you see the corresponding dashboards and detectors that are created when Splunk Observability Cloud receives your metrics. For example, you can use OpenTelemetry receivers to include services like an NGINX server, an Apache web server, or a database such as MySQL. A configuration sketch follows this list.

    See NGINX, Apache HTTP Server, or MySQL (deprecated).

  6. Get familiar with the Splunk Observability Cloud receivers for various applications and services. Each receiver has corresponding dashboards and detectors that are created automatically after the integration ingests 50,000 data points.

    See Supported integrations in Splunk Observability Cloud, Built-in dashboards, and Use and customize AutoDetect alerts and detectors.
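
To make steps 4 and 5 concrete, the following sketch shows what a prototype agent configuration might look like with an NGINX receiver and the Kubernetes attributes processor added. The endpoints, realm, and token are placeholders, the NGINX server is assumed to expose its stub_status page, and the Kubernetes attributes processor is typically already included when you install the Collector with the Splunk Helm chart.

```yaml
# Prototype sketch: scrape NGINX metrics and enrich them with Kubernetes
# attributes before exporting. All endpoints and credentials are placeholders.
receivers:
  nginx:
    endpoint: http://localhost:80/status   # requires the NGINX stub_status module
    collection_interval: 10s

processors:
  k8sattributes:          # adds k8s.pod.name, k8s.namespace.name, and similar attributes;
    extract:              # requires access to the Kubernetes API when run in a cluster
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.node.name
  batch:

exporters:
  signalfx:
    access_token: ${SPLUNK_ACCESS_TOKEN}
    realm: us0

service:
  pipelines:
    metrics:
      receivers: [nginx]
      processors: [k8sattributes, batch]
      exporters: [signalfx]
```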

Analyze your required network communication

Analyze your required network communication by determining which ports need to be open, which protocols to use, and proxy considerations.
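
As a starting point for that analysis, the following hedged sketch annotates the inbound ports and outbound endpoints a typical agent-mode Collector uses. Replace <realm> with your Splunk realm; exact hostnames and ports can vary with your distribution version and configuration.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317        # inbound OTLP gRPC from instrumented applications
      http:
        endpoint: 0.0.0.0:4318        # inbound OTLP HTTP

exporters:
  signalfx:
    access_token: ${SPLUNK_ACCESS_TOKEN}
    ingest_url: https://ingest.<realm>.signalfx.com   # outbound metrics and events, TCP 443
    api_url: https://api.<realm>.signalfx.com         # outbound metadata and correlation, TCP 443

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [signalfx]

# If traffic must traverse a proxy, the Collector honors the standard HTTP_PROXY,
# HTTPS_PROXY, and NO_PROXY environment variables set for its process.
```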

Analyze how to collect metrics from cloud providers

To monitor a cloud-based host, install the Splunk OTel Collector on each host to send host metrics to Splunk Observability Cloud. Use the cloud providers' filters to refine which data you bring in to Splunk Observability Cloud. You can limit the host metrics you send by excluding specific metrics from the cloud provider that you don't need to monitor. Excluding these metrics offers the following advantages:

  • You can control which hosts you monitor, instead of monitoring all hosts.

  • You can retrieve advanced metrics without incurring extra cost.

  • You can send metrics at a higher frequency without incurring extra cost, for example every 10 seconds, the Collector default, instead of every 5 minutes or more, which is the typical default for cloud providers.

See Connect to your cloud service provider and Collector deployment tools and options.
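
For example, the following sketch collects host metrics directly with the Collector at a 10-second interval instead of relying on the cloud provider's slower polling. The scraper list, interval, and exporter settings are illustrative; keep only the scrapers you need.

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s   # Collector-side interval; cloud provider APIs often poll every 5 minutes
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:
      load:

exporters:
  signalfx:
    access_token: ${SPLUNK_ACCESS_TOKEN}
    realm: us0

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [signalfx]
```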

Configure and implement host and Kubernetes metrics

The OpenTelemetry Collector automatically reads and detects different types of host or Kubernetes metadata from operating systems or from the cloud providers. See Host metrics receiver or Configure the Collector for Kubernetes with Helm for more information about host or Kubernetes metadata.

The OpenTelemetry Collector adds dimensions, metric tags, and span attributes, which are known as tags. The most common metadata entry is the name of the host, which can come from different sources under different names. See Metadata: Dimensions, custom properties, tags, and attributes for details on the metadata the Collector adds.

To retrieve and modify your metadata, use the resource detection processor in the pipeline section of the Collector's agent configuration. Before installing the OpenTelemetry Collector on a host, verify that the resource detection processor in the Collector configuration file lists your preferred metadata sources. The order of the detectors determines which source takes precedence when more than one provides the same attribute. See Resource detection processor.
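
The following sketch shows one possible resource detection configuration; the detector list and its order are assumptions to adapt to where your hosts actually run.

```yaml
processors:
  resourcedetection:
    detectors: [gcp, ecs, ec2, azure, system]   # earlier detectors win when more than one supplies the same attribute
    override: true                              # detected values replace resource attributes already present on the data
    timeout: 10s

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      memory:

exporters:
  signalfx:
    access_token: ${SPLUNK_ACCESS_TOKEN}
    realm: us0

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [resourcedetection]
      exporters: [signalfx]
```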

Collect data from third-party metrics providers

When using the Splunk Distribution of OpenTelemetry Collector, you can use receivers to collect metrics data from third-party providers. For example, you can use the Prometheus receiver to scrape metrics data from any application that exposes a Prometheus endpoint. See Prometheus receiver.

See Supported integrations in Splunk Observability Cloud for a list of receivers.
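
For example, the following sketch scrapes a hypothetical application that exposes Prometheus metrics on localhost:9090; the job name, target, and interval are placeholders.

```yaml
receivers:
  prometheus:
    config:                       # standard Prometheus scrape configuration, embedded in the receiver
      scrape_configs:
        - job_name: my-app        # hypothetical job name
          scrape_interval: 30s
          static_configs:
            - targets: ["localhost:9090"]   # hypothetical Prometheus endpoint

exporters:
  signalfx:
    access_token: ${SPLUNK_ACCESS_TOKEN}
    realm: us0

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [signalfx]
```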

Bring data in for use in Splunk APM

Splunk Application Performance Monitoring (APM) provides end-to-end visibility to help identify issues such as errors and latency across all tags of a service. Splunk APM produces infinite-cardinality metrics and full-fidelity traces. Splunk APM also measures request, error, and duration (RED) metrics. See Learn what you can do with Splunk APM.

To familiarize yourself with the key concepts of Splunk APM, see Key concepts in Splunk APM.

Add an auto instrumentation library to a service to send traces to Splunk APM

To send traces to Splunk APM, you need to deploy an auto instrumentation agent for each programming language or language runtime. To deploy an auto instrumentation agent, see Instrument your applications and services to get spans into Splunk APM.
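
As an illustration, the following hedged sketch shows a Kubernetes Deployment for a hypothetical Java service instrumented with the Splunk Distribution of OpenTelemetry Java. The service name, image, agent path, and Collector address are assumptions, and the agent JAR is assumed to be present in the container image at the path shown; other languages use their own agents and settings.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service                  # hypothetical service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkout-service
  template:
    metadata:
      labels:
        app: checkout-service
    spec:
      containers:
        - name: checkout-service
          image: registry.example.com/checkout-service:1.0.0   # placeholder image
          env:
            - name: HOST_IP                          # node IP, used to reach the agent-mode Collector
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: JAVA_TOOL_OPTIONS                # loads the Java agent at JVM startup
              value: "-javaagent:/otel/splunk-otel-javaagent.jar"   # assumed path inside the image
            - name: OTEL_SERVICE_NAME
              value: "checkout-service"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT      # send traces to the local Collector
              value: "http://$(HOST_IP):4317"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "deployment.environment=pilot"
```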

(Optional) Use automatic discovery to instrument your applications

If you are deploying many similar services written in Java, .NET, or Node.js, deploy the OpenTelemetry Collector with automatic discovery. Use automatic discovery if you don't have access to the source code or the ability to change the deployment.

See Discover telemetry sources automatically.

(Optional) Turn on AlwaysOn Profiling to collect stack traces

Use AlwaysOn Profiling for deeper analysis of the behavior of select applications. Code profiling collects snapshots of the CPU call stacks and memory usage. After you get profiling data into Splunk Observability Cloud, you can explore stack traces directly from APM and visualize the performance and memory allocation of each component using the flame graph.

Use this profiling data to gain insights into your code behavior to troubleshoot performance issues. For example, you can identify bottlenecks and memory leaks for potential optimization.

See Introduction to AlwaysOn Profiling for Splunk APM.
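
If you instrument Java services as in the earlier Deployment sketch, a minimal way to turn on AlwaysOn Profiling is to add the following environment variables to the instrumented container. The variable names follow the Splunk Java agent; other languages have their own settings, and profiling data is sent through the Collector like other telemetry.

```yaml
# Add to the instrumented container's env list (see the earlier Deployment sketch).
env:
  - name: SPLUNK_PROFILER_ENABLED           # CPU call stack profiling
    value: "true"
  - name: SPLUNK_PROFILER_MEMORY_ENABLED    # memory allocation profiling
    value: "true"
```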

Set up Log Observer Connect for the Splunk Platform

If your organization has an entitlement for Splunk Log Observer Connect, Splunk Observability Cloud can automatically relate logs to infrastructure and trace data.

See Set up Log Observer Connect for Splunk Enterprise or Set up Log Observer Connect for Splunk Cloud Platform.

Review the default dashboards and detectors

Splunk Observability Cloud automatically adds built-in dashboards for each integration you use after it ingests 50,000 data points. Review these built-in dashboards when they are available. See Dashboards in Splunk Observability Cloud.

Splunk Observability Cloud also automatically adds the AutoDetect detectors that correspond to the integrations you are using. You can copy the AutoDetect detectors and customize them. See Use and customize AutoDetect alerts and detectors.

Next step

Next, prepare for a pilot rollout of Splunk Infrastructure Monitoring and Splunk Application Performance Monitoring. See Admin onboarding guide phase 2: Pilot rollout phase.
