Getting Microsoft Azure data into the Splunk platform
Introduction
Splunk offers many ways of getting Microsoft Azure resource data into Splunk Cloud. The trade-offs vary by ingestion type and path, and they include scaling, support, security, performance, management, and cost.
When choosing an ingestion option for a resource in your organization, weigh these trade-offs, because in some cases there is more than one ingestion possibility for an Azure resource type.
As a general rule, Data Manager is the recommended method of data ingestion for Splunk Cloud customers for supported data sources where available. Data Manager greatly reduces the time to configure cloud data sources from hours to minutes, while providing a centralized data ingestion management, monitoring and troubleshooting experience.
Throughout this document we will discuss the different architectures to help you choose the best solution for your use case.
Architecture diagram
The following diagram represents the different data ingestion paths for Microsoft Azure.
This diagram shows the Azure resource data sources, and their relative ingestion methods.
Note: M365 (Microsoft Office Suite) is not part of the Microsoft Azure Cloud Platform. However, it is a Microsoft Cloud Service, so it is also represented here.
This document covers the Splunk Add-ons for Azure and Event Streaming via the HTTP Event Collector (HEC). For additional information on the HTTP Event Collector and Splunk Add-ons for Azure, please refer to the following documentation:
- HTTP Event Collector
- Splunk Add-on for Microsoft Cloud Services
- Splunk Add-on for Microsoft Azure
- Splunk Add-on for Microsoft Office 365
Push vs Pull
Data Availability
Microsoft makes Microsoft Azure data available to Splunk in three main ways.
Storage Accounts: A storage account is a container that holds Azure Storage services such as Blobs, Files, Queues, and Tables.
Event Hubs: Azure Event Hubs is a fully managed, real-time data ingestion and streaming service from Microsoft that can scale to millions of events per second.
REST APIs: A REST API is the HTTP interface that a web service exposes so that clients can interact with it, typically to create, retrieve, or update resources.
Push vs Pull
There are two main ways of ingesting Microsoft Azure data: push and pull.
Push
- Benefits
- Increased scalability
- Near real-time compared to pull
- Utilizes Azure services, requiring less management and maintenance overhead.
- Limitations
- Customization requires developer knowledge in one of the supported programming languages (C#, Java, JavaScript, PowerShell, or Python).
- Cannot be used for all source types
Pull
- Benefits
- Lower configuration overhead through Splunk Technical Add-ons (TAs)
- Limitations
- Runs on a schedule, which can introduce delays
- At higher volumes, scaling requires additional hosts and can become more complicated.
- Cannot be used for all source types
- Running inputs on an Inputs Data Manager (IDM) or a standalone ad-hoc search head creates a single point of failure.
Pull method: Use a Technical Add-on (TA), namely the Splunk Add-on for Microsoft Cloud Services or the Splunk Add-on for Microsoft Azure, to pull data from a variety of Microsoft cloud services through Event Hubs, Azure Service Management APIs, and Azure Storage APIs.
Push method: Use the HTTP Event Collector (HEC) as the entry point to push data from the Azure data sources to Splunk (Cloud or Enterprise) over HTTP/S. A common implementation is an Azure Function that forwards events to Splunk through HEC. This method gets the closest to real time.
More information about the source types that can be sent through HEC can be found in the HTTP Event Collector Source Types section.
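As a minimal sketch of the push method, the Python snippet below builds (but does not send) a single HEC request. The `/services/collector/event` path and the `Authorization: Splunk <token>` header are the standard HEC conventions. The host name, token value, source type, and the helper name `build_hec_request` are placeholders assumed for illustration and do not come from this document.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype, index=None):
    """Build (but do not send) an HTTP POST for the HEC event endpoint."""
    payload = {"event": event, "sourcetype": sourcetype}
    if index is not None:
        payload["index"] = index
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_hec_request(
    base_url="https://http-inputs-example.splunkcloud.com:443",  # hypothetical host
    token="00000000-0000-0000-0000-000000000000",  # placeholder HEC token
    event={"operationName": "example", "resultType": "Success"},
    sourcetype="azure:example",  # hypothetical source type
)
# Passing `request` to urllib.request.urlopen() would send it to Splunk.
```

Separating request construction from sending keeps the payload logic easy to test without a live HEC endpoint.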
This repository contains available Azure Functions to integrate Microsoft data with Splunk. Azure Functions can be triggered by certain events like an event arriving on an Event Hub, a blob written to a storage account, a Microsoft Teams call concluding, etc. The functions in this repository respond to these events and route data to Splunk accordingly.
Three Azure Functions
- Microsoft Teams
- Azure Event Hubs
- Azure Storage
Find below a high-level architecture diagram for both methods.
Additionally, events arriving on an Azure Event Hub can trigger serverless Azure Functions, which can then process the raw events further in near real time.
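To illustrate the kind of processing such a function might perform, the sketch below splits one Event Hub message into individual events and formats them as a batched HEC body. It assumes the message uses the Azure Monitor style envelope with a top-level `records` array, and that events are batched for HEC as JSON objects placed one after another in the request body. The function name `records_to_hec_batch` and the source type are hypothetical, and this is not the code from the repository mentioned above.

```python
import json


def records_to_hec_batch(message_body, sourcetype):
    """Turn one Event Hub message into a batched HEC request body.

    Assumes the message is JSON. If it is an object with a "records"
    list, each record becomes its own HEC event; otherwise the whole
    message is forwarded as a single event.
    """
    parsed = json.loads(message_body)
    if isinstance(parsed, dict) and isinstance(parsed.get("records"), list):
        records = parsed["records"]
    else:
        records = [parsed]
    lines = [
        json.dumps({"event": record, "sourcetype": sourcetype}, sort_keys=True)
        for record in records
    ]
    # One JSON object per line; the objects are simply stacked in the body.
    return "\n".join(lines)


sample = json.dumps({"records": [{"category": "A"}, {"category": "B"}]})
batch = records_to_hec_batch(sample, "azure:example")  # hypothetical source type
```

Splitting the envelope before sending means each record is indexed as its own event rather than as one large JSON document.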
Hybrid push and pull method
Another option for ingesting Microsoft Azure data is Data Manager for Splunk Cloud. Data Manager provides a simple, modern, and automated experience of getting data in for Splunk Cloud administrators, and reduces the time it takes to configure data collection (from hours or days to minutes). Data Manager automates the initial data pipeline setup and configuration. It also allows Splunk admins to manage the pipeline health from an intuitive UI.
The diagram below highlights the Azure data sources, but Data Manager supports all three major cloud service providers (CSPs): Azure, AWS, and GCP. It is a good way to centralize data onboarding and troubleshooting for cloud data sources in a single interface.
Data Manager is currently available only for Splunk Cloud Platform environments running on AWS. For the Microsoft Azure integration, Data Manager auto-generates an ARM template that is used for the Azure configuration.
Data Manager source types
For more information on how to configure Data Manager for Microsoft Azure, please follow the guidance on the link below.
Data Manager for Azure Data-Onboarding
Supported data sources
Add-on | Input/action | Documentation |
---|---|---|
Splunk Add-on for Microsoft Cloud Services | | Splunkbase |
Splunk Add-on for Microsoft Azure | | Splunkbase |
Splunk Add-on for Microsoft Office 365 | Management Activity, Service Health & Communications, Mailbox, Microsoft 365, OneDrive, SharePoint, Teams, Yammer, Audit Logs, Cloud Application Security, Message Trace | Splunkbase |
HTTP Event Collector Source Types
The following table shows the source types that can be sent to Splunk Cloud through the HTTP Event Collector.
Input/action | Sources | Documentation |
---|---|---|
Active Directory | Sending AD Logs | |
Diagnostic Logs | Sending Diagnostic Logs | |
Azure Monitor Metrics | Sending Azure Metrics | |
Activity Logs | | Sending Azure Activity Logs |
NSG Flow Log Data | Azure functions for sending Azure Storage data to a Splunk HTTP Event Collector (preferred method); Splunking Azure: NSG Flow Logs (Option 2) Sending NSG Flow Logs | |
Application Insights | | |
Network Watcher | | |
Helpful Links
- Azure/O365 Splunk Add-on
- Getting Cloud data into Splunk
- Splunk Lantern - Getting started with Microsoft Azure Event Hub data
Note: Microsoft Azure uses diagnostic settings to define data export and destination rules. Each resource to be monitored must have a diagnostic setting. Diagnostic settings can be defined using the Azure portal, PowerShell, Azure CLI, Resource Manager templates, the REST API, or Azure Policy.
Refer to the detailed diagram below for source type data flow.
Event Hubs
Azure Event Hubs is a data streaming platform that is used with several of the data ingestion methods described above. When using Event Hubs, use multiple partitions so that peak loads from large volumes of events are spread across them.
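The effect of partitioning can be illustrated with a small simulation. The sketch below is a conceptual model only: it maps a partition key to a partition with a SHA-256 hash, which is an assumption made for illustration and not the hashing scheme that Event Hubs uses internally. The function name `assign_partition`, the key format, and the partition count of 4 are all hypothetical.

```python
import hashlib
from collections import Counter


def assign_partition(partition_key, partition_count):
    """Map a key to a partition index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Simulate 10,000 events from distinct resources landing on 4 partitions.
counts = Counter(assign_partition(f"resource-{i}", 4) for i in range(10000))
for partition in sorted(counts):
    print(f"partition {partition}: {counts[partition]} events")
```

With many distinct keys, each partition receives a roughly equal share of the events, so consumers reading from different partitions can work in parallel during peaks.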
Security
When securing the path that data takes into Splunk Cloud, there are two areas to consider.
- Security of Microsoft Azure
- Security of Splunk Cloud
Both Microsoft Azure and Splunk Cloud support SAML single sign-on (SSO), LDAP, and role-based access control (RBAC), and a Zero Trust approach is recommended. Do not treat everything behind a firewall as trusted. Instead, give each user only the access needed to perform their job, following the principle of least privilege.
Security starts with Zero Trust and with policies that enforce it.
Splunk Cloud's role-based access control (RBAC) includes the six roles in the table below, which help define user capabilities in a secure way.
Role | Capabilities |
---|---|
Apps | Manage apps, also has some admin capabilities |
Power | Edit all shared objects and alerts, tag events |
User | Create, edit, and run searches. Can also edit its own preferences |
Can_delete | Can delete by keyword |
Sc_admin (Cloud-specific) | Create users and roles |
Tokens_auth | Configure token-based authorization |
Splunk Cloud offers two types of encryption:
- Encryption in transit uses industry standard SSL/TLS 1.2+ encryption. This is used by forwarders and user sessions.
- Encryption at rest uses AES 256-bit advanced encryption. This service is available as a premium service enhancement.
Splunk Cloud Platform uses AWS KMS (Key Management Service), a fully managed service that helps create and manage encryption keys. Splunk is responsible for the overall management of all the keys, including their creation, rotation, and revocation. Splunk Cloud Platform also offers Enterprise Managed Encryption Keys (EMEK) as an option for encryption at rest. This gives you the option to bring your own primary encryption key.
Splunk Cloud Platform supports compliance with the following regulations and standards:
- SOC type 2
- ISO 27001
- PCI
- HIPAA
- FedRAMP
You can read more about how Splunk products support compliance regulations here.
This documentation applies to the following versions of Splunk® Validated Architectures: current