Splunk® Data Fabric Search


DFS overview

Data Fabric Search (DFS) is a search platform that leverages the distributed processing power of external compute engines, such as Apache Spark Core, to broaden the scope and capability of Splunk Enterprise.

Traditionally, the Splunk platform runs searches from a single search head. The multiple reporting processors running on a single search head machine often create a performance bottleneck, which impacts the responsiveness of reporting, monitoring, and alert operations. A DFS job improves search performance by distributing the search processing load to a compute cluster, so that processing and memory requirements do not cause a bottleneck at the search head. Distributing data processing across the indexers, compute nodes, and search head in this way optimizes the search process.

DFS has the following features:

  • Big Data Analysis: Analyzes and explores large amounts of data, searching over a billion events within a single Splunk deployment with significant performance improvements. DFS also allows you to perform high-cardinality searches, where events have very uncommon or unique field values. The higher the data cardinality, the greater the performance gain.
  • Federated Search: Conducts searches and joins across multiple indexes on disparate Splunk deployments as seamlessly as if they were a single deployment.

You must install a DFS license in addition to the Splunk Enterprise license for the Splunk platform to perform DFS searches.

Federated searches use an authorization model that enables the administrator to create service accounts for role-based user authentication across multiple Splunk deployments. A federated search head is a Splunk instance that handles search management functions and directs search requests to federated providers, which are remote Splunk Enterprise deployments. A remote search head is the Splunk Enterprise instance that resides on a remote Splunk deployment and conducts federated searches there. Federated searches give you the ability to correlate across a wider data fabric of multiple, disparate Splunk Enterprise deployments to access relevant datasets, while the compute cluster applies the search pipeline to the results in a distributed manner.

The following diagram illustrates the differences between a distributed search and a distributed search with a DFS compute cluster:

Distributed search vs. distributed search with DFS

TLS is not enabled by default for data transport within a DFS deployment. For more information on securing your DFS deployment, see Secure a DFS deployment.

Benefits of DFS

DFS offers the following benefits:

Scalability
DFS dynamically scales the number of search requests it can handle based on the physical capacity of the Spark cluster.
Extensibility
You can extend DFS to access datasets from multiple sources within a single search using from command expansions. You can then use the join and union commands to connect and correlate the datasets. You can also use DFS to compute aggregate statistics over high-cardinality datasets containing billions of data points using the stats command. See the example that follows this list.
Role-based data isolation
By using defined role capabilities, DFS can help you to ensure that data is not compromised across multiple deployments through restricted access to datasets and data sources.
Performance
DFS can improve your ability to conduct high-cardinality concurrent searches on large volumes of data without compromising performance. The higher the cardinality of the data, the higher the performance of a DFS search.
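
For example, the following search is a minimal sketch of this pattern. The federated provider name (salesEU), index name (sales_us), and field name (product_id) are hypothetical placeholders for datasets in your own deployments:

| dfsjob | union [ | from federated:salesEU | stats count by product_id ] [ search index=sales_us | stats count by product_id ] | stats sum(count) AS total_count by product_id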

How DFS works

A DFS search pipeline involves several stages of processing.

Stage 1: On the search head

  1. Receive a DFS-enabled search.
  2. Define the sequence to run the various components of the search.
  3. Set up a compute environment using a DFS compute cluster to run the sequence of operations to complete the search.

Stage 2: On the indexers

  1. Process the remote portion of the search.
  2. Send the intermediate results to the DFS compute cluster.

Stage 3: On the DFS compute cluster

  1. Process the remaining steps in the run sequence of the search.
  2. Send the intermediate results back to the search head.

Stage 4: On the search head

  1. Apply relevant knowledge objects to the intermediate search results.
  2. Send the final results of the search to the user interface.

The following diagram shows the data flow for a DFS search pipeline:

DFS Search Pipeline

The following query is an example of a DFS search:

| dfsjob [ search index=network | stats count by ip ]
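
This search runs the stats aggregation over the network index on the DFS compute cluster rather than on the search head. As a further sketch of a high-cardinality DFS search, assuming a hypothetical web_access index that contains a session_id field, the following search counts the distinct sessions for each URI path:

| dfsjob [ search index=web_access | stats dc(session_id) AS unique_sessions by uri_path ]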

How federated search works

A federated search pipeline involves several stages of processing.

Stage 1: On the federated search head

  1. Specify information about the federated providers in the federated.conf file located in $SPLUNK_HOME/etc/system/local. See the sketch after this list.
  2. Construct the federated search using the remote search information specified in the federated.conf and savedsearches.conf files.
  3. Set up a compute environment using a DFS compute cluster to run the sequence of operations to complete the search.
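
The following is a minimal sketch of a federated provider stanza in federated.conf. The stanza and setting names shown here are illustrative assumptions, not a definitive reference; see the federated.conf specification for your DFS version for the exact settings:

[networkRemote1]
# Hypothetical settings that identify the remote Splunk deployment
# and the service account used for role-based authentication.
ip = 10.0.0.42
splunkService = 8089
serviceAccount = dfs_service_account
password = <service account password>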

Stage 2: On the remote search head and federated search indexers

  1. Process the remote portion of the search.
  2. Send the intermediate results to the compute cluster.

Stage 3: On the DFS compute cluster

  1. Process the remaining steps in the run sequence of the search.
  2. Send the intermediate results back to the federated search head.

Stage 4: On the federated search head

  1. Apply relevant knowledge objects to the intermediate results.
  2. Send the final results of the search to the user interface.

The following diagram shows the data flow for a federated search pipeline:

Federated Search Pipeline

The following is an example of a federated SPL search:

| dfsjob | union [ | from federated:networkRemote1 | stats count by ip ] [ | from federated:networkRemote2 | stats count by ip ] [ search index=networkLocal | stats count by ip ] | stats count
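
This search unions the results of two remote datasets, networkRemote1 and networkRemote2, with results from the local networkLocal index, and then computes an overall event count.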


This documentation applies to the following versions of Splunk® Data Fabric Search: 7.3.0

