Data Fabric Search (DFS) is the new search platform that leverages the distributed processing power of external compute engines (Apache Spark Core) to broaden the scope and capability of Splunk Enterprise. Use the Splunk DFS Manager app (recommended) to set up your Spark cluster for DFS, or install your own Spark cluster manually.
Splunk provides support only for a Spark cluster that is deployed using the Splunk DFS Manager app. If you install your Spark cluster manually, Splunk isn't responsible for support or maintenance of the compute cluster.
Traditionally, the Splunk platform runs searches from a single search head. The multiple reporting processors running on that single machine often create a performance bottleneck, which impacts the responsiveness of reporting, monitoring, and alert operations. A DFS job enhances search performance by distributing the search processing load to the compute cluster, so that processing and memory requirements do not cause a bottleneck at the search head. Distributing data processing across the indexers, compute nodes, and search head optimizes the overall search process.
DFS has the following features:
- Big data analysis: Analyzes and explores large amounts of data, searching over a billion events within a single Splunk deployment with significant performance improvements. DFS also lets you perform high-cardinality searches, where events have very uncommon or unique values. The higher the data cardinality, the greater the performance gain.
- Federated search: Conducts searches and joins across multiple indexes on disparate Splunk deployments as seamlessly as if it were a single deployment.
Federated searches use an authorization model that enables the administrator to create service accounts for role-based user authentication across multiple Splunk deployments. A federated search head is a Splunk instance that handles search management functions and directs search requests to federated providers, which are remote Splunk Enterprise deployments. A remote search head is the Splunk Enterprise instance that resides on a remote Splunk deployment and runs its portion of a federated search. Federated searches provide the ability to correlate across a wider data fabric of multiple, disparate Splunk Enterprise deployments to access relevant datasets. The compute cluster applies the search pipeline to the results in a distributed manner.
The following diagram illustrates the differences between a distributed search and a distributed search with a DFS compute cluster:
TLS is not enabled by default for data transport within a DFS deployment. For more information on securing your DFS deployment, see Secure a DFS deployment.
Benefits of DFS
DFS offers the following benefits:
- DFS dynamically scales the number of search requests based on the physical capacity of the Spark cluster.
- You can extend DFS to access datasets from multiple sources within a single search using from command expansions, and then use the union command to connect and correlate the datasets. You can also use DFS to perform aggregate statistics on high-cardinality datasets with billions of data points.
- Role-based data isolation: By using defined role capabilities, DFS can help you ensure that data is not compromised across multiple deployments through restricted access to datasets and data sources.
- DFS can improve your ability to conduct high-cardinality concurrent searches on large volumes of data without compromising performance. The higher the cardinality of the data, the higher the performance of a DFS search.
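As an illustration of the multi-source pattern described above, a DFS-eligible search might combine the from and union commands with a high-cardinality stats aggregation. This is a sketch only; the dataset, index, and field names here are hypothetical:

```
| from datamodel:"Web"
| union [ search index=sales_emea sourcetype=transaction ]
| stats count AS events, dc(session_id) AS unique_sessions BY customer_id
```

The from command retrieves one dataset, union appends and correlates results from a second source, and stats aggregates over a high-cardinality field such as customer_id, where DFS distributes the aggregation work across the compute cluster.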
Big data analysis