How this app fits into the Splunk platform picture

The scheduler

The scheduler orchestrates API data collection. It communicates with the worker processes on the data collection nodes, which perform isolated data collection tasks. How the scheduler is implemented depends on the data collection configuration requirements of your domain. The scheduler sends collection work as tasks, using the REST API, to the worker processes on the data collection nodes. The data collection nodes then execute the tasks and forward the data to the Splunk indexers. Consider spreading high-volume API calls across multiple data collection nodes.

Add your data collection nodes to the scheduler's configuration to collect data from your storage systems. The scheduler combines the credentials for the filer and cluster assets that contain the data with what it knows about which data (performance, inventory, hierarchy) to collect from those assets, and sends this collection information to the data collection nodes.

The scheduler distributes these data collection jobs on an interval specified in the collection configuration file on the search head. All communication goes in one direction, from the scheduler to the data collection nodes. The scheduler load balances based on the number of worker processes on the data collection nodes. It watches the job queue and distributes credentials to the data collection nodes, where they are stored locally on each node (in apps.conf). You can add or remove the data collection nodes that the scheduler manages in the Collection Configuration Dashboard of the app. Note that the scheduler does not send data to Splunk.
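To make the load-balancing idea concrete, the following is a minimal sketch, not the scheduler's actual algorithm. It assumes that "load balances based on the number of worker processes" means each new job goes to the node with the fewest assigned jobs per worker process. The function name `assign_jobs`, the node names, and the job labels are hypothetical and used only for illustration.

```python
def assign_jobs(jobs, worker_counts):
    """Distribute jobs across nodes in proportion to their worker counts.

    jobs          -- list of job identifiers (hypothetical labels)
    worker_counts -- dict mapping node name to number of worker processes
    Returns a dict mapping node name to the list of jobs assigned to it.
    """
    if not worker_counts or any(n <= 0 for n in worker_counts.values()):
        raise ValueError("every node needs at least one worker process")

    assigned = {node: [] for node in worker_counts}
    for job in jobs:
        # Pick the node with the lowest jobs-per-worker ratio.
        # Ties are broken by node name so the result is deterministic.
        node = min(
            assigned,
            key=lambda n: (len(assigned[n]) / worker_counts[n], n),
        )
        assigned[node].append(job)
    return assigned


if __name__ == "__main__":
    # Hypothetical setup: one node with two workers, one node with one worker.
    plan = assign_jobs(
        ["perf:filer1", "inv:filer1", "hier:filer1",
         "perf:filer2", "inv:filer2", "hier:filer2"],
        {"dcn-a": 2, "dcn-b": 1},
    )
    for node, node_jobs in plan.items():
        print(node, node_jobs)
```

In this sketch a node with twice as many worker processes ends up with roughly twice as many jobs, which is one reason to size worker counts to the capacity of each data collection node.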

The resource cost of running the scheduler on the search head is directly related to the amount of network traffic generated as jobs are assigned. Because the scheduler connects to the data collection nodes, remote login must be enabled on those Splunk forwarders. To support this requirement, change the default admin password and change the allowRemoteLogin setting in the server.conf file on your data collection nodes. See the sections on changing default values and on server.conf in the Admin Manual for more information.

The Data Collection Node

The data collection node is a Splunk light or heavy forwarder with job and process management built in. It has a copy of SA-Hydra installed on it. The data collection node manages all the data collection operations (worker processes, jobs, and sessions) for each of the storage entities from which it collects data. It also initiates log message handling (all logs are written to hydra_worker.log).

The worker processes are individual input processes, declared in the inputs.conf file of the domain-specific implementation of the hydra_worker modular input.

The worker processes on the data collection nodes continually check the hydra_job.conf file for new jobs that the scheduler has assigned to the node. When a new job comes in, a worker process claims and executes it. Conforming to Splunk best practices, the worker writes its output to stdout, and Splunk forwarding handles the rest.
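The claim-and-execute cycle described above can be sketched as follows. This is an illustrative approximation only, not the hydra_worker implementation: it uses an in-memory list of dictionaries as a stand-in for hydra_job.conf, and the function `run_pending_jobs`, the job fields (`node`, `status`, `task`), and the `collect` callback are all hypothetical.

```python
import json
import sys


def run_pending_jobs(job_store, node, collect, out=sys.stdout):
    """Claim and run every new job assigned to `node`.

    job_store -- list of job dicts (stand-in for the jobs file; hypothetical)
    node      -- name of the data collection node this worker runs on
    collect   -- callable that takes a job dict and returns an event dict
    out       -- stream that receives one JSON event per line
    Returns the number of jobs that were run.
    """
    ran = 0
    for job in job_store:
        if job["node"] != node or job["status"] != "new":
            continue
        # Mark the job as claimed before running it so that another
        # worker scanning the same store skips it.
        job["status"] = "claimed"
        event = collect(job)
        # Write the result to the output stream; in a modular input,
        # data written to stdout is picked up by Splunk forwarding.
        out.write(json.dumps(event) + "\n")
        job["status"] = "done"
        ran += 1
    out.flush()
    return ran


if __name__ == "__main__":
    jobs = [
        {"node": "dcn-a", "status": "new", "task": "perf"},
        {"node": "dcn-b", "status": "new", "task": "inventory"},
    ]
    run_pending_jobs(jobs, "dcn-a", lambda job: {"task": job["task"]})
```

A real worker would repeat this scan on a timer and would need a claim step that is safe when several worker processes read the same file at once; the sketch omits both.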
