Manage data collection

As a Splunk administrator monitoring a large environment, you can use the Distributed Collection Scheduler to prioritize data collection tasks.

Assign task priorities

Assign task priorities when you have a complex environment with multiple data collection nodes.

Task priorities let you control the resource distribution for data collection nodes. For example, if you have a memory-intensive task, run that task on data collection nodes that you provisioned with more memory than other data collection nodes in your environment.

You can change how data is collected by manually editing the configuration files. The configuration files hydra_node.conf and ta_vmware_collection.conf on the scheduler node control task execution.

Set task priorities

The <task>_priority field in ta_vmware_collection.conf adjusts the priority number for the jobs of a task. Valid values for this field are zero, a negative number, or a positive number.

Zero is the default value for <task>_priority. A value of zero for this field indicates that there is no change in default data collection priorities for tasks.
A negative value increases the job priority. A negative value lowers the priority number but this increases the actual relative priority of a given task. The Distributed Collection Scheduler works on jobs in ascending order of priority number. That is, 1 is higher priority than 5.
A positive value decreases the job priority. A positive value increases the priority number, which lowers the actual relative priority of a given task. A positive priority number can result in job expiration if the environment is overloaded.

ta_vmware_collection.conf lists all tasks.

# These are all the tasks that should run everywhere
task = hostvmperf, otherperf, hierarchyinv, hostinv, vminv, clusterinv, datastoreinv, rpinv, task, event

The <task>_priority field sets the priority of each task.

# The number to add to the priority number for jobs of a given task, negative number makes higher priority
task_priority = -60
event_priority = -60
hierarchyinv_priority = -120

To set the priority for a task:
Stop the Distributed Collection Scheduler.
Edit $SPLUNK_HOME/etc/apps/local/ta_vmware_collection.conf on the scheduler node (typically on the search head).
Add the <task>_priority field to a task.
Enter a value for the field.
Restart Splunk Enterprise.
Restart the Distributed Collection Scheduler.
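
The edit in the steps above can be sketched in Python for illustration. This is a minimal sketch, not a Splunk tool: the helper name set_task_priority and the [default] stanza name are assumptions, and the sample text stands in for the real ta_vmware_collection.conf. It uses the standard configparser module, which drops comments when it writes the file back, so treat it as a demonstration of the change rather than a safe way to edit a production file.

```python
import configparser
import io


def set_task_priority(conf_text: str, task: str, value: int, stanza: str = "default") -> str:
    """Return conf_text with <task>_priority set to value in the given stanza.

    The stanza name is an assumption for this sketch.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("priority must be an integer")
    # Disable interpolation so any '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(stanza):
        parser.add_section(stanza)
    parser.set(stanza, f"{task}_priority", str(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical, shortened stand-in for ta_vmware_collection.conf.
sample = """[default]
task = hostvmperf, otherperf, hierarchyinv
"""

updated = set_task_priority(sample, "hierarchyinv", -120)
print(updated)
```

The scheduler still has to be stopped before the edit and restarted afterward, as described in the steps.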

Task priorities example

Assign the following values to task, event, and hierarchyinv in ta_vmware_collection.conf.

task_priority = -60
event_priority = -60
hierarchyinv_priority = -120

Unix epoch time provides the base priority number for jobs, and the <task>_priority value is added to it. For example, if the epoch time is currently 188, then with the values above for task_priority and hierarchyinv_priority, hierarchyinv jobs have a priority number of 68 (188 + (-120)) and task jobs have a priority number of 128 (188 + (-60)). Because the Distributed Collection Scheduler works on jobs in ascending order of priority number, it collects hierarchyinv events before task events.
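
The arithmetic can be checked with a short Python sketch. It follows the comment in ta_vmware_collection.conf that the <task>_priority value is added to the priority number. The function name priority_number is hypothetical, and hostinv is included with the default value of zero only for comparison.

```python
def priority_number(epoch: int, offset: int = 0) -> int:
    """Priority number of a job: epoch time plus the <task>_priority offset."""
    return epoch + offset


# Offsets from the example; hostinv uses the default of zero.
offsets = {"task": -60, "event": -60, "hierarchyinv": -120, "hostinv": 0}
epoch = 188  # toy epoch value taken from the example

numbers = {name: priority_number(epoch, off) for name, off in offsets.items()}

# Jobs are worked in ascending order of priority number.
order = sorted(numbers, key=numbers.get)

print(numbers)  # {'task': 128, 'event': 128, 'hierarchyinv': 68, 'hostinv': 188}
print(order)    # ['hierarchyinv', 'task', 'event', 'hostinv']
```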

How jobs are assigned

The Distributed Collection Scheduler sets the capabilities of the workers on the data collection nodes. It assigns jobs only to nodes that have the capability of running those jobs, as defined in hydra_node.conf on the scheduler node. The scheduler sorts ready jobs based on their task weight. It then load balances the jobs and checks the capabilities of the assigned jobs so that jobs are distributed to optimize resources and to avoid overloading any one data collection node. A warning appears if the environment is unbalanced. A data collection node cannot execute a task that has a task weight of zero. The node reports an error and ignores all jobs associated with that task.
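
The general idea of capability-aware assignment can be illustrated with a simplified Python sketch. This is not the scheduler's actual algorithm: the function assign_jobs, the node names dcn1 and dcn2, and the least-loaded-node rule are all assumptions made for illustration, and task weights are not modeled. Each job is a (priority number, task) pair, and jobs are taken in ascending order of priority number.

```python
def assign_jobs(jobs, capabilities):
    """Assign each (priority_number, task) job to the least-loaded capable node.

    capabilities maps a node name to the set of tasks it can run.
    Returns (assignments, unassigned).
    """
    assignments = {node: [] for node in capabilities}
    unassigned = []
    for number, task in sorted(jobs):  # ascending priority number
        capable = [node for node, tasks in capabilities.items() if task in tasks]
        if not capable:
            # No node has the capability to run this task.
            unassigned.append((number, task))
            continue
        # Pick the capable node with the fewest jobs so far (first one wins ties).
        target = min(capable, key=lambda node: len(assignments[node]))
        assignments[target].append((number, task))
    return assignments, unassigned


# Hypothetical nodes and jobs.
capabilities = {
    "dcn1": {"hostvmperf", "hierarchyinv"},
    "dcn2": {"hostvmperf", "event"},
}
jobs = [
    (188, "hostvmperf"),
    (68, "hierarchyinv"),
    (128, "event"),
    (188, "hostvmperf"),
    (188, "vminv"),
]

assignments, unassigned = assign_jobs(jobs, capabilities)
print(assignments)
print(unassigned)
```

In this toy run each node receives two jobs, and the vminv job stays unassigned because neither node lists that capability.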
