Assign task priorities
This is an advanced administration task.
As a Splunk administrator monitoring a large environment, you can raise or lower the priority of the different tasks that the Distributed Collection Scheduler executes during data collection.
There are times when the collection of certain data types is of the highest priority and you want to ensure that you can collect the complete data set. You also want to collect the data with minimal impact on the resources in your environment.
You can control which data collection nodes receive resource-intensive tasks so that you need to adjust the resources of only those machines. For example, if you have a memory-intensive task, direct that task to data collection nodes that you have provisioned with more memory than the other data collection nodes in your environment.
You can change how data is collected by manually editing the configuration files. The configuration files that control task execution are hydra_node.conf and ta_vmware_collection.conf on the scheduler node.
Set task priorities
The file ta_vmware_collection.conf contains the field <task>_priority = value. The value that you assign to this field determines the priority number for jobs of the given task, which places those jobs higher or lower than the jobs of other tasks in the job queue. You can assign 0, a negative number, or a positive number as the value of this field.
- 0 (the default value) implies that there is no change in default data collection priorities for tasks.
- A negative value increases the job priority. A negative value lowers the priority number, which raises the actual relative priority of the task. Jobs are worked on in ascending order of priority number; that is, “1” is a higher priority than “5”.
- A positive value decreases the job priority. A positive value raises the priority number, which lowers the actual relative priority of the task. Note that a positive priority adjustment almost always results in job expirations, except in underloaded environments.
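The ordering rule above can be illustrated with a minimal Python sketch. It assumes, as the example later in this topic suggests, that a job's priority number is a base epoch time plus the configured <task>_priority value, and that the queue is worked in ascending order. The function names and job names are hypothetical and are not part of the app.

```python
import heapq


def priority_number(base_time: int, adjustment: int) -> int:
    # Assumed model: the configured <task>_priority value is added to the
    # job's base priority number (an epoch time in this sketch).
    return base_time + adjustment


def drain_queue(jobs):
    """Return job names in the order they would be worked:
    ascending priority number, so smaller numbers run first."""
    heap = []
    for seq, (name, base_time, adjustment) in enumerate(jobs):
        # seq breaks ties so jobs with equal numbers keep insertion order.
        heapq.heappush(heap, (priority_number(base_time, adjustment), seq, name))
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


jobs = [
    ("default_job", 1000, 0),     # 0: no change to the priority number
    ("promoted_job", 1000, -60),  # negative: lower number, worked earlier
    ("demoted_job", 1000, 60),    # positive: higher number, worked later
]
print(drain_queue(jobs))  # ['promoted_job', 'default_job', 'demoted_job']
```

With all three jobs sharing the same base time, only the adjustment decides the order, which is why a negative value promotes a task and a positive value demotes it.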
ta_vmware_collection.conf lists all of the tasks:
# These are all the tasks that should run everywhere
task = hostvmperf, otherperf, hierarchyinv, hostinv, vminv, clusterinv, datastoreinv, rpinv, task, event
You can assign a priority to each task using the custom field <task>_priority.
# The number to add to the priority number for jobs of a given task, negative number makes higher priority
task_priority = -60
event_priority = -60
hierarchyinv_priority = -120
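The following Python sketch shows how the task list and the <task>_priority fields combine: every listed task gets its configured adjustment, and tasks without a field keep the default of 0. It is a simplified reader for key = value lines only (it skips comments and [stanza] headers rather than modeling Splunk's full .conf file layering), and the helper names are hypothetical.

```python
def parse_conf_lines(text: str) -> dict:
    """Collect key = value settings, skipping blanks, # comments, and [stanza] lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '='
            settings[key.strip()] = value.strip()
    return settings


def effective_adjustments(settings: dict) -> dict:
    """Map each task in the task list to its <task>_priority value (default 0)."""
    tasks = [t.strip() for t in settings.get("task", "").split(",") if t.strip()]
    # int() raises ValueError if a priority value is not a whole number.
    return {t: int(settings.get(f"{t}_priority", "0")) for t in tasks}


def unknown_priority_keys(settings: dict) -> list:
    """Return <task>_priority keys that name a task missing from the task list."""
    tasks = {t.strip() for t in settings.get("task", "").split(",")}
    return [
        key for key in settings
        if key.endswith("_priority") and key[: -len("_priority")] not in tasks
    ]


conf = """
# These are all the tasks that should run everywhere
task = hostvmperf, otherperf, hierarchyinv, hostinv, vminv, clusterinv, datastoreinv, rpinv, task, event
# The number to add to the priority number for jobs of a given task, negative number makes higher priority
task_priority = -60
event_priority = -60
hierarchyinv_priority = -120
"""

settings = parse_conf_lines(conf)
print(effective_adjustments(settings))
print(unknown_priority_keys(settings))  # []
```

A misspelled key such as evnt_priority would be returned by unknown_priority_keys, which is one way to catch a typo before restarting.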
To assign a priority to a task:
- Edit $SPLUNK_HOME/etc/apps/local/ta_vmware_collection.conf on the scheduler node (typically the search head).
- Add the <task>_priority field for each task that you want to reprioritize and assign it a value: 0 (no change), a negative number (increase job priority), or a positive number (decrease job priority).
- Restart Splunk.
- Restart the Distributed Collection Scheduler.
Example
Assign the following values to task, event, and hierarchyinv in ta_vmware_collection.conf:
task_priority = -60
event_priority = -60
hierarchyinv_priority = -120
Unix epoch time is used to determine the priority number for jobs: the value of <task>_priority is added to the epoch time. For example, if the epoch time is currently 188, then with the values assigned above, hierarchyinv jobs have a priority number of 68 (188 plus the hierarchyinv_priority value of -120) and task jobs have a priority number of 128 (188 plus the task_priority value of -60). Because lower numbers are worked first, hierarchyinv is always collected before task when jobs of both tasks share the same epoch time.
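The arithmetic in this example can be checked with a few lines of Python. The sketch adds each configured value to the example epoch time of 188 and sorts in ascending order; priority_number is a hypothetical helper name, not an app function.

```python
EPOCH_NOW = 188  # illustrative epoch value from the example

# Values assigned in ta_vmware_collection.conf in this example.
adjustments = {"task": -60, "event": -60, "hierarchyinv": -120}


def priority_number(epoch: int, adjustment: int) -> int:
    """Priority number = epoch time + the task's <task>_priority value."""
    return epoch + adjustment


numbers = {name: priority_number(EPOCH_NOW, adj) for name, adj in adjustments.items()}
print(numbers)  # {'task': 128, 'event': 128, 'hierarchyinv': 68}

# Lower priority numbers are worked first. sorted() is stable, so in this
# sketch the tie between task and event keeps dictionary order; the
# scheduler's own tie-breaking is not described here.
print(sorted(numbers, key=numbers.get))  # ['hierarchyinv', 'task', 'event']

# The comparison assumes equal epoch times: a task job created 61 seconds
# earlier gets 188 - 61 - 60 = 67, which sorts ahead of hierarchyinv's 68.
print(priority_number(EPOCH_NOW - 61, -60) < priority_number(EPOCH_NOW, -120))  # True
```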
How jobs are assigned
The Distributed Collection Scheduler sets the capabilities of the workers on the respective data collection nodes. Jobs are assigned only to nodes that have the capability to run them, as defined in hydra_node.conf (on the scheduler node). A task with a weight of 0 (that is, a task that no data collection node can execute) is flagged as an error, and all jobs for that task are ignored. Ready jobs are sorted by their task weighting. Jobs are then load balanced, and the capabilities required by the assigned jobs are checked so that jobs are distributed to make the best use of resources without overloading any one data collection node. If the environment is unbalanced, you get a warning that the capabilities have caused an uneven load across the data collection nodes.
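The assignment flow described above can be sketched in Python under explicit assumptions: node capabilities are modeled as a hypothetical mapping from node name to the set of tasks that node can run, ready jobs are ordered by ascending priority number, and load balancing is approximated by choosing the least-loaded capable node. The node names, the unevenness threshold, and the function are illustrative only; they are not the scheduler's actual algorithm or hydra_node.conf syntax.

```python
from collections import defaultdict


def assign_jobs(jobs, node_capabilities):
    """Hypothetical capability-aware, load-balanced assignment.

    jobs: list of (job_id, task, priority_number)
    node_capabilities: dict mapping node name -> set of tasks it can run
    Returns (assignments, errored_tasks, warning).
    """
    # A task that no node can execute is flagged as an error; its jobs are ignored.
    all_capabilities = set().union(*node_capabilities.values())
    errored_tasks = {task for _, task, _ in jobs if task not in all_capabilities}
    ready = [job for job in jobs if job[1] not in errored_tasks]

    # Work ready jobs in ascending priority-number order (an assumption here).
    ready.sort(key=lambda job: job[2])

    load = {node: 0 for node in node_capabilities}
    assignments = defaultdict(list)
    for job_id, task, _ in ready:
        capable = [n for n, caps in node_capabilities.items() if task in caps]
        # Pick the least-loaded capable node; ties go to the first name alphabetically.
        node = min(capable, key=lambda n: (load[n], n))
        assignments[node].append(job_id)
        load[node] += 1

    # Warn when capabilities force an uneven spread (threshold is arbitrary).
    warning = None
    if load and max(load.values()) - min(load.values()) > 1:
        warning = f"uneven load across data collection nodes: {load}"
    return dict(assignments), errored_tasks, warning


nodes = {
    "dcn-big-mem": {"hostvmperf", "vminv", "hierarchyinv"},  # extra memory
    "dcn-small": {"vminv"},
}
jobs = [
    ("j1", "hostvmperf", 100),
    ("j2", "vminv", 90),
    ("j3", "hierarchyinv", 68),
    ("j4", "rpinv", 50),  # no node has this capability
    ("j5", "hostvmperf", 120),
]
assigned, errored, warning = assign_jobs(jobs, nodes)
print(assigned)
print(errored)  # {'rpinv'}
print(warning)
```

Because only dcn-big-mem can run hostvmperf and hierarchyinv, it receives three jobs against one for dcn-small, so the sketch emits an uneven-load warning, mirroring the situation the paragraph above describes.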
This documentation applies to the following versions of Splunk® App for VMware (Legacy): 3.1