Manage data collection
As a Splunk administrator monitoring a large environment, you can use the Distributed Collection Scheduler to prioritize tasks during data collection.
Assign task priorities
Assign task priorities when you have a complex environment with multiple data collection nodes.
Task priorities let you control the resource distribution for data collection nodes. For example, if you have a memory-intensive task, run that task on data collection nodes that you provisioned with more memory than other data collection nodes in your environment.
You can change how data is collected by manually editing the configuration files. The configuration files hydra_node.conf and ta_vmware_collection.conf control task execution on the scheduler node.
Set task priorities
The <task>_priority field in ta_vmware_collection.conf sets a number that is added to the priority number of every job for a task. Valid values for this field are zero, a negative number, or a positive number.
- Zero is the default value for <task>_priority. A value of zero for this field indicates that there is no change in the default data collection priorities for tasks.
- A negative value increases the job priority. A negative value lowers the priority number, which raises the actual relative priority of a given task. The Distributed Collection Scheduler works on jobs in ascending order of priority number. That is, 1 is higher priority than 5.
- A positive value decreases the job priority. A positive value increases the priority number, which lowers the actual relative priority of a given task. A positive priority number can result in job expiration if the environment is overloaded.
ta_vmware_collection.conf lists all tasks.
# These are all the tasks that should run everywhere
task = hostvmperf, otherperf, hierarchyinv, hostinv, vminv, clusterinv, datastoreinv, rpinv, task, event
The <task>_priority field assigns a priority to each of the tasks.
# The number to add to the priority number for jobs of a given task, negative number makes higher priority
task_priority = -60
event_priority = -60
hierarchyinv_priority = -120
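The following Python sketch shows one way to read the task list and the <task>_priority values out of text in this format, defaulting to zero for tasks with no priority line. It is illustrative only: read_priority_offsets and SAMPLE_CONF are hypothetical names, not part of the Splunk App for VMware, and the parsing rules (skip comments and stanza headers, split on the first equals sign) are assumptions.

```python
# Hypothetical helper, not shipped with Splunk. The sample text mirrors the
# settings shown above; the parsing rules are assumptions for illustration.
SAMPLE_CONF = """\
# These are all the tasks that should run everywhere
task = hostvmperf, otherperf, hierarchyinv, hostinv, vminv, clusterinv, datastoreinv, rpinv, task, event
task_priority = -60
event_priority = -60
hierarchyinv_priority = -120
"""


def read_priority_offsets(conf_text):
    """Return {task_name: offset}, using 0 when no <task>_priority line exists."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and stanza headers such as [default].
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()

    # The "task" setting holds the comma-separated list of task names.
    tasks = [name.strip() for name in settings.get("task", "").split(",") if name.strip()]
    return {name: int(settings.get(name + "_priority", 0)) for name in tasks}


offsets = read_priority_offsets(SAMPLE_CONF)
```

With the sample text, hierarchyinv maps to -120, task and event map to -60, and every other task keeps the default of 0.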
To set a task priority:
- Stop the Distributed Collection Scheduler.
- Edit $SPLUNK_HOME/etc/apps/local/ta_vmware_collection.conf on the scheduler node (typically on the search head).
- Add the <task>_priority field to a task.
- Enter a value for the field.
- Restart Splunk Enterprise.
- Restart the Distributed Collection Scheduler.
Task priorities example
Assign the following values to task, event, and hierarchyinv in ta_vmware_collection.conf.
task_priority = -60
event_priority = -60
hierarchyinv_priority = -120
Unix epoch time determines the base priority number for tasks. For example, if the epoch time is currently 188, then with the task_priority and hierarchyinv_priority values above, hierarchyinv events have a priority number of 68 and task events have a priority number of 128. The priority number for hierarchyinv events equals the Unix epoch time plus the hierarchyinv_priority value: 188 + (-120) = 68. Likewise, the priority number for task events is 188 + (-60) = 128. Because 68 is lower than 128, the Distributed Collection Scheduler collects hierarchyinv events before task events.
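The arithmetic in this example can be sketched in a few lines of Python. This is a simplified model based only on the description above, not the scheduler's actual code; priority_number is a hypothetical function name, and the hostinv entry is added only to show a task left at the default of zero.

```python
def priority_number(epoch, offset=0):
    """Priority number for a job: the epoch time plus the <task>_priority value."""
    return epoch + offset


epoch = 188  # example epoch value from the text above

# <task>_priority values from the example; hostinv is left at the default (0).
offsets = {"task": -60, "event": -60, "hierarchyinv": -120, "hostinv": 0}

numbers = {name: priority_number(epoch, off) for name, off in offsets.items()}

# The scheduler works in ascending order of priority number,
# so the lowest number comes first.
order = sorted(numbers, key=numbers.get)
```

Here numbers["hierarchyinv"] is 68 and numbers["task"] is 128, so hierarchyinv sorts ahead of task, and a task with no offset keeps the unmodified value of 188.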
How jobs are assigned
The Distributed Collection Scheduler sets the capabilities of the workers on the data collection nodes. It assigns jobs only to nodes that have the capability of running those jobs, as defined in hydra_node.conf on the scheduler node. The scheduler sorts ready jobs based on their task weight, then load balances the jobs and checks the capabilities of the assigned jobs so that jobs are distributed to optimize resources and to avoid overloading any one data collection node. A warning appears if the environment is unbalanced. A data collection node cannot execute a task that has a task weight of zero. It reports an error and ignores all jobs associated with that task.
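As a rough illustration of these rules, the Python sketch below filters nodes by capability, skips tasks with a weight of zero, and gives each job to the capable node with the lowest accumulated weight. It is a toy model, not the scheduler's implementation: assign_jobs, the node names, and the weights are hypothetical, sorting heaviest-first and the least-loaded rule are assumptions, and the zero-weight check is done centrally here for brevity although the text above says the data collection node reports that error.

```python
def assign_jobs(jobs, node_capabilities, task_weights):
    """Toy model of capability-aware, weight-based job assignment.

    jobs: list of (job_id, task_name)
    node_capabilities: {node_name: set of task names the node can run}
    task_weights: {task_name: weight}
    Returns (assignments, skipped).
    """
    load = {node: 0 for node in node_capabilities}
    assignments = {node: [] for node in node_capabilities}
    skipped = []

    # A task with a weight of zero cannot be executed, so its jobs are ignored.
    runnable = []
    for job_id, task in jobs:
        if task_weights.get(task, 0) <= 0:
            skipped.append(job_id)
        else:
            runnable.append((job_id, task))

    # Assumption: place the heaviest jobs first so lighter jobs fill the gaps.
    runnable.sort(key=lambda job: task_weights[job[1]], reverse=True)

    for job_id, task in runnable:
        capable = [n for n, caps in node_capabilities.items() if task in caps]
        if not capable:
            skipped.append(job_id)  # no node has this capability
            continue
        target = min(capable, key=lambda n: load[n])  # least-loaded capable node
        assignments[target].append(job_id)
        load[target] += task_weights[task]

    return assignments, skipped


# Hypothetical nodes, weights, and jobs for illustration.
nodes = {"dcn1": {"hostvmperf", "hostinv"}, "dcn2": {"hostvmperf"}}
weights = {"hostvmperf": 3, "hostinv": 1, "event": 0}
jobs = [("j1", "hostvmperf"), ("j2", "hostvmperf"), ("j3", "hostinv"), ("j4", "event")]

assignments, skipped = assign_jobs(jobs, nodes, weights)
```

In this run the two hostvmperf jobs are split across both nodes, the hostinv job goes to dcn1 because only dcn1 has that capability, and the event job is skipped because its task weight is zero.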
This documentation applies to the following versions of Splunk® App for VMware (Legacy): 3.3.0