Splunk® Data Stream Processor

Getting Data In


Get data in with the Collect service and a pull-based connector

The Collect service is a scalable data collection service that pulls large volumes of data from external data sources and ingests it into your data pipeline. The Collect service can be integrated into the Splunk Data Stream Processor (DSP) through pull-based connectors.

Connector concepts and terminology

A connector is an extension of the Collect service that can extract events or metrics from a data source. To understand how a connector works, you need to know the following terms:

  • Scheduled job: Defines when, what, and how to collect data
  • Execution: One iteration of a scheduled job
  • Worker: A software module that collects and processes data, and then sends the results to the downstream destination

Configuration parameters

All connectors use the following common configuration parameters, shown together in the example job request after this list:

  • name: The name of your job.
  • connectorID: The registered connector image name. The connectorID cannot be changed once it is assigned.
  • schedule: The cron schedule for your job, interpreted in UTC. The configuration uses standard cron syntax: minute hour day-of-month month day-of-week. For example, */15 * * * * runs every 15 minutes.
  • scheduled: Optional. Defaults to true. Set this parameter to false to stop the scheduled job from automatically running on the next cron cycle. Executions that are currently running are not affected.
  • eventExtraFields: Optional. An array of custom name-value pairs that can be used to annotate events, allowing one DSP pipeline to process events from multiple Collect service jobs. If a field in eventExtraFields conflicts with a field in an event, the eventExtraFields field takes precedence.
  • parameters: The parameters used to configure the connector.
  • workers: The number of workers you want to use to collect data.
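
The following Python sketch shows how these parameters fit together in a job creation request. This is a minimal sketch, assuming the requests library, and not a definitive implementation: the endpoint path, tenant name, token handling, connector ID, parameters, and the shape of eventExtraFields shown here are assumptions. See the Collect Service API reference and your connector's documentation for the exact values.

    import requests

    # Hypothetical endpoint and token; substitute the values for your
    # deployment. The path and tenant name here are assumptions.
    API = "https://api.scp.splunk.com/mytenant/collect/v1beta1/jobs"
    TOKEN = "<your-access-token>"

    job = {
        "name": "my-collect-job",
        "connectorID": "example-connector",  # registered connector image name; immutable once assigned
        "schedule": "*/15 * * * *",          # standard cron, interpreted in UTC: every 15 minutes
        "scheduled": True,                   # set to False to pause future executions
        "eventExtraFields": [                # annotations; override conflicting event fields
            {"name": "env", "value": "prod"}
        ],
        "parameters": {},                    # connector-specific settings go here
        "workers": 4,                        # maximum of 20 workers per job
    }

    response = requests.post(API, json=job, headers={"Authorization": f"Bearer {TOKEN}"})
    response.raise_for_status()
    print(response.json())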

Limitations of the Collect service

The Collect service has the following job limitations:

  • A maximum of 20 workers per job

The Collect service has the following scheduling limitations. The example schedules after this list illustrate both bounds.

  • Jobs must be scheduled to run at least once per week. If a job is scheduled to run less frequently than once per week, you might see duplicate data.
  • Jobs must be scheduled to run no more than once every 5 minutes. If a job is scheduled to run more frequently than that, some scheduled executions are skipped. For example, a job scheduled to run once per minute runs only once in each 5-minute period, and the other 4 scheduled executions are skipped.
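
The following cron expressions, evaluated in UTC, show where common schedules fall relative to these limits. These are ordinary cron examples for illustration only, not values taken from a specific connector.

    */5 * * * *    every 5 minutes: the most frequent supported schedule
    0 * * * *      every hour: within the supported range
    0 3 * * 1      every Monday at 03:00: the least frequent supported schedule
    * * * * *      every minute: too frequent, so most executions are skipped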

Data ingestion can be delayed

Data ingestion can be delayed for the following reasons:

  • Latency in the data provided by the data source
  • The volume of raw data ingested; for example, ingesting 1 GB of data takes longer than ingesting 1 MB
  • The volume of data ingested from upstream sources, such as the Ingest REST API or a forwarder

Adding workers might improve data ingestion rates, but external factors still influence the speed of data ingestion.

Permissions

By default, users only have rights to view and change their own pipelines. Users can't see pipelines created by other users in their tenant or the user list for the tenant. Administrators have full rights to view all pipelines and users in each tenant.

See Manage users and admins for more information on the permissions assigned to the user and administrator roles.

Use a pull-based connector with Splunk DSP

Pull-based connectors collect events from external sources and send them into your DSP pipeline through the Collect service.

To use a pull-based connector, complete the following steps:

  1. Choose the data source that you want to connect to.
  2. Create your connection and use it in your data pipeline. You can then confirm that the Collect service registered the scheduled job, as shown in the sketch after these steps.
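
As a quick check, you can list the Collect service jobs for your tenant. This is a minimal sketch reusing the hypothetical endpoint and token from the earlier example; the response envelope is also an assumption, so consult the Collect Service API reference for the actual shape.

    import requests

    # Hypothetical endpoint and token, as in the earlier example.
    API = "https://api.scp.splunk.com/mytenant/collect/v1beta1/jobs"
    TOKEN = "<your-access-token>"

    resp = requests.get(API, headers={"Authorization": f"Bearer {TOKEN}"})
    resp.raise_for_status()

    # The "data" list envelope is an assumption; adjust to the real response.
    for job in resp.json().get("data", []):
        print(job.get("id"), job.get("name"), "scheduled:", job.get("scheduled"))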