All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator that has been announced as end-of-life. In DSP 1.4.0, we have replaced Gravity with an alternative component. As a result, we will no longer provide support for versions of DSP prior to DSP 1.4.0 after July 1, 2023. We advise all of our customers to upgrade to DSP 1.4.0 to continue receiving full product support from Splunk.
Upgrade the Splunk Data Stream Processor to 1.3.1
This topic describes how to upgrade the Splunk Data Stream Processor (DSP) to 1.3.1.
To upgrade to DSP 1.3.1 successfully, you must complete several prerequisite tasks before starting the upgrade. Make sure to read through all of the Before you upgrade sections in this topic and complete the relevant prerequisite tasks before upgrading DSP.
DSP does not provide a means of downgrading to previous versions. If you need to revert to an older DSP release, uninstall the upgraded version and reinstall the version you want.
Before you upgrade
Complete the following tasks before upgrading DSP. If you don't complete these tasks, you might encounter issues such as pipeline failures.
- Make sure you are using DSP 1.2.4 or 1.3.0
- Review known issues
- Review the features planned for deprecation or removal
- Identify connections that need to be replaced
- Disable scheduled jobs
- Remove all machine learning functions
Make sure you are using DSP 1.2.4 or 1.3.0
You can upgrade to DSP 1.3.1 from DSP 1.2.4 or DSP 1.3.0. If you are using a patch version of DSP, such as DSP 1.2.1-patch02 or DSP 1.2.2-patch02, then you must upgrade to DSP 1.2.4 first. See Upgrade the Splunk Data Stream Processor to 1.2.4.
Review known issues
Review the known issues related to the upgrade process. Depending on what functions you have in your pipelines, you might need to complete some additional steps to restore those pipelines after the upgrade is complete.
Review the features planned for deprecation or removal
Review the Features planned for deprecation or removal to see which features are scheduled for deprecation or removal in future releases.
Identify connections that need to be replaced
If you are upgrading to DSP 1.3.1 from DSP 1.2.4, identify any pipelines that connect to the following data sources or destinations, and track how these connections are being used:
- Amazon Kinesis Data Streams
- Apache Kafka
- Apache Pulsar
- Microsoft Azure Event Hubs
Specifically, keep track of the names of all the pipelines where these connections are being used and whether each connection points to a data source or a data destination. After upgrading to DSP 1.3.1, you need to replace these connections with ones that use the new source-specific and sink-specific connectors. See the Update pipelines to use new DSP 1.3.0 connections section on this page for more information.
If you upgrade to DSP 1.3.1 from DSP 1.2.4, pipelines with the aforementioned connections will continue to run successfully. However, these connections won't appear on the Connections page, and you won't be able to modify them. You can only delete them.
Disable scheduled jobs
If you're running any scheduled data collection jobs using the following source connectors, you must disable those jobs before upgrading DSP:
- Amazon CloudWatch Metrics
- Amazon S3
- AWS Metadata
- Google Cloud Monitoring
- Microsoft 365
- Microsoft Azure Monitor
If you don't disable all the scheduled jobs in these connectors before upgrading your DSP deployment, the Kubernetes container image name used by these connectors is not updated. See the troubleshooting topic ImagePullBackoff status shown in Kubernetes after upgrading DSP for more information.
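If you do end up in this state after upgrading, one quick way to find the affected pods is to search for the ImagePullBackOff status directly. This is a minimal sketch, assuming kubectl is available on a master node of your cluster:
# Find pods stuck pulling a container image (status shows as ImagePullBackOff)
kubectl get pods --all-namespaces | grep -i imagepullbackoff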
- In DSP, select the Connections page.
- For each connection that uses one of the source connectors listed earlier, do the following:
- Select the connection to open it for editing.
- Toggle the Scheduled parameter off.
- Save your changes.
Remove all machine learning functions
If you are upgrading to DSP 1.3.1 from DSP 1.3.0 or are not using the Streaming ML plugin, skip this step. The Streaming ML Plugin beta feature and all of the machine learning functions included in the plugin have been removed in DSP 1.3.0. See Feature deprecation and removal notices.
Before upgrading DSP, you must remove the following machine learning functions from all active pipelines:
- Adaptive Thresholding
- Apply ML Model
- Datagen
- Drift Detection
- Pairwise Categorical Outlier Detection
- Sentiment Analysis
- Sequential Outlier Detection
- Time Series Decomposition (STL)
- estdc
- perc
If you don't remove these functions, the pipelines containing them will fail after the upgrade, and all other pipelines will also need to be restarted.
- In DSP, select the Pipelines page.
- For each pipeline that uses a machine learning function, do the following:
- Open the pipeline for editing. If the pipeline is active, click Deactivate.
- Delete the machine learning function from the pipeline.
- Click Save, and reactivate the pipeline if it was active before. When you reactivate a pipeline, you must select where you want to resume data ingestion. See Using activation checkpoints to activate your pipeline in the Use the Data Stream Processor manual for more information.
Upgrade the Splunk Data Stream Processor
Once you've prepared your DSP environment for upgrade by completing the tasks described in the Before you upgrade section, follow these steps to upgrade DSP.
- Download the new DSP tarball on one of the master nodes of your cluster.
- Extract the tarball.
tar xf <dsp-version>.tar
- Navigate to the extracted directory.
cd <dsp-version>
- (Optional) If /tmp is on a small root volume (6GB or less of free space), your upgrade may fail when the volume runs out of space. Choose a different directory on a larger volume for the upgrade process to write temporary files to:
export TMPDIR=/<directory-on-larger-volume>
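To check whether this step applies to you, you can inspect the free space on the volume that backs /tmp. A quick check:
# Show free space on the filesystem that holds /tmp
df -h /tmp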
- From the extracted directory, run the upgrade script.
sudo ./upgrade
Upgrading can take a while, depending on the number of nodes you have in your cluster. Once the upgrade is done, the message Upgrade completed successfully is shown, followed by some garbage collection logs. After you see those logs, you can start using the latest version of DSP. Any pipelines that were active before the upgrade are reactivated.
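As an optional sanity check before logging in to the UI, you can confirm from the command line that all pods came back up. This is a minimal sketch, assuming kubectl is available on the master node:
# List any pods that are not in the Running state; an empty result (besides the header) is a good sign
kubectl get pods --all-namespaces | grep -v Running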
Validate the upgrade
Log in to DSP to confirm that your upgrade was successful.
- In the browser you use to access the DSP UI, clear the browser cache.
- Navigate to the DSP UI.
https://<DSP_HOST>:30000/
- On the login page, enter the following:
User: dsp-admin
Password: <the dsp-admin password>
After upgrading
After successfully upgrading DSP, complete the following tasks:
- Delete old DSP directories
- Re-enable scheduled jobs
- Update pipelines to use new DSP 1.3.0 connections
- Enable automatic updates for lookups
- Upgrade the Splunk App for DSP
- Review known issues and apply workarounds
If you are upgrading to DSP 1.3.1 from DSP 1.3.0, you can disregard Update pipelines to use new DSP 1.3.0 connections and Enable automatic updates for lookups.
After upgrading to the latest version of DSP, perform any command-line operations from the directory of the new version on the master node.
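For example, before running any post-upgrade commands (the path is the placeholder used earlier in this topic, standing in for wherever you extracted the new version):
# Change into the new version's extracted directory before running CLI operations
cd <dsp-version>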
Delete old DSP directories
On each node, delete the directories containing the old version of DSP. This is an optional clean-up step.
To delete old DSP directories, run the following command on each node:
rm -r <dsp-version-upgraded-from>
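If your cluster has many nodes, you can script this cleanup from the master node. The following is a minimal sketch, assuming passwordless SSH to each node; the node names and directory path are placeholders for your own values:
# Remove the old extracted DSP directory from each node (hypothetical node names and path)
for node in node1 node2 node3; do
  ssh "$node" "rm -r /path/to/<dsp-version-upgraded-from>"
done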
Re-enable scheduled jobs
Re-enable the scheduled jobs that you disabled during the Disable scheduled jobs step.
- In DSP, select the Connections page.
- For each scheduled data collection job that you need to re-enable, do the following:
- Select the connection where the scheduled job is defined.
- Toggle the Scheduled parameter on.
- Save your changes.
Update pipelines to use new DSP 1.3.0 connections
Starting in DSP 1.3.0, connectors that supported both source and sink functions have been replaced by connectors that specifically support source functions only or sink functions only. If you have any pipelines that connect to Amazon Kinesis Data Streams, Apache Kafka, Apache Pulsar, or Microsoft Azure Event Hubs, you need to recreate the connections using the new source-specific and sink-specific connectors and then update your pipelines to use these new connections.
Pipelines with the aforementioned connections will continue to run successfully. However, these connections won't appear on the Connections page, and you won't be able to modify them. You can only delete them.
- Recreate your connections using the new connectors as needed. For example, if your pipeline connects to Amazon Kinesis Data Streams as a data source, then recreate that Kinesis connection using the Connector for Amazon Kinesis Data Streams Source. For detailed instructions on creating connections, see the Connect to Data Sources and Destinations with DSP manual.
- Select the Pipelines page.
- For each pipeline that needs to be updated to use a source-specific or sink-specific connection, do the following:
- Open the pipeline for editing. If the pipeline is active, click Deactivate.
- Select the source or sink function for which you need to update the connection.
- On the View Configurations tab, click the Delete icon next to the Connection id field to delete the connection that's being replaced.
- Set Connection id to the appropriate source-specific or sink-specific connection that you created during step 1.
- Click Save, and reactivate the pipeline if it was active before. When you reactivate a pipeline, you must select where you want to resume data ingestion. See Using activation checkpoints to activate your pipeline in the Use the Data Stream Processor manual for more information.
Enable automatic updates for lookups
Starting in version 1.3.0, DSP automatically checks for updates to CSV lookup files. However, any active pipelines that are using CSV lookups from a previous version of DSP are not automatically migrated to this new behavior. Do the following steps to enable automatic updates for lookups.
Alternatively, you can enable automatic updates by uploading a new version of a previous lookup file and restarting all pipelines that use that lookup file.
- Log in to the Splunk Cloud Services CLI. Copy and save the bearer token returned to a preferred location.
./scloud login --verbose
- Get a list of all of the CSV lookups being used.
curl -X GET -k "https://<DSP_HOST>:31000/default/streams/v3beta1/connections?connectorId=b5dfcb94-142e-470f-9045-ad0b83603bdb" \
-H "Authorization: Bearer <my-bearer-token>" \
-H "Content-Type: application/json"
- Copy and save the id of each CSV lookup that you want to enable automatic updates for in a preferred location. Any CSV lookup that does not have the check_for_new_connection_secs and trim_edge_whitespace configurations does not have automatic updates enabled. A scripted way to extract these IDs is sketched at the end of this section.
- For each saved CSV lookup id, send a PATCH request that sets the automatic update configurations.
curl -X PATCH -k "https://<DSP_HOST>:31000/default/streams/v3beta1/connections/<csv-lookup-id>" \
-H "Authorization: Bearer <my-bearer-token>" \
-H "Content-Type: application/json" \
-d '{ "data": {"check_for_new_connection_secs":60, "trim_edge_whitespace":null}}'
- (Optional) Verify your changes.
curl -X GET -k "https://<DSP_HOST>:31000/default/streams/v3beta1/connections" \
-H "Authorization: Bearer <my-bearer-token>" \
-H "Content-Type: application/json"
- Open the DSP UI and restart all pipelines using this CSV lookup. To restart a pipeline, deactivate it and reactivate it again.
You only need to restart the affected pipelines once. Afterwards, DSP will automatically detect when you upload a new version of a CSV lookup file and active pipelines will automatically switch to using the latest version of the CSV file.
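As referenced in the steps above, you can script the extraction of the CSV lookup connection IDs rather than copying them by hand. This is a sketch that assumes jq is installed and that the list response wraps the connections in an items array; verify the response shape in your environment before relying on it:
# Print the id of each CSV lookup connection (assumes an "items" array in the response)
curl -s -X GET -k "https://<DSP_HOST>:31000/default/streams/v3beta1/connections?connectorId=b5dfcb94-142e-470f-9045-ad0b83603bdb" \
-H "Authorization: Bearer <my-bearer-token>" \
-H "Content-Type: application/json" | jq -r '.items[].id'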
Upgrade the Splunk App for DSP
If you have the Splunk App for DSP installed on your Splunk DSP cluster, you must upgrade it to the latest version. See Install the Splunk App for DSP for more information.
Review known issues and apply workarounds
There are some known issues that can occur when upgrading. Review the Known issues for DSP topic, and follow any workarounds that apply to you.