Splunk® Machine Learning Toolkit

User Guide

About the ai command

The ai command is a new ML-SPL search command that lets you connect to externally hosted large language models (LLMs) from third-party providers including OpenAI, Gemini, Bedrock, Groq, and Ollama.

When you use the ai command, Splunk data is sent to your chosen LLM and the response is returned in the Splunk search pipeline.

There are security risks when you use a commercial LLM and send data to an external service provider. Splunk does not provide the LLM. LLMs are external to Splunk, and users are responsible for meeting any network security requirements.

The ai command brings Generative AI (GenAI) solutions to Splunk platform data. Using GenAI in your search pipeline can provide additional insights on your Splunk data and reduce the time and cost of accessing those insights.

You can use the ai command to answer the following kinds of questions:

  • What does this notable event mean?
  • Do these user agents look normal?
  • Can you summarize these logs for me?
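For example, a sketch of the last question applied to internal Splunk error events might look like the following. The index and log_level filter target Splunk internal logs, the provider and model values are illustrative and must match a connection you have configured, and templating the _raw field is an assumption based on the field syntax shown later in this topic:

index=_internal sourcetype=splunkd log_level=ERROR
| head 5
| ai prompt="Can you summarize this log event for me? {_raw}" provider=OpenAI model=gpt-4o-mini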

There are no guardrails for the response returned from the LLM.

To help you become familiar with the ai command, the MLTK Showcase offers the following three examples of the ai command in action:

Showcase | Description
Field Extraction | Uses the ai command to generate regex for extracting fields from Splunk internal log messages.
Summarization | Uses the ai command to summarize internal Splunk error messages.
Anomaly Detection | Uses the ai command to detect anomalies in time series metrics, in this case counts of log events in the internal Splunk index.

Features of the ai command

The ai command provides the following key features:

  • Apply any supported LLM model to any data, on any Splunk dashboard, or directly within any SPL search.
  • Support for LLM inference on a single field, multiple fields, raw events, multiple events, or any custom dataset.

The ai command does not inspect the input to the LLM. Use discretion to determine if the data you send to the LLM is suitable and appropriate.

Roles and permissions

A new role and capabilities are included with the addition of the ai command to MLTK.

When you install or upgrade to MLTK version 5.6.0 or higher, a new role called mltk admin is added. The mltk admin role has the following three capabilities:

  • apply_ai_commander_command
  • edit_ai_commander_config
  • list_ai_commander_config

Permissions for the ai command are based on the apply_ai_commander_command capability. Users must have this capability to run the ai command.

Users who have both the list_ai_commander_config and edit_ai_commander_config capabilities can view and add provider details, as well as modify tokens and model information. They have full access to all editable fields under the Connection Management tab.

Users with only the list_ai_commander_config capability can view configured providers and models but cannot view API tokens.
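As a minimal sketch, assuming you manage roles in authorize.conf rather than through Splunk Web, a custom role could grant run access and read-only configuration access along the following lines. The role name ai_analyst is an illustrative assumption:

[role_ai_analyst]
# Illustrative role: members can run the ai command.
apply_ai_commander_command = enabled
# Members can view configured providers and models, but not API tokens.
list_ai_commander_config = enabled
# Inherit the standard user role.
importRoles = user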

Parameters for the ai command

See the following table for the parameters you can use with the ai command:

If you have configured a default LLM service and model, you do not need to set the provider or model parameters.

Parameter | Description
prompt | Specifies the data that is sent to the LLM for processing.
provider | Specifies the LLM service provider you want to use, such as OpenAI or Groq.
model | Specifies the model you want to use, for example gpt-4o-mini or llama3-8b-8192.
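As a minimal illustration of these parameters, the following sketch sends a single field to the LLM. The user agent value is fabricated for the example, and the provider and model values must match a connection configured on the Connection Management page:

| makeresults
| eval user_agent="Mozilla/5.0 (compatible; ExampleBot/1.0)"
| ai prompt="Does this user agent look normal? {user_agent}" provider=OpenAI model=gpt-4o-mini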

Connection Management page

The Connection Management page serves as the central hub for configuring and managing the integration of LLM providers within MLTK.

By using the LLM connection feature you acknowledge and agree that data from searches using the ai command is processed by an external LLM service provider.

Supported LLM providers

You can use the ai command with the following LLM providers:

  • Ollama
  • OpenAI
  • AzureOpenAI
  • Anthropic
  • Gemini
  • Bedrock
  • Groq

You can use the provider and model parameters in your ai command search to switch between these providers and their models.
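For example, the following sketch sends the same field to two different configured providers by changing only the provider and model values. Each ai invocation returns its own result field, such as ai_result_1 for the first call, as shown in the multi-chain examples later in this topic:

| inputlookup mltk_ai_commander_dataset.csv
| head 1
| ai prompt="Summarize this observed behavior: {Behavior_Observed}" provider=Ollama model=llama3:latest
| ai prompt="Summarize this observed behavior: {Behavior_Observed}" provider=Groq model=llama3-8b-8192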

Configuration steps

Complete the following steps to configure the Connection Management page with the LLM service of your choosing:

  1. Open MLTK.
  2. Navigate to the Connection Management tab in the MLTK navigation bar.
  3. Choose your preferred LLM provider from the Select Service drop-down menu.
  4. Enter the settings for the chosen LLM, including the Access Token and Request Timeout. The settings fields change depending on the selected LLM service provider.

    The API endpoint URL in the Endpoint field is populated for you.


  5. Select the model you want to use and configure the selected model settings as follows:
    1. Response Variability: Controls how much difference there is between responses to identical prompts. The higher the value, the more variability. Use a positive integer.
    2. Maximum Result Rows: Controls how many records can be sent to the LLM from Splunk. Each record sent from Splunk will be processed as a separate prompt. The higher this number is, the more data will be sent for inference.

      Use caution if your chosen LLM service is charged on a per-prompt basis.

    3. Max Tokens: Controls how many tokens can be generated by a single Splunk search.

      Use caution if your chosen LLM service is charged on a per-prompt basis.

    4. (Optional) Select the checkbox to set the chosen service as the default.
  6. When the connection has been configured, you can use the LLM with the ai command.

    The ai command does not inspect the input to the LLM. Use discretion to determine if the data you send to the LLM is suitable and appropriate.
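If you set a default service and model, you can verify the connection with a minimal search that omits the provider and model parameters. This is a sketch; the greeting field is just an illustrative input:

| makeresults
| eval greeting="hello"
| ai prompt="Reply with the word OK if you received this message: {greeting}"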

Example datasets

The prompt pattern examples in the next section show how to use the ai command in different scenarios. The examples use either the mltk_ai_commander_dataset.csv dataset or the http_error_dataset.csv dataset.

About the mltk_ai_commander_dataset.csv

The mltk_ai_commander_dataset.csv dataset ships with version 5.6.0 and higher of MLTK. This dataset contains details of PowerShell scripts.

The dataset includes the following information:

Column name | Description
Script_Name | Identifier of the script.
Threat_Level | Severity of the threat.
Behavior_Observed | Behavior detected in the script.
IOC_Found | Indicators of Compromise, such as IPs, hashes, or domains.
Script_Content | Actual script content, usually PowerShell.
Classification | Malicious or Benign.
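To preview the dataset before building prompts, you can run a simple lookup search such as the following:

| inputlookup mltk_ai_commander_dataset.csv
| head 5
| table Script_Name Threat_Level Classification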

About the http_error_dataset.csv

The http_error_dataset.csv dataset does not ship with MLTK. The dataset includes simulated HTTP error events with detailed causes and recovery suggestions.

The dataset includes the following information:

Column name | Description
Error_Code | HTTP status code, such as 500 or 404.
Error_Message | Human-readable error description.
Root_Cause | Technical reason or analysis for the error.
Recovery_Steps | Recommended actions for mitigation or resolution.

You can view the contents of the http_error_dataset.csv dataset as follows, or use the following link to download the file in Excel (XLS): http_error_dataset.xls

Request_ID | HTTP_Code | Endpoint | Error_Message | Root_Cause | Classification
REQ-001 | 500 | /api/user/login | The server encountered an unexpected condition which prevented it from fulfilling the request. | The application server failed to establish a connection with the user database due to a timeout. | Critical
REQ-002 | 404 | /api/products/123 | The requested product resource could not be found on the server. | The product ID provided in the request URL does not match any existing records in the product catalog. | Non-Critical
REQ-003 | 403 | /admin/dashboard | Access to the requested admin dashboard is forbidden. | The user tried to access an administrative panel without sufficient permissions or valid authorization token. | Critical
REQ-004 | 408 | /api/upload | The server timed out waiting for the request. | The client-side upload process took too long, causing the server to abandon the connection. | Non-Critical
REQ-005 | 502 | /api/payment | The server received an invalid response from the upstream server while acting as a gateway. | The payment service backend returned an unexpected error or failed to respond during the transaction process. | Critical
REQ-006 | 400 | /api/register | The server could not understand the request due to invalid syntax. | The JSON payload sent by the client contained syntax errors or missing required fields. | Non-Critical
REQ-007 | 503 | /api/report | The server is currently unavailable due to overload or maintenance. | A spike in traffic or scheduled maintenance rendered the reporting service temporarily unreachable. | Critical
REQ-008 | 401 | /api/user/settings | Authentication is required and has failed or has not yet been provided. | The user attempted to access personal settings without including a valid authentication token in the request header. | Critical
REQ-009 | 422 | /api/order | The server understands the content type and syntax, but was unable to process the instructions. | One or more required fields in the order request failed validation checks due to incorrect format or missing data. | Non-Critical
REQ-010 | 504 | /api/checkout | The server acting as a gateway timed out while waiting for a response from an upstream server. | The third-party API responsible for processing the final checkout did not respond in time, causing a timeout. | Critical
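Because this dataset does not ship with MLTK, add the CSV as a lookup file in your Splunk environment first. A quick preview search such as the following confirms that the lookup is available:

| inputlookup http_error_dataset.csv
| head 3
| table Error_Code Error_Message Root_Cause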

Prompt patterns

You can save time and reduce effort when using the ai command with prompt patterns. The following prompt pattern examples use the datasets described in the previous section:


Row-by-row prompting

Executes the ai command on each record individually. This prompt pattern is suitable for micro-analysis and for providing detailed responses.

Example 1

In the following example, each malicious script is analyzed in isolation to understand the context for its classification:

| inputlookup mltk_ai_commander_dataset.csv 
| search Classification="Malicious"
| head 10
| ai prompt="The script with threat level {Threat_Level} exhibited behavior {Behavior_Observed} and included the IOC '{IOC_Found}'. What could be the potential reasons for classifying this script with content: {Script_Content} as malicious? Provide a concise response within 100-200 tokens by pin-pointing the problem." provider=Ollama model=llama3:latest

Example 2

In the following HTTP error example, the ai command provides resolution advice for each error row based on its root cause:

| inputlookup http_error_dataset.csv
| head 10
| ai prompt="HTTP Error {Error_Code} occurred with message: '{Error_Message}'. Root cause: {Root_Cause}. What specific steps can we take to resolve this? Provide a precise but informative answer." provider=OpenAI model=gpt-4

Multi-chain prompting

Chains multiple prompts together. The second prompt uses the result of the first prompt, for example ai_result_1. This prompt pattern is suitable for deeper insights or a step-by-step investigation.

Example 1

The following is an example of a malicious script. First, the ai command identifies root causes, then recommends remediation or investigation strategies:

| inputlookup mltk_ai_commander_dataset.csv 
| search Threat_Level="High" AND Classification="Malicious"
| head 10
| ai prompt="The script with threat level {Threat_Level}, behavior {Behavior_Observed} with the script content: {Script_Content} was classified as malicious. Identify the potential root causes. Limit your response to 100-200 tokens." provider=Ollama model=qwen2.5:7b
| ai prompt="Based on the identified causes: {ai_result_1}, suggest steps to remediate or investigate these scripts further, keeping the response within 50-100 tokens." provider=Ollama model=llama3:latest

Example 2

The following HTTP error example first provides an impact analysis, followed by operational countermeasures:

| inputlookup http_error_dataset.csv 
| search Error_Code="500"
| head 5
| ai prompt="The server responded with {Error_Code} and message: '{Error_Message}'. Root cause: {Root_Cause}. What could be the broader operational impact?" provider=Anthropic model=claude-3-sonnet
| ai prompt="Given the operational impact: {ai_result_1}, suggest proactive monitoring and recovery measures." provider=Gemini model=gemini-1.5-pro

All data at once (summarization)

Aggregates the dataset and sends a collective prompt for summarization or holistic analysis.

Example 1

The following is an example of a malicious script. The ai command synthesizes an overall summary to highlight patterns and justification across all malicious entries:

| inputlookup mltk_ai_commander_dataset.csv 
| search Classification="Malicious"
| stats values(*) as *
| ai prompt="For the scripts classified as malicious, with threat levels {Threat_Level}, observed behaviors {Behavior_Observed}, and IOCs {IOC_Found} respectively (separated by newline), can you provide the root causes for classifying these scripts as malicious?" provider=Ollama model=llama3:latest

Example 2

The following is an example of HTTP errors. Summarization is suitable for understanding overall trends and frequent errors in system logs:

| inputlookup http_error_dataset.csv
| stats values(*) as *
| ai prompt="Analyze the common root causes among the HTTP errors provided. The list includes error codes {Error_Code}, messages {Error_Message}, and root causes {Root_Cause}. Provide a consolidated summary of recurring issues and mitigation strategies." provider=AzureOpenAI model=gpt-35-turbo