Create and manage database inputs
A database input object lets you fetch and index data from a database. Database inputs are what enable Splunk Enterprise to query your database, identify the data to consume, and then tag and index the data. Once you have set up a database input, you can use that in the same way that you use other data input you have defined in Splunk Enterprise.
Create a database input
Create a database input in DB Connect
- Click the Datalab > Inputs tab.
- Click New Input.
- Note: If you have not yet created an identity or a connection, you are prompted to do so before you can create a database input. Create a new identity and create a new connection, and then you can create a new database input.
On the new DB Input page, complete the following steps to create an input. and then click Save. At any point, you can go back to a previous step by clicking on it.
- Name input: Name and describe your new input.
- Choose and preview table: Specify a query to run to fetch data from your database.
- Set parameters: Specify how to run the query and assign rising and timestamp columns.
- Metadata: Specify metadata to assign to queried data when it is indexed.
Configure the following fields, and then click Continue:
- Name: The output name cannot contain space or special characters.
- App: The name of the Splunk Enterprise app where this input object will be saved. By default, the pop-up menu is set to Splunk DB Connect. This menu enables other apps to use DB Connect inputs, outputs, and lookups within their own context.
- Connection: Choose the connection you want to use with this input. DB Connect will try to validate the connection, and will display an error message if it is not able to do so. You cannot continue the new input setup process unless a valid connection is specified in the Connection pop-up menu.
Choose and preview table
In this step, you specify an input type, a query to run to fetch data from your database, and, optionally, a rising column checkpoint value.
Next to Input Type, choose one of the following:
A batch input invokes the same database query each time the input is run and returns all results. It does not keep track of whether rows have been added since the last time the input was run. Batch input mode is ideal for unchanging historical data that will be indexed into Splunk once. It can also be useful for re-indexing data that updates through time, such as personnel records or state tables, though this has license impact; lookups may be a better choice.
To create a batch input:
- Select Batch Input from the chooser. A mode popup menu appears:
- Automatic Mode:
- Choose the catalog, schema, and table to query from the corresponding pop-up menus.
- Choose the maximum rows to query from the Max rows menu. The default number is 100.
- Editor Mode:
- Enter a SQL query into the field. You can make complex SQL statements easier to read by clicking Format SQL. This is a one-way operation.
- Automatic Mode:
- Click the green Execute button. DB Connect will query your database and display the results.
- Once you've successfully specified the query you want, click Continue, and proceed to Set parameters.
A rising column input has a column that DB Connect uses to keep track of what rows are new from one input execution to the next. When you create a rising column input type, you must specify the rising column. You can specify as rising column any column whose value increases or decreases over time, such as a timestamp or sequential ID. For example, you can use columns such as row_id, transaction_id, employee_id, customer_id, last_updated, and so on.
Note that timestamps are not ideal for rising columns, though they often are the best available choice. Using a timestamp for rising column can produce the following problem conditions:
- A high rate of event generation can cause data duplication or loss, because checkpointing in a stream of timestamp-distinguished records assumes there is never more than one row created in a given time. If you set the time to a one second level of resolution and get five records per second, you lose or duplicate four records from every run.
- Clock skew, NTP skew corrections, physical moves between timezones, and daylight savings events can cause data mis-ordering, duplication, or loss. If the skew is towards the future, then the resulting checkpoint value may temporarily or permanently stop data collection.
- Non-numeric datetime values cannot be evaluated numerically, and lexical sorting can produce unpredictable results. If time series data is ordered lexically, then the resulting checkpoint value may temporarily or permanently stop data collection.
A rising column input requires you to enter:
- Checkpoint column: The checkpoint column is the column from which DB Connect will update the checkpoint value each time the input is run.
- Checkpoint value: The checkpoint value is how DB Connect determines what rows are new from one input execution to the next. The first time the input is run, DB Connect will only select those rows that contain the value higher or lower than the value you specified in this column. Each time the input is finished running, DB Connect updates the input's checkpoint value with the value in the last row of the checkpoint column.
- A SQL query that includes a ? symbol and order by clause for the checkpoint value: Your query must include a question mark (?) as the checkpoint placeholder and an order by clause. Every time the input is run, DB Connect replaces the question mark with the latest checkpoint value .
- From DB Connect 3 and later, the rising column checkpoints of the input are stored in
splunk/var/lib/splunk/modinputs/server/splunk_app_db_connect. For each input, it will create a separate checkpoint file.
- To see what column types (varchar, number, timestamp, and so on) are supported as rising column value types in DB Connect-supported databases, see supported rising column types by database.
With a rising column mode input, you could:
- Effectively create a "descending column," or a column wherein the checkpoint decreases every time the input runs.
SELECT abc, xyz FROM table WHERE abc < ? ORDER BY abc DSC
- Customize the comparison operation to determine the checkpoint value by evaluating a SQL function result, while simultaneously avoiding using query wrapping.
SELECT CONCAT(last_name, first_name) as NAME, ACTOR_ID, FIRST_NAME, LAST_NAME FROM actor WHERE CONCAT(last_name, first_name) > ? ORDER BY CONCAT(last_name, first_name)
- Customize the WHERE clause to add additional conditions other than the comparison expression that is specified in rising column inputs.
SELECT *FROM (SELECT * FROM ACTOR WHERE ACTOR_ID > ?) ACTOR JOIN FILM_ACTOR ON actor.ACTOR_ID = FILM_ACTOR.ACTOR_ID JOIN FILM ON FILM_ACTOR.FILM_ID = FILM.FILM_ID ORDER BY ACTOR.ACTOR_ID
- Use other advanced SQL features in the WHERE clause—for example a CASE statement.
Set the parameters that this input uses when querying the database.
Set the following parameters, and then click Continue:
- Input Type: This is the input type you chose in Step 2, Choose and preview table.
- Max Rows to Retrieve: The maximum number of rows to retrieve with each query execution.
- Fetch Size: The number of rows to return at a time from the database. If you leave this field blank, the default value of 300 is used.
- Timestamp: Specify which column contains the timestamp that will be used to order this data in the Splunk index. This value will be used to populate the _time variable and is required for indexing time-series data. It is best practice to set this value to UTC epoch time in order to force a shared frame of reference. You can then use Splunk to view events relatively along a single time line and correlate those events with other observations. Splunk will convert these times to a locale-specific representation. Useful information in the database's time representation may be lost by forcing use of epoch, such as timezones. In this case, you can use SQL to provide two time columns: an epoch representation for the _time, and a user-friendly representation for searches and dashboards.
- Current Index Time: Assigns indexed data a timestamp value that is equal to index time. Specify this option if your data has no timestamp.
- Choose Column: Select the column that contains the timestamp value. If this column is in a format which cannot be parsed as a timestamp by DB Connect, you will be shown a field where a Java SimpleDateTimeFormat compatible parser can be entered so that Splunk can understand the timestamp value. For instance:
- If the data is a DATETIME column:
- 1 datetime format in the database table
- 2 datetime format as requested by your SQL statement
- 3 datetime format as stored by Splunk
- 4 datetime format as displayed by Search
- If the data is a VARCHAR column:
- 1 datetime format in the database table
- 2 datetime format as requested by your SQL statement
- 3 datetime format as parsed by DB Connect
- 4 datetime format as stored by Splunk
- 5 datetime format as displayed by Search
- Specify Timestamp Column: This option only appears if you selected Choose Column in the previous option. Click it, and the Configure Timestamp Column pane opens. Select the column that contains the timestamps you want to use, and then click Set.
Note: Each column heading display the column's data type to aid you in determining which contains a timestamp.
- Execution Frequency: The number of seconds (or a valid cron expression) in the interval between query executions. For example, entering 120 instructs DB Connect to wait two minutes between database queries.
This step is where you specify what Splunk Enterprise field values to assign to the queried data when it is indexed:
- Host: (Optional).
- Source: (Optional). Enter a source field value for Splunk Enterprise to assign to queried data as it is indexed. As you search the indexed data using Splunk Enterprise, data from this query can be identified by the source value you enter here. For example, you might want to enter the name you gave your input. Click the field and enter a value, or choose an existing value from the menu that appears.
- Sourcetype: Enter a sourcetype field value for Splunk Enterprise to assign to queried data as it is indexed. As you search the indexed data using Splunk Enterprise, you can identify the type of data that this query returns by the sourcetype value you enter here. Click the field and enter a value, or choose an existing value from the menu that appears.
- Index: Enter an index value for the index in which you want Splunk Enterprise to store indexed data. You can enter the index name or choose it from the typeahead menu.
Note: If you want to use the customized index, you have to make sure the index exists in Splunk platform. Otherwise you have to create the index in Splunk Enterprise first to prevent data loss.
Edit database inputs
To see a list of the defined database inputs, first click the Data Lab>Inputs tab. You will see a list of your database inputs.
To edit a database input, click its name. You can make changes to a database input using the following buttons on the input page:
- Enable/Disable: Disable an input by clicking Enable/Disable here.
- Edit: Edit the input by clicking the name or the Edit button.
- Clone: Creates a copy of the input. You must give the copy a new name.
- Delete: Deletes the input.
You can also edit any of the attributes of a database input listed in Create a database input, except its name. To change the name of an input, clone it, give the clone the name you want, and then delete the original input.
Supported rising column types by database
The following matrix summarizes what column types (varchar, number, timestamp, and so on) are supported as rising column value types in DB Connect-supported databases.
|Database||varchar|| int, float, real,
|AWS RDS Aurora||x||x||x||x||x|
|Microsoft SQL Server||x||x||x||x||x|
|SAP SQL Anywhere||x||x||x|
The column marked x means this column is supported.
* Information for Informix databases is not currently available.
Use database inputs
Once you've configured a database input and your data has been indexed by Splunk Enterprise, you can use that input just as you do any other data input you've defined in Splunk Enterprise. Use the Search Processing Language (SPL) to write a series of commands and arguments.
For a quick view of what an input returns to Splunk Enterprise, go to the database inputs page by clicking Datalab>Inputs in the top navigation bar, click the name of the input to view, and then click the Find Events button on the right side of the screen. The search app opens with a pre-populated search that searches on your input.
You can search for keywords and use Boolean operators such as
NOT, plus wildcard characters ("
manufacturing (widgets AND gadgets OR gewgaw*)
Fields in Splunk Enterprise correspond to columns in your database. Search for fields using the syntax:
Chain search commands and arguments together using the pipe ("
|") character. For example, the following search retrieves indexed events from the database input
db2input that contain the term "manufacturing" and, for those events, reports the most common
manufacturing source="db2input" | top LOCATION
To refine your search further, show only the fields you want using the fields keyword. The following search only shows the five fields listed after fields, plus a timestamp, for each event retrieved:
source="db2input" | fields ADMRDEPT DEPTNAME DEPTNO LOCATION MGRNO
Of course, you can do much more with search in Splunk Enterprise. To learn more:
- Read the Search Tutorial. Since you've already gotten your data into Splunk Enterprise, start with Part 4, Using Splunk Search.
- See Searching and Reporting for a guide to documentation that is based on your level of familiarity with Splunk Enterprise and on what you want to do with your data.
- For information specific to SQL users, see SQL tips and tricks in this manual, and Splunk SPL for SQL users in the Splunk Enterprise Search Reference manual.
Create and manage database connections
Create and manage database outputs
This documentation applies to the following versions of Splunk® DB Connect: 3.0.0, 3.0.1, 3.0.2, 3.0.3