Splunk® DB Connect

Deploy and Use Splunk DB Connect

This documentation does not apply to the most recent version of Splunk DB Connect.

Create and manage database inputs

A database input object lets you fetch and index data from a database. Database inputs are what enable Splunk Enterprise to query your database, identify the data to consume, and then tag and index the data. Once you have set up a database input, you can use it in the same way as any other data input you have defined in Splunk Enterprise.

Create a database input

To create a database input in DB Connect:

  1. Click the Data Lab > Inputs tab.
  2. Click New Input.
Note: If you have not yet created an identity or a connection, you are prompted to do so before you can create a database input. Create a new identity and create a new connection, and then you can create a new database input.

On the new DB Input page, complete the following steps to create an input, and then click Save. At any point, you can go back to a previous step by clicking on it.

  1. Name input: Name and describe your new input.
  2. Choose and preview table: Specify a query to run to fetch data from your database.
  3. Set parameters: Specify how to run the query and assign rising and timestamp columns.
  4. Metadata: Specify metadata to assign to queried data when it is indexed.

Name input

Configure the following fields, and then click Continue:

  • Name: The input name cannot contain spaces or special characters.
  • Description: A description of the input (optional).
  • App: The name of the Splunk Enterprise app where this input object will be saved. By default, the pop-up menu is set to Splunk DB Connect. This menu enables other apps to use DB Connect inputs, outputs, and lookups within their own context.
  • Connection: Choose the connection you want to use with this input. DB Connect will try to validate the connection, and will display an error message if it is not able to do so. You cannot continue the new input setup process unless a valid connection is specified in the Connection pop-up menu.

Choose and preview table

In this step, you specify an input type, a query to run to fetch data from your database, and, optionally, a rising column checkpoint value.

Next to Input Type, choose one of the following:

Batch Input

A batch input invokes the same database query each time the input is run and returns all results. It does not keep track of whether rows have been added since the last time the input was run. Batch input mode is ideal for unchanging historical data that will be indexed into Splunk once. It can also be useful for re-indexing data that updates through time, such as personnel records or state tables, though this has license impact; lookups may be a better choice.

To create a batch input:

  1. Select Batch Input from the chooser. A mode pop-up menu appears:
    • Automatic Mode:
      • Choose the catalog, schema, and table to query from the corresponding pop-up menus.
      • Choose the maximum rows to query from the Max rows menu. The default number is 100.
    • Editor Mode:
      • Enter a SQL query into the field. You can make complex SQL statements easier to read by clicking Format SQL. This is a one-way operation.
  2. Click the green Execute button. DB Connect will query your database and display the results.
  3. Once you've successfully specified the query you want, click Continue, and proceed to Set parameters.
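The defining property of a batch input can be sketched as follows. This is a minimal illustration, not DB Connect itself: it uses SQLite in place of your database, and the personnel table and its columns are hypothetical.

```python
# A batch input re-runs the same full query on every execution and keeps
# no checkpoint state, so every row is returned (and would be re-indexed)
# each time -- which is why repeated batch runs have license impact.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personnel (employee_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO personnel VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])

def run_batch_input():
    # Same query every run; no memory of what was returned before.
    return conn.execute("SELECT employee_id, name FROM personnel").fetchall()

first_run = run_batch_input()                     # 2 rows
conn.execute("INSERT INTO personnel VALUES (3, 'Edsger')")
second_run = run_batch_input()                    # all 3 rows, including
                                                  # the 2 already returned
```

Contrast this with the rising column input described next, which remembers where the previous run left off.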

Rising Column

A rising column input has a column that DB Connect uses to keep track of what rows are new from one input execution to the next. When you create a rising column input type, you must specify the rising column. You can specify as rising column any column whose value increases or decreases over time, such as a timestamp or sequential ID. For example, you can use columns such as row_id, transaction_id, employee_id, customer_id, last_updated, and so on.

Note that timestamps are not ideal as rising columns, though they often are the best available choice. Using a timestamp as the rising column can produce the following problem conditions:

  • A high rate of event generation can cause data duplication or loss, because checkpointing in a stream of timestamp-distinguished records assumes there is never more than one row created in a given time. If you set the time to a one second level of resolution and get five records per second, you lose or duplicate four records from every run.
  • Clock skew, NTP skew corrections, physical moves between timezones, and daylight savings events can cause data mis-ordering, duplication, or loss. If the skew is towards the future, then the resulting checkpoint value may temporarily or permanently stop data collection.
  • Non-numeric datetime values cannot be evaluated numerically, and lexical sorting can produce unpredictable results. If time series data is ordered lexically, then the resulting checkpoint value may temporarily or permanently stop data collection.
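The first pitfall above can be demonstrated concretely. This is a hedged sketch using SQLite and a hypothetical events table, not DB Connect itself; it shows how a one-second-resolution timestamp checkpoint silently drops a row.

```python
# Five events share the same one-second timestamp. After the checkpoint
# advances past that second, any further row stamped with the same second
# can never match "greater than the checkpoint" and is lost.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, payload TEXT)")
# Five events generated within the same second (ts = 100).
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(100, "event-%d" % i) for i in range(5)])

checkpoint = 99
rows = conn.execute(
    "SELECT ts, payload FROM events WHERE ts > ? ORDER BY ts",
    (checkpoint,)).fetchall()
checkpoint = rows[-1][0]  # checkpoint advances to 100

# A late arrival stamped within the same second is invisible to the
# next run, because its timestamp is not greater than the checkpoint.
conn.execute("INSERT INTO events VALUES (100, 'event-5')")
missed = conn.execute(
    "SELECT ts, payload FROM events WHERE ts > ? ORDER BY ts",
    (checkpoint,)).fetchall()
# missed is empty: 'event-5' is never collected.
```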

A rising column input requires you to enter:

  • Checkpoint column: The column from which DB Connect reads and updates the checkpoint value each time the input runs.
  • Checkpoint value: The value DB Connect uses to determine which rows are new from one input execution to the next. The first time the input runs, DB Connect selects only the rows whose checkpoint column value is higher (or lower, for a descending column) than the value you specify here. Each time the input finishes running, DB Connect updates the input's checkpoint value with the value in the last row of the checkpoint column.
  • A SQL query that includes a ? placeholder and an ORDER BY clause for the checkpoint column: Your query must include a question mark (?) as the checkpoint placeholder and an ORDER BY clause. Every time the input runs, DB Connect replaces the question mark with the latest checkpoint value.
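The checkpoint cycle described above can be sketched as follows. This is a minimal illustration of the mechanism, not DB Connect's implementation; it uses SQLite, and the orders table and its columns are hypothetical.

```python
# Each run: substitute the stored checkpoint for the ? placeholder, run
# the ordered query, then save the checkpoint column value from the last
# row returned. Only rows newer than the checkpoint are ever fetched.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (transaction_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "anvil"), (2, "rocket")])

SQL = ("SELECT transaction_id, item FROM orders "
       "WHERE transaction_id > ? ORDER BY transaction_id")

def run_input(checkpoint):
    rows = conn.execute(SQL, (checkpoint,)).fetchall()
    if rows:
        checkpoint = rows[-1][0]  # checkpoint column of the last row
    return rows, checkpoint

checkpoint = 0                              # initial Checkpoint value
rows, checkpoint = run_input(checkpoint)    # both rows; checkpoint -> 2
conn.execute("INSERT INTO orders VALUES (3, 'tnt')")
rows, checkpoint = run_input(checkpoint)    # only the new row; checkpoint -> 3
```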

Note:

  • In DB Connect 3 and later, the rising column checkpoints are stored in $SPLUNK_HOME/var/lib/splunk/modinputs/server/splunk_app_db_connect. DB Connect creates a separate checkpoint file for each input.
  • To see what column types (varchar, number, timestamp, and so on) are supported as rising column value types in DB Connect-supported databases, see supported rising column types by database.

With a rising column mode input, you could:

  • Effectively create a "descending column," or a column wherein the checkpoint decreases every time the input runs.
    SELECT abc, xyz FROM table WHERE abc < ? ORDER BY abc DESC
  • Customize the comparison operation that determines the checkpoint value by evaluating a SQL function result, while avoiding query wrapping.
    SELECT CONCAT(last_name, first_name) AS NAME, ACTOR_ID, FIRST_NAME, LAST_NAME FROM actor WHERE CONCAT(last_name, first_name) > ? ORDER BY CONCAT(last_name, first_name)
  • Customize the WHERE clause to add additional conditions other than the comparison expression that is specified in rising column inputs.
    SELECT * FROM
        (SELECT * FROM ACTOR WHERE ACTOR_ID > ?) ACTOR
    JOIN FILM_ACTOR
        ON ACTOR.ACTOR_ID = FILM_ACTOR.ACTOR_ID
    JOIN FILM
        ON FILM_ACTOR.FILM_ID = FILM.FILM_ID
    ORDER BY ACTOR.ACTOR_ID
  • Use other advanced SQL features in the WHERE clause—for example a CASE statement.
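As a hedged sketch of the last bullet, the following runs a CASE expression in a rising column WHERE clause. It uses SQLite for illustration, and the audit table, legacy_id, and new_id columns are hypothetical; the CASE normalizes two ID columns into a single rising value compared against the ? checkpoint.

```python
# A CASE expression can serve as the rising comparison: rows are matched
# on whichever ID column is populated, so one checkpoint covers both.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (legacy_id INTEGER, new_id INTEGER)")
conn.executemany("INSERT INTO audit VALUES (?, ?)",
                 [(10, None), (None, 20), (None, 30)])

SQL = """
SELECT CASE WHEN new_id IS NOT NULL THEN new_id ELSE legacy_id END AS rising,
       legacy_id, new_id
FROM audit
WHERE (CASE WHEN new_id IS NOT NULL THEN new_id ELSE legacy_id END) > ?
ORDER BY rising
"""
# With a checkpoint of 10, only rows whose normalized value exceeds 10 return.
rows = conn.execute(SQL, (10,)).fetchall()
```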

Set parameters

Set the parameters that this input uses when querying the database.

Set the following parameters, and then click Continue:

  • Input Type: This is the input type you chose in Step 2, Choose and preview table.
  • Max Rows to Retrieve: The maximum number of rows to retrieve with each query execution.
  • Fetch Size: The number of rows to return at a time from the database. If you leave this field blank, the default value of 300 is used.
  • Timestamp: Specify which column contains the timestamp used to order this data in the Splunk index. This value populates the _time field and is required for indexing time-series data. It is a best practice to set this value to UTC epoch time to force a shared frame of reference. You can then use Splunk to view events along a single timeline and correlate them with other observations. Splunk converts these times to a locale-specific representation. Forcing epoch time can discard useful information in the database's time representation, such as time zones. In that case, you can use SQL to provide two time columns: an epoch representation for _time, and a user-friendly representation for searches and dashboards.
    • Current Index Time: Assigns indexed data a timestamp value that is equal to index time. Specify this option if your data has no timestamp.
    • Choose Column: Select the column that contains the timestamp value.
    • If this column is in a format that DB Connect cannot parse as a timestamp, a field appears where you can enter a Java SimpleDateFormat-compatible pattern so that Splunk can interpret the timestamp value. For instance:
      • If the data is a DATETIME column:
        1. 1 datetime format in the database table
        2. 2 datetime format as requested by your SQL statement
        3. 3 datetime format as stored by Splunk
        4. 4 datetime format as displayed by Search
      • If the data is a VARCHAR column:
        1. 1 datetime format in the database table
        2. 2 datetime format as requested by your SQL statement
        3. 3 datetime format as parsed by DB Connect
        4. 4 datetime format as stored by Splunk
        5. 5 datetime format as displayed by Search
  • Specify Timestamp Column: This option only appears if you selected Choose Column in the previous option. Click it, and the Configure Timestamp Column pane opens. Select the column that contains the timestamps you want to use, and then click Set.
    Note: Each column heading displays the column's data type to help you determine which column contains a timestamp.
  • Execution Frequency: The number of seconds (or a valid cron expression) in the interval between query executions. For example, entering 120 instructs DB Connect to wait two minutes between database queries.
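The two-time-column tip above can be sketched as follows. This is an illustration using SQLite, whose strftime function stands in for your database's own epoch-conversion function; the readings table and its columns are hypothetical.

```python
# Return an epoch column to select as the Timestamp column (for _time),
# plus the original human-readable column for searches and dashboards.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (taken_at TEXT, value REAL)")
conn.execute("INSERT INTO readings VALUES ('2017-03-29 12:00:00', 42.0)")

SQL = """
SELECT CAST(strftime('%s', taken_at) AS INTEGER) AS epoch_time,  -- for _time
       taken_at AS readable_time,                                -- for display
       value
FROM readings
"""
row = conn.execute(SQL).fetchone()
# row[0] holds the UTC epoch value you would choose as the timestamp column;
# row[1] keeps the friendly representation for dashboards.
```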

Metadata

This step is where you specify what Splunk Enterprise field values to assign to the queried data when it is indexed:

  • Host: (Optional). Enter a host field value for Splunk Enterprise to assign to queried data as it is indexed.
  • Source: (Optional). Enter a source field value for Splunk Enterprise to assign to queried data as it is indexed. As you search the indexed data using Splunk Enterprise, data from this query can be identified by the source value you enter here. For example, you might want to enter the name you gave your input. Click the field and enter a value, or choose an existing value from the menu that appears.
  • Sourcetype: Enter a sourcetype field value for Splunk Enterprise to assign to queried data as it is indexed. As you search the indexed data using Splunk Enterprise, you can identify the type of data that this query returns by the sourcetype value you enter here. Click the field and enter a value, or choose an existing value from the menu that appears.
  • Index: Enter an index value for the index in which you want Splunk Enterprise to store indexed data. You can enter the index name or choose it from the typeahead menu.
    Note: If you specify a custom index, make sure the index already exists on the Splunk platform. Otherwise, create the index in Splunk Enterprise first to prevent data loss.

Edit database inputs

To see a list of the defined database inputs, click the Data Lab > Inputs tab.

To edit a database input, click its name. You can make changes to a database input using the following buttons on the input page:

  • Enable/Disable: Enables or disables the input.
  • Edit: Edit the input by clicking the name or the Edit button.
  • Clone: Creates a copy of the input. You must give the copy a new name.
  • Delete: Deletes the input.

You can also edit any of the attributes of a database input listed in Create a database input, except its name. To change the name of an input, clone it, give the clone the name you want, and then delete the original input.

Supported rising column types by database

The following matrix summarizes what column types (varchar, number, timestamp, and so on) are supported as rising column value types in DB Connect-supported databases.

Database              varchar   int, float, real,    timestamp   datetime   date
                                bigint, number
AWS RDS Aurora        x         x                    x           x          x
AWS RedShift          x         x                    x
DB2/Linux             x         x                    x
Informix              *
MemSQL                x         x                    x           x          x
Microsoft SQL Server  x         x                    x           x          x
MySQL                 x         x                    x           x          x
Oracle                x         x                    x           x          x
PostgreSQL            x         x                    x
SAP SQL Anywhere      x         x                    x
Sybase ASE            x         x                    x
Sybase IQ             x         x                    x
Teradata              x         x                    x

An x means that the column type is supported as a rising column.
* Information for Informix databases is not currently available.


Use database inputs

Once you've configured a database input and your data has been indexed by Splunk Enterprise, you can use that input just as you do any other data input you've defined in Splunk Enterprise. Use the Search Processing Language (SPL) to write a series of commands and arguments.

For a quick view of what an input returns to Splunk Enterprise, go to the database inputs page by clicking Data Lab > Inputs in the top navigation bar, click the name of the input to view, and then click the Find Events button on the right side of the screen. The search app opens with a pre-populated search that searches on your input.

You can search for keywords and use Boolean operators such as AND, OR, and NOT, plus wildcard characters ("*"):

manufacturing (widgets AND gadgets OR gewgaw*)

Fields in Splunk Enterprise correspond to columns in your database. Search for fields using the syntax: fieldname="field value"

source="db2input"

Chain search commands and arguments together using the pipe ("|") character. For example, the following search retrieves indexed events from the database input db2input that contain the term "manufacturing" and, for those events, reports the most common LOCATION values:

manufacturing source="db2input" | top LOCATION

To refine your search further, show only the fields you want using the fields keyword. The following search only shows the five fields listed after fields, plus a timestamp, for each event retrieved:

source="db2input" | fields ADMRDEPT DEPTNAME DEPTNO LOCATION MGRNO

Of course, you can do much more with search in Splunk Enterprise. To learn more, see the Search Manual in the Splunk Enterprise documentation.


This documentation applies to the following versions of Splunk® DB Connect: 3.0.0, 3.0.1, 3.0.2, 3.0.3


Comments

If you're in a hybrid environment where the heavy forwarder is on-premises and the indexers are in the cloud, you will not see the index names when adding the index name. Make sure you don't end up indexing the data locally instead of sending it to the appropriate indexers.

Swong splunk, Splunker
December 11, 2017

Hello Sideview
You are correct, the value of the checkpoint column is not specified in the 'checkpoint column' field but the 'checkpoint value' field. I've revised the document to address the issue. Thanks for your comments.

Rwang splunk, Splunker
March 29, 2017

Under checkpoint value, the text here offers detail on the particular issue of the first query, when no previous checkpoint value will exist. However what it actually says makes no sense.

"The first time the input is run, DB Connect will only select those rows that contain the value higher or lower than the value you specified in the checkpoint column"

This makes no sense because the checkpoint column specified is a "column", not a value. higher or lower than the value specified in checkpoint column means "higher or lower than the string 'ID'". I believe but I'm not sure, what it does, is somehow do a limited first query to get back N rows (for some unknown N?)

Sideview
March 29, 2017
