Index data

This documentation does not apply to the most recent version of Splunk.

This topic assumes that you are using syslog-ng to write all your log files to a directory structure on your Splunk machine. This is neither the "best" deployment (which is to use Splunk forwarding agents on each logging machine) nor the quickest and dirtiest (which is to use plain syslog and have Splunk grab the data directly from the UDP port). It works well for this example because it gets the data in with a reasonable amount of structure and reliability, without throwing the whole forwarder structure at you when you may be just beginning.

If you use syslog-ng over a TCP port, you can configure syslog-ng to write to a directory structure on the Splunk machine. It's easiest for Splunk if you split this out by source type and host -- for example, /var/log/j2eelog/j2eeserver1/, /var/log/j2eelog/j2eeserver2/, /var/log/weblog/weblogserver1/, /var/log/weblog/weblogserver2/, etc.

Note: In 4.0.x, you may have some problems trying to input files from /var/log due to contention with the *Nix app. The *Nix app is pretty cool -- it lets you see all kinds of operations data for your Unix box in Splunk -- but it generates a lot of data and uses /var/log for itself. You can get rid of it by going to $SPLUNK_HOME/etc/apps and deleting the unix directory or moving it out of the apps directory completely and backing it up somewhere else.

If you want to know a little more about the considerations around these different ways of inputting data, read the next section. If you just want to see how to input data, skip ahead.

Collecting data from remote machines

To get data into Splunk, you must add each data source as an input. The recommended method for a production environment is to install a Splunk forwarder on each machine where you want to collect data. However, if you are new to Splunk, or if you are running a test deployment and do not want to set up Splunk forwarders, you can configure syslog or syslog-ng to forward input data to a central Splunk index.

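If you use syslog, you send all the logs from your servers over UDP to port 514 and set Splunk up to grab that stream directly. There are some problems with this: UDP does not guarantee delivery, so events sent while Splunk is down or the network is congested can be lost.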

Grab the web log data

If you are writing your logs to a central location, the easiest way to index the data is to monitor the log directory. When you select a file or directory to monitor, Splunk uploads and indexes the specified data, then continues to index new data in the file or directory as it comes in. You can specify a mounted or shared directory as long as the Splunk server has permission to read from the directory. Splunk detects log file rotation and does not process renamed files it has already indexed. See Monitor files and directories for more information.

This section steps through how to monitor the files in the example setup. Here are a couple of sample entries from the web logs:

2010-03-17 10:13:46,918 [WEB] INFO messageType = POST, messageStatus = INIT, accountNumber = COT4808718813, host = 10.52.60.56, messageDetails = Begin posting message to content store
2010-03-17 10:13:46,954 [WEB] INFO messageType = POST, messageStatus = TASK, accountNumber = COT4808718813, host = 10.52.60.56, messageDetails = Opening connection to host: [ www.contentstore.com:80 ]

Splunk loves these files. It eats them like jam (or chocolate). Each log record is on its own line and because of the nice way the key/value pairs are set up with the = sign, they are easy to deal with even after indexing. There isn't a lot of fuss or complication here.
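
To see why this layout is so easy to work with, here is a minimal Python sketch that splits one of the sample entries above into fields. It only illustrates the format and is not how Splunk extracts fields; the function name parse_web_log_line and both regular expressions are assumptions made for this example.

```python
import re

# Sample entry copied from the web log excerpt above.
LINE = (
    "2010-03-17 10:13:46,918 [WEB] INFO messageType = POST, "
    "messageStatus = INIT, accountNumber = COT4808718813, "
    "host = 10.52.60.56, "
    "messageDetails = Begin posting message to content store"
)

# Assumed header layout: timestamp, [component], level, then the payload.
HEADER = re.compile(r"^(\S+ \S+) \[(\w+)\] (\w+) (.*)$")

# Split only on a comma that is followed by "key = ", so a comma inside
# a value such as messageDetails does not start a new pair.
PAIR_SEPARATOR = re.compile(r", (?=\w+ = )")


def parse_web_log_line(line):
    """Return the header parts and key/value pairs of one entry as a dict."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("unrecognized line format")
    timestamp, component, level, payload = match.groups()
    fields = {"timestamp": timestamp, "component": component, "level": level}
    for pair in PAIR_SEPARATOR.split(payload):
        key, _, value = pair.partition(" = ")
        fields[key] = value
    return fields


fields = parse_web_log_line(LINE)
print(fields["messageType"], fields["accountNumber"])
```

Because every record sits on one line and every pair uses the same = separator, a single pass like this recovers all the fields.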

To monitor the web log files:

1. Click Add Data in the Launcher.

OR

2. Go to the Search app, navigate to Manager (link at top right of the screen) and select Data inputs.

The Data input window is displayed.

3. Click Add New at the right of the Files & Directories row.

4. Select Monitor a file or directory.

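5. For Full path on the server, enter /var/log/weblog/... (the path ends with three dots). The ellipsis (...) wildcard tells Splunk to recurse through the subdirectories under /var/log/weblog/, so the files in every host directory are monitored.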

6. Select segment in path from the Set host menu and enter 4 for the segment #. This tells Splunk to use the name of the fourth directory (e.g., weblogserver1) in the path as the host name.

7. Select Manual from the Set sourcetype menu and enter weblog.

Note: Splunk recognizes a number of standard log formats. To assign one of Splunk's pretrained source types to a log, select From list and choose the correct source type, for example, log4j or weblogic_stdout. If you have logs in multiple formats in the directory you are monitoring and they are all pretrained source types, you can use Automatic to make your life easier.

8. Select test from the Index menu.

9. Make sure the Advanced options are blank. You can use these options if you are monitoring a directory with lots of subdirectories -- for example all of /var/log -- and you want to be selective about which subdirectories you bring into Splunk. You can also use these to tell Splunk to ignore existing data in the directory and only eat new data.

10. Click Save.

You need to restart Splunk for it to actually start loading data. But let's add the other logs first.
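
If the segment number in step 6 seems abstract, this short Python sketch shows the counting rule: segments are numbered from 1, starting at the first directory after the leading slash. The function name host_from_segment and the file name access.log are hypothetical, and the code only mimics the rule described in step 6; it is not Splunk's implementation.

```python
def host_from_segment(path, segment):
    """Return the 1-based segment of an absolute, slash-separated path."""
    # Drop empty strings produced by the leading slash or doubled slashes.
    parts = [part for part in path.split("/") if part]
    if not 1 <= segment <= len(parts):
        raise ValueError("path has no segment number %d" % segment)
    return parts[segment - 1]


# var = 1, log = 2, weblog = 3, weblogserver1 = 4
print(host_from_segment("/var/log/weblog/weblogserver1/access.log", 4))
# prints "weblogserver1"
```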

Get the J2EE logs

Now get the J2EE logs in the same way. Here are some sample entries from those logs:

<TRANSACTION date="2010-03-17 10:13:49,756" activityCode="1060" subscriberID="103298280" accountNumber="COT2167944500" callerID="MAR10159LA" transactionStatus="COMPLETE" result="SUCCESS" host="10.34.51.89" comment="Invocation of Content API for sequenceNumber 103298280 Successful" />
<TRANSACTION date="2010-03-17 10:13:52,008" activityCode="1010" subscriberID="109000446" accountNumber="COT9138634144" callerID="MAR10249LA" transactionStatus="COMPLETE" result="SUCCESS" host="10.25.50.49" comment="Invocation of Content API for sequenceNumber 109000446 Successful" />
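
Each of these entries is a single self-closing XML element whose attributes carry the data. As a rough illustration of that structure (and not a description of how Splunk handles it), the following Python sketch reads one sample entry into a dictionary with the standard xml.etree.ElementTree module. The function name parse_transaction is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Sample entry copied from the J2EE log excerpt above.
ENTRY = (
    '<TRANSACTION date="2010-03-17 10:13:49,756" activityCode="1060" '
    'subscriberID="103298280" accountNumber="COT2167944500" '
    'callerID="MAR10159LA" transactionStatus="COMPLETE" result="SUCCESS" '
    'host="10.34.51.89" comment="Invocation of Content API for '
    'sequenceNumber 103298280 Successful" />'
)


def parse_transaction(line):
    """Return the attributes of one TRANSACTION element as a dict."""
    element = ET.fromstring(line)
    if element.tag != "TRANSACTION":
        raise ValueError("unexpected element: %s" % element.tag)
    return dict(element.attrib)


fields = parse_transaction(ENTRY)
print(fields["accountNumber"], fields["result"])
```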

To monitor these logs:

1. In the Monitor a file or directory window, locate your web log data input and click Clone.

2. Change Full path on the server to /var/log/j2eelog/....

3. Change source type to j2eelog.

4. Click Save.

Get the API logs

Now for the API logs. Here are a couple of sample entries:

#### 2010-03-17 10:13:47,543
     nameSpace:         content.static.API
     subscriberID:      107018813
     callerID:          TTCOV104435254-7305027
     driver:            content.jdbc.ContentDriver
     callerAction:      MAR10354LA
     host:              10.52.60.28
     connectionResult:  SUCCESS
     Details:           Successfully updated contentDB 
#### 2010-03-17 10:13:48,626
     nameSpace:         content.static.API
     subscriberID:      3238231843
     callerID:          TTCOV106842965-5744617
     driver:            content.jdbc.ContentDriver
     callerAction:      MAR10899LA
     host:              10.52.60.27
     connectionResult:  SUCCESS
     Details:           Successfully updated contentDB 

Note: The next topic, Configure linebreaking, describes how to ensure Splunk figures out the correct boundaries between records.
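
To make the record-boundary problem concrete, here is a small Python sketch that treats every line beginning with #### as the start of a new record and collects the indented name: value lines that follow it. It is only an illustration of the boundary rule for this sample format, not the Splunk linebreaking configuration covered in the next topic; the function name split_api_records is hypothetical and the sample text is trimmed from the entries above.

```python
# Two records, shortened from the API log excerpt above.
SAMPLE = """\
#### 2010-03-17 10:13:47,543
     nameSpace:         content.static.API
     host:              10.52.60.28
     connectionResult:  SUCCESS
#### 2010-03-17 10:13:48,626
     nameSpace:         content.static.API
     host:              10.52.60.27
     connectionResult:  SUCCESS
"""


def split_api_records(text):
    """Group lines into records, starting a new one at each '#### ' line."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith("#### "):
            # The rest of the marker line is the record's timestamp.
            current = {"timestamp": line[len("#### "):].strip()}
            records.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, then trim the padding.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return records


for record in split_api_records(SAMPLE):
    print(record["timestamp"], record["host"])
```

Without the marker rule, a line-by-line reader would see eight separate events here instead of two.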

First, monitor these logs:

1. In the Monitor a file or directory window, locate your web log data input and click Clone.

2. Change Full path on the server to /var/log/apilog/....

3. Change source type to apilog.

4. Click Save to go back to the Data Inputs (Files) window.

Get the database error logs

1. Go back to Splunk Web. The Data Inputs (Files) window should still be available.

2. Once again, locate your web log data input and click Clone.

3. For Full path on the server, enter /var/log/mysqld/....

4. Change source type to mysqld.

5. Click Save.

Other ways to get data

You do not have to configure each input directly. In a production environment, you would probably choose to monitor all of /var/log and use configurations to set the host, source, and source type of the different files. See About default fields for more information.
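
As a rough picture of what such configurations accomplish, this Python sketch derives the source, source type, and host for a file from its position under /var/log, using the directory layout from this example. It is a conceptual illustration only and does not reproduce Splunk's configuration files; the function name classify, the SOURCETYPE_BY_DIR table, and the file names are assumptions made for this example.

```python
from pathlib import PurePosixPath

# Assumed mapping from the first directory under /var/log to a source type,
# based on the four inputs configured in this topic.
SOURCETYPE_BY_DIR = {
    "weblog": "weblog",
    "j2eelog": "j2eelog",
    "apilog": "apilog",
    "mysqld": "mysqld",
}


def classify(path, root="/var/log"):
    """Return source, sourcetype, and host for a file under root."""
    # relative_to raises ValueError if the path is not under root.
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) < 3:
        raise ValueError("expected <sourcetype dir>/<host dir>/<file>")
    directory, host = parts[0], parts[1]
    return {
        "source": path,
        "sourcetype": SOURCETYPE_BY_DIR.get(directory, "unknown"),
        "host": host,
    }


print(classify("/var/log/apilog/apiserver1/api.log"))
```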

This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8

