Splunk® Enterprise

Troubleshooting Manual

This documentation does not apply to the most recent version of Splunk.

Generate a diag

To help diagnose a problem, Splunk Support might request a diagnostic file from you. Diag files give Support insight into how an instance is configured and how it has been operating up to the point that the diag command was issued.

About diag

The diag command collects basic information about your Splunk server, including Splunk's configuration details. It gathers information from the server such as server specs, OS version, file system, and current open connections. From the Splunk instance it collects the contents of $SPLUNK_HOME/etc such as app configurations, internal Splunk log files, and index metadata.

Diag does not collect any of your indexed data and we strongly encourage you to examine the tarball to ensure that no proprietary data is included. In some environments, custom app objects, like lookup tables, could potentially contain sensitive data. You can exclude any file or directory from the diag collection by using the --exclude flag. Read on for more details.
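One way to examine the tarball before sending it is to list its members and flag paths that look like they might hold sensitive data. The following is a minimal sketch, not part of diag itself; the function name and the default patterns are illustrative assumptions, and you should adjust them to your own policy.

```python
import fnmatch
import tarfile


def find_suspect_members(tar_path, patterns=("*.csv", "*.dat", "*/passwd")):
    """Return the archive member names that match any shell-style pattern.

    The default patterns are only examples (an assumption, not a Splunk list).
    """
    with tarfile.open(tar_path, "r:gz") as tar:
        names = tar.getnames()
    return sorted(
        name for name in names
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
    )
```

If the returned list is not empty, consider rerunning diag with a matching --exclude pattern.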

Note: Before you send any files or information to Splunk Support, verify that you are comfortable with sending it to us. We try to ensure that no sensitive information is included in any output from the commands below and in "Anonymize data samples to send to Support" in this manual, but we cannot guarantee compliance with your particular security policy.

Run diag with default settings

Be sure to run diag as a user with appropriate access to read Splunk files.

On *nix, from $SPLUNK_HOME/bin:

./splunk diag

On Windows, from %SPLUNK_HOME%\bin:

splunk diag

If you have difficulty running diag in your environment, you can also run the Python script directly from the bin directory using cmd.

On *nix:

./splunk cmd python $SPLUNK_HOME/lib/python2.7/site-packages/splunk/clilib/info_gather.py

On Windows:

splunk cmd python %SPLUNK_HOME%\Python-2.7\Lib\site-packages\splunk\clilib\info_gather.py

For clustered environments, see the recommended steps below.

Note: The Python version number might differ in future versions of Splunk Enterprise, which changes this path.

This produces diag-<server name>-<date>.tar.gz in your Splunk home directory, which you can send to Splunk Support for troubleshooting. If you're having trouble with forwarding, Support will probably need to see a diag for both your forwarder and your receiver.
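If you script diag collection, you might want to locate the file that was just written. The sketch below assumes only the diag-<server name>-<date>.tar.gz naming described above and picks the most recently modified match; the function name is hypothetical.

```python
import glob
import os


def newest_diag(splunk_home):
    """Return the most recently modified diag-*.tar.gz in splunk_home, or None."""
    candidates = glob.glob(os.path.join(splunk_home, "diag-*.tar.gz"))
    return max(candidates, key=os.path.getmtime) if candidates else None
```

For example, newest_diag(os.environ["SPLUNK_HOME"]) returns the path to hand to your upload step, or None if no diag has been generated yet.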

Designate content for diag to include or exclude

Diag can be told to leave some files out of the diag. One way to do this is with path exclusions. At the command line you can use the switch --exclude. For example:

splunk diag --exclude "*/passwd"

This is repeatable:

splunk diag --exclude "*/passwd" --exclude "*/dispatch/*"
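Before running diag, it can help to preview which paths a set of exclusion patterns would catch. This sketch assumes that diag's glob matching behaves like Python's fnmatch module, where * also matches path separators; that is an assumption, so confirm the real result in excluded_filelist.txt after the run. The function name is hypothetical.

```python
import fnmatch


def preview_excludes(paths, patterns):
    """Split paths into (kept, excluded) using shell-style glob patterns."""
    kept, excluded = [], []
    for path in paths:
        # A path is excluded as soon as any one pattern matches it.
        if any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns):
            excluded.append(path)
        else:
            kept.append(path)
    return kept, excluded
```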

A more robust way to exclude content is with components. The following switches select which categories of information should be collected. The components available include index_files, index_listing, dispatch, etc, log, and pool; the Components section of this topic describes all of them.

  --collect=list      Declare an arbitrary set of components to gather, as a
                      comma-separated list, overriding any prior choices
  --enable=component_name
                      Add a component to the work list
  --disable=component_name
                      Remove a component from the work list

The following switches control the thoroughness with which diag gathers categories of data:

  --all-dumps=bool    get every crash .dmp file, as opposed to the default of a
                      more useful subset
  --index-files=level
                      Index data file gathering level: manifests, or full
                      (meaning manifests + metadata files) [default:
                      manifests]
  --index-listing=level
                      Index directory listing level: light (hot buckets
                      only), or full (all index directories)
                      [default: light]
  --etc-filesize-limit=level
                      do not gather files in $SPLUNK_HOME/etc larger than
                      this many kilobytes, 0 disables this filter [default:
                      10000]
  --log-age=days      log age to gather: log files over this many days old
                      are not included, 0 disables this filter [default: 60]
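When diag is launched from automation, building the argument list in one place keeps the switches consistent. The sketch below only assembles arguments from the switches listed above; the function and parameter names are hypothetical, and it does not run anything.

```python
def build_diag_argv(splunk="splunk", collect=None, enable=(), disable=(),
                    exclude=(), log_age=None):
    """Assemble a 'splunk diag' argument list from the documented switches."""
    argv = [splunk, "diag"]
    if collect:
        # --collect takes one comma-separated list of components.
        argv.append("--collect=" + ",".join(collect))
    argv += ["--enable=" + name for name in enable]
    argv += ["--disable=" + name for name in disable]
    for pattern in exclude:
        # --exclude is repeatable, one pattern per occurrence.
        argv += ["--exclude", pattern]
    if log_age is not None:
        argv.append("--log-age=%d" % log_age)
    return argv
```

The result can be passed to subprocess.run as a list, which avoids shell quoting problems with patterns such as */passwd.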

Defaults can also be controlled in server.conf. Refer to server.conf.spec in the Admin Manual for more information.

Components

The "enable" and "disable" switches use the following components.

index_files: Files from each index that describe its contents (Hosts.data, Sources.data, Sourcetypes.data, and bucket manifests). User data is not collected. On larger deployments, collecting index files can take a while. Read about index files in the Splexicon.

index_listing: Directory listings of the index contents are gathered, in order to see file names, directory names, sizes, timestamps, and the like. This information lands in systeminfo.txt.

dispatch: The search dispatch directories. See "What Splunk Enterprise logs about itself."

etc: The entire contents of the $SPLUNK_HOME/etc directory, which contains configuration information, including .conf files.

log: The contents of $SPLUNK_HOME/var/log/... See "What Splunk Enterprise logs about itself."

pool: If search head pooling is enabled, the contents of the pool dir.

searchpeers: Directory listing of the "searchpeers" location, which holds the data that search heads replicate to indexers (search nodes).

consensus: Search Head Clustering -- Copies of the consensus protocol files used for search head cluster member coordination from var/run/splunk/_raft

conf_replication_summary: Search Head Clustering -- A directory listing of replication summaries produced by search head clustering

rest: splunkd httpd REST endpoint gathering. Collects output of various splunkd urls into xml files to capture system state. (Off by default due to fragility concerns for initial 6.2 shipment.)

kvstore: Directory listing of the Splunk key value store files.

Run diag on a remote instance

If you are not able to SSH into every machine in your deployment, you can still gather diags from full Splunk platform installations, but not from universal forwarders.

First, make sure you have the get_diag capability. The admin role has this capability by default. You also need login credentials for the remote server.

The syntax is:

splunk diag -uri https://<host>:<mgmtPort>

The options recognized for remote diag collection from the command line are --basename, --all-dumps, and --exclude.
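Because only a few options are recognized for remote collection, a wrapper script can reject unsupported switches before it contacts the remote instance. This is a sketch under the assumptions stated in the comments; the function name, host name, and port value are illustrative.

```python
# Options this topic lists as recognized for remote diag collection.
REMOTE_OPTIONS = {"--basename", "--all-dumps", "--exclude"}


def remote_diag_argv(host, mgmt_port, extra_args=(), splunk="splunk"):
    """Build a remote 'splunk diag -uri ...' argument list.

    Raises ValueError for any '--' switch that is not in REMOTE_OPTIONS.
    """
    for arg in extra_args:
        if arg.startswith("--") and arg.split("=", 1)[0] not in REMOTE_OPTIONS:
            raise ValueError("not recognized for remote diag: " + arg)
    uri = "https://%s:%d" % (host, mgmt_port)
    return [splunk, "diag", "-uri", uri] + list(extra_args)
```

For example, remote_diag_argv("idx1.example.com", 8089, ["--exclude", "*.csv"]) builds the command for a hypothetical host whose management port is 8089.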

Examples

Exclude a lookup table

These two examples exclude content on the file level. A lookup table can be one of several formats, like .csv, .dat, or text.

Exclude all .csv files in $SPLUNK_HOME:

splunk diag --exclude "*.csv"

Or exclude all .dat files in $SPLUNK_HOME:

splunk diag --exclude "*.dat"

Note: These examples will exclude all files of that type, not only lookup tables. If you have .csv or .dat files that will be helpful for Support in troubleshooting your issue, exclude only your lookup tables. That is, write out the files instead of using an asterisk.

Note: Filenames excluded by the --exclude feature will be listed in the excluded_filelist.txt in the diag to ensure Splunk Support can understand the diag.
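To confirm that an exclusion took effect, you can read excluded_filelist.txt straight out of the tarball. The sketch below assumes only that a member with that file name exists somewhere in the archive; its exact location inside the diag is not assumed, and the function name is hypothetical.

```python
import posixpath
import tarfile


def read_excluded_filelist(tar_path):
    """Return the lines of excluded_filelist.txt from a diag tarball.

    Returns an empty list if the archive has no such member.
    """
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            name = posixpath.basename(member.name)
            if member.isfile() and name == "excluded_filelist.txt":
                data = tar.extractfile(member).read()
                return data.decode("utf-8", "replace").splitlines()
    return []
```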

Exclude the dispatch directory

This example excludes content on the component level. Exclude the dispatch directory to avoid gathering search artifacts (which can be very costly on a pooled search head):

$SPLUNK_HOME/bin/splunk diag --disable=dispatch

Exclude multiple directories

To exclude multiple components, use the --disable flag once for each component.

Exclude the dispatch directory and all files in the shared search head pool:

$SPLUNK_HOME/bin/splunk diag --disable=dispatch --disable=pool

Note: This does not gather a full set of the configuration files in use by that instance. Such a diag is useful only for the logs gathered from $SPLUNK_HOME/var/log/splunk. See "What Splunk Enterprise logs about itself" in this manual.

Gather only logs

To whitelist only the Splunk Enterprise internal log files:

$SPLUNK_HOME/bin/splunk diag --collect=log

Clustering diag steps

For now, the recommended steps for generating a diag on a Splunk data cluster are:

$SPLUNK_HOME/bin/splunk login
...enter username and password here...
$SPLUNK_HOME/bin/splunk diag --collect all

Save the settings for diag in server.conf

You can update the default settings for diag in the [diag] stanza of server.conf.

[diag]

EXCLUDE-<class> = <glob expression>

   * Specifies a glob / shell pattern to be excluded from diags generated on this instance. 
   * Example: */etc/secret_app/local/*.conf

Flags that you append to splunk diag override server.conf settings.
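To audit which exclusions an instance already has configured, you can read the EXCLUDE-<class> settings out of the [diag] stanza. This sketch treats server.conf as INI-style text that Python's configparser can read, which is an assumption that holds for simple files but not necessarily for every Splunk .conf file. The function name is hypothetical.

```python
import configparser


def diag_excludes(server_conf_text):
    """Map each EXCLUDE-<class> name in the [diag] stanza to its glob pattern."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the upper-case EXCLUDE- prefix as written
    parser.read_string(server_conf_text)
    if not parser.has_section("diag"):
        return {}
    prefix = "EXCLUDE-"
    return {
        key[len(prefix):]: value
        for key, value in parser.items("diag")
        if key.startswith(prefix)
    }
```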

Diag contents

Primarily, a diag contains server logs, from $SPLUNK_HOME/var/log/splunk, and the configuration files, from $SPLUNK_HOME/etc.

Specifically, by pathname, there is:

_raft/...
Files containing the state of the consensus protocol produced by search head clustering from var/run/splunk/_raft
composite.xml
The generated file that splunkd uses at runtime to control its component system (pipelines & processors), from var/run/splunk/composite.xml
diag.log
A copy of all the messages diag prints to the screen while running, including progress indicators, timing, messages about files excluded by heuristic rules (for example, for the size heuristic, the setting and the size of the file), errors, and exceptions.
dispatch/...
A copy of some of the data from the search dispatch directory. Results files (the output of searches) are not included, nor other similar files (events/*)
etc/...
A copy of the contents of the configuration files. All files and directories under $SPLUNK_HOME/etc/auth are excluded by default.
excluded_filelist.txt
A list of files which diag would have included, but did not because of some restriction (exclude rule, size restriction). This is primarily to confirm the behavior of exclusion rules for customers, and to enable Splunk technical support to understand why they can't see data they are looking for.
introspection/...
The log files from $SPLUNK_HOME/var/log/introspection
log/...
The log files from $SPLUNK_HOME/var/log/splunk
rest-collection/...
Output of several splunkd http endpoints that contain information not available in logs. File input/monitor/tailing status information, server-level admin banners, clustering status info if on a cluster.
scripts/...
A single utility script may exist here for support reasons. It is identical for every diag.
systeminfo.txt
Generated output of various system commands to determine things like available memory, open splunk sockets, size of disk/filesystems, operating system version, ulimits.
Also contained in systeminfo.txt are listings of filenames/sizes etc from a few locations.
  • Some of the splunk index directories (or all of the index directories, if full listing is requested.)
  • The searchpeers directory (replicated files from search heads)
  • Search Head Clustering -- The summary files used in synchronization from var/run/splunk/snapshot
Typically var/...
The paths to the indexes are a little 'clever', attempting to resemble the paths actually in use. (For example, on Windows, if an index is in e:\someother\largedrive, that index's files will be in e/someother/largedrive inside the diag.) By default only the .bucketManifest for each index is collected.
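The Windows path example can be illustrated with a small function. This is a guess at the mapping based only on the single example above, not diag's actual code; it does not handle UNC paths, and the function name is hypothetical.

```python
from pathlib import PureWindowsPath


def diag_index_path(windows_path):
    """Turn a Windows index path into a relative, forward-slash path.

    Mirrors the example in this topic: the drive letter becomes the first
    directory and the colon is dropped. (Assumed behavior, see lead-in.)
    """
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":")
    # parts[0] is the anchor (for example 'e:\\') when the path is absolute.
    rest = path.parts[1:] if path.anchor else path.parts
    return "/".join(([drive] if drive else []) + list(rest))
```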

Behavior on failure

If diag fails for some reason, it:

  1. Cleans up the temporary files it created while running.
  2. Leaves a copy of its output in a temporary file, and prints the name of that file.

Here's a typical example:

jrodman@mcp:~$ splunk/bin/splunk diag
[... lots of normal output...]
Selected diag name of: diag-mcp-2014-09-24
Starting splunk diag...
[etc .... etc]
Getting index listings...
Copying Splunk configuration files...
Exception occurred while generating diag, we are deeply sorry.
Traceback (most recent call last):
  File "/opt/splunk/lib/python2.7/site-packages/splunk/clilib/info_gather.py", line 1959, in main
    create_diag(options, log_buffer)
  File "/opt/splunk/lib/python2.7/site-packages/splunk/clilib/info_gather.py", line 1862, in create_diag
    copy_etc(options)
  File "/opt/splunk/lib/python2.7/site-packages/splunk/clilib/info_gather.py", line 1626, in copy_etc
    raise Exception("OMG!")
Exception: OMG!

Diag failure, writing out logged messages to '/tmp/diag-fail-F2B94h.txt', please send output + this file to either an existing or new case ; http://www.splunk.com/support
We will now try to clean out the temp directory...

For most real errors, diag tries to guess at the original problem, but it also writes out a file for use in bugfixing diag. Please do send it along, and at least a workaround can often be provided quickly.
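If you capture diag's output in a script, you can pull the failure file name out of the message so that it can be attached to the Support case. The regular expression below is based only on the wording of the sample output in this topic, so treat it as an assumption that might not match other versions. The names are hypothetical.

```python
import re

# Matches the sentence shown in the sample failure output in this topic.
FAIL_FILE = re.compile(r"writing out logged messages to '([^']+)'")


def find_failure_log(diag_output):
    """Return the failure log path named in diag's output, or None."""
    match = FAIL_FILE.search(diag_output)
    return match.group(1) if match else None
```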

Additional resources

Watch a video on making a diag and using the anonymize command by a Splunk Support engineer.

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has about diags.

This documentation applies to the following versions of Splunk® Enterprise: 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14