SmartStore, in common with other features of Splunk Enterprise, provides a number of tools that you can use to troubleshoot your deployment:
- Monitoring console
- Log files
- CLI commands
- REST endpoints
This topic discusses each of these tools in the SmartStore troubleshooting context. It also covers some common SmartStore issues and their possible causes.
Troubleshoot with the monitoring console
You can use the monitoring console to monitor most aspects of your deployment. This section discusses the console dashboards that provide insight into SmartStore activity and performance.
The primary documentation for the monitoring console is located in Monitoring Splunk Enterprise.
Several dashboards monitor SmartStore status. The dashboards are scoped either to a single instance or to the entire deployment. Find the dashboards under the Indexing menu and the SmartStore submenu:
- SmartStore Activity: Instance
- SmartStore Activity: Deployment
- SmartStore Cache Performance: Instance
- SmartStore Cache Performance: Deployment
SmartStore Activity dashboards
The SmartStore Activity dashboards provide information on activity related to the remote storage, such as:
- Remote storage connectivity
- Bucket upload/download activity
- Bucket upload/download failure count
The SmartStore Activity dashboards also include check boxes that you can select to show progress if you are currently performing data migration or bootstrapping.
SmartStore Cache Performance dashboards
The SmartStore Cache Performance dashboards provide information on the local caches, such as:
- The values for the server.conf settings that affect cache eviction
- The bucket eviction rate
- Portion of search time spent downloading buckets from remote storage
- Cache hits and misses
- Repeat bucket downloads
View the dashboards themselves for more information. In addition, see Indexing:Indexes and volumes in Monitoring Splunk Enterprise.
Troubleshoot with log files
Several log files can provide insight into the state of SmartStore operations.
splunkd.log. Contains information on bucket operations, such as upload, download, evict, and so on, as well as metrics concerning operations on external storage. Examine these log channels:
- S3Client. Communication with S3.
- GCSClient. Communication with GCS.
- StorageInterface. External storage activity (at a higher level than the S3Client and GCSClient channels).
- CacheManager. Activity of the cache manager component.
- CacheManagerHandler. Cache manager REST endpoint activity (both server and client side).
- KeyProviderManager. Errors related to key provider setup and configuration. The key provider is used when the system has encrypted data on the remote store.
search.log. Contains a trail of the search process activity against the cache manager REST endpoint. Examine these log channels:
- CacheManagerHandler. Bucket operations with cache manager REST endpoint activity.
- S2BucketCache. Search-time bucket management (open, close, and so on).
- BatchSearch, CursoredSearch, IndexScopedSearch, ISearchOperator. Search activity related to buckets.
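To pull only the SmartStore-related channels out of splunkd.log, you can filter lines by component name. The following Python sketch is one way to do that, not a Splunk-supplied tool. It assumes each line has the shape "<date> <time> <tz> <LEVEL> <Component> ...", with the component as the token after the log level; check that assumption against your own logs before relying on it. The function name smartstore_lines and the sample lines are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# SmartStore-related splunkd.log channels listed in this topic.
SMARTSTORE_CHANNELS = frozenset({
    "S3Client", "GCSClient", "StorageInterface",
    "CacheManager", "CacheManagerHandler", "KeyProviderManager",
})

# ASSUMED line layout: "<date> <time> <tz> <LEVEL> <Component> ...".
# Check this against lines from your own splunkd.log before relying on it.
LINE_RE = re.compile(r"^\S+\s+\S+\s+\S+\s+(?P<level>[A-Z]+)\s+(?P<component>\w+)")


def smartstore_lines(
    lines: Iterable[str], channels: Iterable[str] = SMARTSTORE_CHANNELS
) -> Iterator[Tuple[str, str, str]]:
    """Yield (level, component, line) for each line logged by one of `channels`."""
    wanted = set(channels)
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("component") in wanted:
            yield match.group("level"), match.group("component"), line.rstrip("\n")


if __name__ == "__main__":
    # Invented sample lines; on an indexer, pass an open splunkd.log file instead.
    sample = [
        "01-01-2021 12:00:00.000 +0000 INFO  CacheManager - hypothetical eviction message",
        "01-01-2021 12:00:01.000 +0000 WARN  S3Client - hypothetical retry message",
        "01-01-2021 12:00:02.000 +0000 INFO  Metrics - unrelated line",
    ]
    for level, component, _ in smartstore_lines(sample):
        print(level, component)
```

On an indexer, splunkd.log is located by default under $SPLUNK_HOME/var/log/splunk/, so you can pass an open handle to that file and, for example, print only the WARN and ERROR lines.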
Use dbinspect to obtain information about SmartStore buckets
You can search with the dbinspect command to obtain information about SmartStore buckets. It is important to understand the effect of the cached argument on SmartStore bucket searches.
If the cached argument is set to "t", dbinspect gets its statistics from the bucket's manifest. If it is set to "f", dbinspect examines the bucket itself.
The default for cached is "t" for SmartStore indexes and "f" for non-SmartStore indexes. Do not change the default.
Although the cached argument bears no relationship to the term "cache" as used with SmartStore, its effect matters significantly for SmartStore buckets: cached=f examines an indexer's local copy of the bucket, while cached=t examines the bucket's manifest, which contains information about the canonical version of the bucket that resides in the remote store.
To understand the importance of this distinction, consider sizeOnDiskMB, one of the fields that dbinspect returns. If a copy of a SmartStore bucket is no longer in the indexer's local cache, the bucket directory is empty and thus has a size of 0. With cached=f, dbinspect inspects the local bucket to determine the value of sizeOnDiskMB. Because the local version of the bucket consists only of the empty directory, dbinspect returns a size of 0 for the bucket, even though the full bucket exists in remote storage. Similarly, a cached copy of a bucket might contain only a subset of the files in the full bucket, and cached=f returns only the size of that subset rather than the full size of the bucket.
However, the true size of the bucket (that is, the size of the canonical copy of the bucket in the remote store) is available through the bucket's manifest. So, if cached=t, the indexer pulls the bucket's manifest from the copy in the remote store, and dbinspect then uses the statistics in that manifest, thus returning the true size of the bucket.
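The following Python sketch is a toy model of that reasoning, not Splunk's implementation. A bucket is represented by the per-file sizes recorded in its manifest (a hypothetical simplification of what a manifest holds) plus the set of files currently present in the indexer's local cache. The names SmartStoreBucket and size_on_disk_mb, and the file names and sizes, are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SmartStoreBucket:
    """Toy model of one SmartStore bucket as seen from a single indexer."""
    # Per-file sizes (MB) recorded in the manifest for the canonical remote copy.
    manifest_sizes_mb: Dict[str, float]
    # Names of the files currently present in this indexer's local cache.
    local_files: Set[str] = field(default_factory=set)


def size_on_disk_mb(bucket: SmartStoreBucket, cached: bool) -> float:
    """Mimic how cached=t and cached=f lead to different sizeOnDiskMB values."""
    if cached:
        # cached=t: statistics come from the manifest, i.e. the full remote bucket.
        return sum(bucket.manifest_sizes_mb.values())
    # cached=f: only files physically present in the local bucket directory count.
    return sum(
        size for name, size in bucket.manifest_sizes_mb.items()
        if name in bucket.local_files
    )


if __name__ == "__main__":
    # Hypothetical file names and sizes, for illustration only.
    sizes = {"rawdata/journal.gz": 40.0, "index.tsidx": 55.0, "Hosts.data": 0.5}
    evicted = SmartStoreBucket(sizes)                  # empty local bucket directory
    partial = SmartStoreBucket(sizes, {"Hosts.data"})  # only one file cached locally
    print(size_on_disk_mb(evicted, cached=False))  # prints 0
    print(size_on_disk_mb(evicted, cached=True))   # prints 95.5
    print(size_on_disk_mb(partial, cached=False))  # prints 0.5
```

The evicted and partially cached cases correspond to the two situations described above: with cached=f the reported size depends on what happens to be in the local cache, while with cached=t it reflects the full remote bucket.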
For more information on dbinspect, see dbinspect in the Search Reference.
Test connectivity with remote storage
One common problem is connectivity with the remote storage. Connectivity problems can result from network or permissions issues. Use the splunk cmd splunkd rfs command to test connectivity with remote storage. This section demonstrates some uses for the command.
List the contents of the "foobar" index on remote storage:
splunk cmd splunkd rfs ls index:foobar
List the contents of a given bucket on remote storage:
splunk cmd splunkd rfs ls bucket:foo~737~B1CE2AB0-CE4A-4697-83F2-1C5DBFB6485A
Test getting a file from remote storage:
splunk cmd splunkd rfs getF bucket:foo~737~B1CE2AB0-CE4A-4697-83F2-1C5DBFB6485A/guidSplunk-B1CE2AB0-CE4A-4697-83F2-1C5DBFB6485A/Hosts.data /tmp/foo/
Test getting an unencrypted file from remote storage that uses SSE-C:
splunk cmd splunkd rfs getR bucket:foo~737~B1CE2AB0-CE4A-4697-83F2-1C5DBFB6485A/receipt.json /tmp/foo/
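If you need to repeat these checks, for example across several indexes, you can script them. The Python sketch below only builds and runs the same splunk cmd splunkd rfs invocations shown above. It assumes the splunk binary is on your PATH (otherwise pass its full path), and it treats a non-zero exit status as a failed check, which is an assumption about the command's behavior rather than documented behavior. The helper names rfs_argv and check_indexes are hypothetical.

```python
import shlex
import subprocess
from typing import Dict, List, Sequence


def rfs_argv(subcommand: str, *args: str, splunk_bin: str = "splunk") -> List[str]:
    """Build the argument list for: splunk cmd splunkd rfs <subcommand> <args...>."""
    return [splunk_bin, "cmd", "splunkd", "rfs", subcommand, *args]


def check_indexes(
    indexes: Sequence[str], splunk_bin: str = "splunk", timeout_s: float = 300.0
) -> Dict[str, bool]:
    """Run 'rfs ls index:<name>' for each index and record whether it succeeded.

    ASSUMPTION: a non-zero exit status means the listing failed.
    """
    results: Dict[str, bool] = {}
    for index in indexes:
        argv = rfs_argv("ls", f"index:{index}", splunk_bin=splunk_bin)
        print("Running:", shlex.join(argv))
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Binary not found, not executable, or the command hung.
            print(f"  {index}: could not run check ({exc})")
            results[index] = False
            continue
        results[index] = proc.returncode == 0
        if not results[index]:
            print(proc.stderr.strip())
    return results
```

For example, check_indexes(["foobar"]) runs the same listing as the first command in this section. Passing arguments as a list rather than a single shell string avoids quoting problems with the "~" and "/" characters in bucket paths.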
Troubleshoot with REST searches
Use the following search to get a list of buckets that are actively being searched:
| rest /services/admin/cacheman search=cm:bucket.ref_count>0
The ref_count value increments by 1 when a search opens the bucket and decrements by 1 when the search closes the bucket.
Use the following search to get a list of buckets that have not been uploaded to the remote store (that is, are not "stable" on the remote store):
| rest /services/admin/cacheman search=cm:bucket.stable=0
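The ref_count bookkeeping described above works like ordinary reference counting. The following Python sketch is a toy model of it, not Splunk code. The BucketState record only mirrors the cm:bucket.ref_count and cm:bucket.stable values that the two searches filter on, and open_bucket is a hypothetical stand-in for a search opening and later closing a bucket.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class BucketState:
    """Toy record mirroring two cache manager fields used by the searches above."""
    bid: str
    ref_count: int = 0  # mirrors cm:bucket.ref_count
    stable: int = 0     # mirrors cm:bucket.stable; 0 means not yet uploaded


@contextmanager
def open_bucket(bucket: BucketState) -> Iterator[BucketState]:
    """Increment ref_count when a search opens the bucket; decrement when it closes."""
    bucket.ref_count += 1
    try:
        yield bucket
    finally:
        bucket.ref_count -= 1


def actively_searched(buckets: List[BucketState]) -> List[str]:
    # Same condition as search=cm:bucket.ref_count>0
    return [b.bid for b in buckets if b.ref_count > 0]


def not_yet_uploaded(buckets: List[BucketState]) -> List[str]:
    # Same condition as search=cm:bucket.stable=0
    return [b.bid for b in buckets if b.stable == 0]
```

Two concurrent searches on the same bucket correspond to two nested open_bucket calls, which leave ref_count at 2 until both finish. The bucket drops out of the ref_count>0 results only after the last search closes it.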
Searches are running slowly or appear stuck
Slow or stuck searches are often due to these issues:
- Performance issues with remote storage.
- The cache manager is evicting buckets too aggressively.
- Cold cache issues. A cold cache occurs when an indexer cluster peer node participating in a search does not have a local copy of some needed buckets and therefore must download the buckets from remote storage. A cold cache can result from the manager reassigning primary bucket copies to different peers.
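The cold cache case can be illustrated with simple set arithmetic. The following Python sketch is a simplified model, not the cluster manager's logic: it assumes that each needed bucket is searched only on the peer holding its primary copy, and that a peer must download any such bucket that is not already in its local cache. The function name and the peer and bucket names are hypothetical.

```python
from typing import Dict, Set


def downloads_needed(
    primaries: Dict[str, Set[str]],
    local_cache: Dict[str, Set[str]],
    searched: Set[str],
) -> Dict[str, Set[str]]:
    """Per peer: buckets the search needs, the peer holds primary for, but has not cached."""
    return {
        peer: (owned & searched) - local_cache.get(peer, set())
        for peer, owned in primaries.items()
    }


if __name__ == "__main__":
    # Hypothetical two-peer cluster with three buckets, all needed by one search.
    cache = {"peer1": {"b1", "b2"}, "peer2": {"b3"}}
    searched = {"b1", "b2", "b3"}
    before = {"peer1": {"b1", "b2"}, "peer2": {"b3"}}
    after = {"peer1": {"b1"}, "peer2": {"b2", "b3"}}  # primary for b2 moved to peer2
    print(downloads_needed(before, cache, searched))  # nothing to download
    print(downloads_needed(after, cache, searched))   # peer2 must download b2
```

Even though b2 is still cached on peer1, reassigning its primary to peer2 forces a download from remote storage before the search can use it.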
Searches erroring out
The search-related error message "Failed to localize fileSet='....' for bid='...'. Results will be incomplete." indicates an error condition while downloading the specified bucket. For more details, examine splunkd.log on the indexer issuing the error.
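One possible follow-up is to check whether the affected bucket is reachable on remote storage, using the rfs ls bucket: form of the command described in "Test connectivity with remote storage." The following Python sketch extracts the bid value from such a message and builds that command. It assumes the message text matches the form quoted above; the function names and the sample message are hypothetical, and the sample bid reuses the bucket ID from the rfs examples.

```python
import re
from typing import List, Optional

# ASSUMED message shape: "Failed to localize fileSet='<...>' for bid='<bid>'"
LOCALIZE_ERR_RE = re.compile(r"Failed to localize fileSet='[^']*' for bid='(?P<bid>[^']+)'")


def failed_bid(message: str) -> Optional[str]:
    """Return the bid from a localization error message, or None if it doesn't match."""
    match = LOCALIZE_ERR_RE.search(message)
    return match.group("bid") if match else None


def rfs_ls_bucket_argv(bid: str, splunk_bin: str = "splunk") -> List[str]:
    # Same form as: splunk cmd splunkd rfs ls bucket:<bid>
    return [splunk_bin, "cmd", "splunkd", "rfs", "ls", f"bucket:{bid}"]


if __name__ == "__main__":
    sample = ("Failed to localize fileSet='hypothetical' for "
              "bid='foo~737~B1CE2AB0-CE4A-4697-83F2-1C5DBFB6485A'. "
              "Results will be incomplete.")
    bid = failed_bid(sample)
    if bid is not None:
        print(" ".join(rfs_ls_bucket_argv(bid)))
```

If the listing fails for that bucket, the problem is more likely connectivity or permissions than the search itself.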
Disk full issues
A disk full related message indicates that the cache manager is unable to evict sufficient buckets. These are some possible causes:
- Search load overwhelming local storage. For example, the entire cache might be consumed by buckets opened by at least one search process. When the search ends, this problem should go away.
- Cache manager issues. If the problem persists beyond a search, the cause could be related to the cache manager. Examine splunkd.log on the indexer issuing the error.
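The first cause implies that buckets held open by a search are not candidates for eviction. The following Python sketch is a toy model of that constraint, not the cache manager's algorithm. It assumes least-recently-used eviction, measures sizes in arbitrary units, and treats any bucket with a ref_count greater than 0 as pinned. The ToyCache class and its methods are hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class ToyCache:
    """Hypothetical LRU bucket cache in which open buckets (ref_count > 0) are pinned."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.sizes: "OrderedDict[str, int]" = OrderedDict()  # bid -> size, least recent first
        self.ref_count: Dict[str, int] = {}

    def used(self) -> int:
        return sum(self.sizes.values())

    def open_bucket(self, bid: str) -> None:
        self.ref_count[bid] = self.ref_count.get(bid, 0) + 1

    def close_bucket(self, bid: str) -> None:
        self.ref_count[bid] -= 1

    def localize(self, bid: str, size: int) -> bool:
        """Make room for a bucket; return False if not enough space can be freed."""
        if bid in self.sizes:
            self.sizes.move_to_end(bid)  # already cached: mark most recently used
            return True
        while self.used() + size > self.capacity:
            victim = next((b for b in self.sizes if self.ref_count.get(b, 0) == 0), None)
            if victim is None:
                return False  # every cached bucket is open: the "disk full" case
            del self.sizes[victim]  # evict least recently used unpinned bucket
        self.sizes[bid] = size
        return True
```

With a capacity of 100 and two open buckets of sizes 60 and 40, a request for a third bucket fails because nothing can be evicted. After one search closes its bucket, the same request succeeds by evicting that bucket, which matches the observation that this kind of problem should go away when the search ends.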
This documentation applies to the following versions of Splunk® Enterprise: 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8