Splunk Cloud Platform

Splunk Cloud Platform Admin Manual


Use the Indexing dashboards

Data that you send to Splunk Cloud is stored in indexes. Managing your indexes and their data is important to ensuring the speed and quality of your search results and the accuracy of your data insights.

The dashboards accessed from the Cloud Monitoring Console > Indexing tab let you administer the following indexing and related functionality in your deployment:

  • Thoroughly review your indexes, including their performance, current data consumption, and remaining storage capacity, as well as the events and indexing rate of an individual index.
  • Manage the quality of your data and correct parsing errors encountered during the ingestion process.
  • Monitor the status of HTTP Event Collector (HEC) tokens in your deployment, if you have enabled this functionality.

You can self-manage your Splunk Cloud index settings. See The Indexes page in the Splunk Cloud User Manual.

A blue progress bar might appear above a panel, indicating that the Splunk platform is still generating data. Wait for the bar to disappear before reviewing the panel.

Do not modify any Cloud Monitoring Console (CMC) dashboard. Changing any of the search criteria, formatting, or layouts may cause inaccurate results and also override the automatic update process.

Check indexing performance

The CMC Indexing Performance dashboard provides information to Splunk Cloud administrators on incoming data consumption. Use this dashboard to analyze the throughput rate of your indexers and determine if the rate needs to be optimized.

Review the Indexing Performance dashboard

This dashboard contains four panels. The Time Range in the Historical Charts area controls the date range of the data displayed in the bottom three panels.

To investigate your panels, go to Cloud Monitoring Console > Indexing > Indexing Performance. Use the following table to understand the dashboard interface.

Panel or Filter Description
Indexing Throughput Shows the indexing rate in KB per second for all of your indexers.
Historical Data This area includes the three panels shown under this section.

Set a Time Range value to refresh the data in these panels.

Estimated Indexing Rate Provides a bar chart of the estimated indexing rate over time, based on KB ingested per second.

You can split by index, source, or source type, or view the total of all these inputs.

<variable> Queue Fill Ratio The title of this panel is dynamic and depends on the specified Aggregation value, which can be one of the following:
  • Median
  • Maximum
  • Minimum
  • 90th Percentile
  • Sampled

After you select an Aggregation value, select a Queue value to view the latency performance of each queue in the graph. Queue options are the following:

  • Splunk Tcpin Queue
  • Parsing Queue
  • Aggregation Queue
  • Typing Queue
  • Indexing Queue

Comparing the queues against one another shows you which queue has the highest latency and is hindering indexing performance. Note that latency performance is also known as fill percentage over time. To examine queue fill ratios outside the dashboard, see the example search after this table.

Splunk TCP Port Closures Shows the percentage of indexers that have closed their forwarder connection port at least once in the specified time range.

A high percentage value could indicate that the ingest pipeline is overwhelmed or misconfigured, and data is not being ingested. Contact Splunk Support to resolve this issue.
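
If you want to examine queue fill behavior outside the dashboard, you can search the Splunk platform's internal metrics directly. The following search is a minimal sketch, assuming your role can search the _internal index in your Splunk Cloud deployment; the group=queue events in metrics.log report each queue's current and maximum size, which the search converts to a fill percentage.

index=_internal source=*metrics.log* group=queue name IN (splunktcpin, parsingqueue, aggqueue, typingqueue, indexqueue)
| eval fill_pct=if(max_size_kb > 0, round(current_size_kb / max_size_kb * 100, 2), null())
| timechart span=10m perc90(fill_pct) by name

A queue that stays near 100 percent while the queues downstream of it remain low is usually the bottleneck.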

Interpret indexing performance results

When interpreting your indexing performance results, note the following:

  • Regularly review your indexing performance and ensure that on average it is adequately handling the load. Though occasional spikes are normal, a consistently high load degrades performance.
  • Check for these issues:
    • An indexing rate that is lower than expected. For example, an indexing rate of 0 while the forwarder outgoing rate is 100.
    • A high TCP port closure percentage. A high percentage indicates an ingestion pipeline issue and means that data is potentially being lost.
    • Source types that are sending a larger volume than expected.
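
To check whether a particular source type is sending more volume than expected, you can chart indexing throughput by source type from the same internal metrics. This is a minimal sketch, assuming your role can search the _internal index; the per_sourcetype_thruput group in metrics.log records indexed kilobytes per source type in the series field.

index=_internal source=*metrics.log* group=per_sourcetype_thruput
| timechart span=1h limit=10 useother=true sum(kb) AS indexed_kb by series

Compare the top series against your expectations for each source type; a series that suddenly dominates the chart is a good candidate for further investigation.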

Check index and storage retention capacity

The CMC Indexes and Storage dashboard provides insights into the data use of your indexes so that you can better understand your current usage and predict future licensing needs. The dashboard shows information for both active and inactive indexes, but focuses primarily on active indexes and their events.

Splunk Cloud retains data based on index settings that enable you to specify when data is to be deleted. Data retention capacity in your Splunk Cloud service is based on the volume of uncompressed data that you want to index on a daily basis.

Storage is based on your subscription type. You can also purchase additional data retention capacity. For more information, see the information about storage and subscription types in the Splunk Cloud Service Description. Be sure to choose the correct service description version for your Splunk Cloud deployment from the Version drop-down menu.

For more information about creating and managing Splunk Cloud indexes, see Manage Splunk Cloud Indexes in the Splunk Cloud User Manual.

Review the Indexes and Storage dashboard

This dashboard contains five panels of index and event information.

The CMC Indexes and Storage dashboard provides insights into your data retention based on the uncompressed data you have indexed.

To investigate your panels, go to Cloud Monitoring Console > Indexing > Indexes and Storage. Use the following table to understand the dashboard interface.

Panel or Filter Description
Indexes with Events Shows the number of indexes that include at least one event. These indexes may be active or inactive.
Total Configured Indexes Shows the total number of active and inactive indexes in the deployment. These indexes may or may not have events.
Total Active Index Size (24h) Shows the total maximum reported size of all active indexes over the last 24 hours.
Total Event Count Shows the total event count of all active indexes over the last 24 hours.
Active Indexes (Last 24 Hours) This table provides the following information for active indexes of the past 24 hours:
  • Index name
  • Index size (GB)
  • Total event and bucket counts
  • Minimum, maximum, and retention times
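
If you want a similar inventory of your indexes outside this dashboard, you can query the indexes REST endpoint from the search bar. The following search is a sketch, assuming your Splunk Cloud role is permitted to run the rest command; the field names come from the data/indexes endpoint.

| rest /services/data/indexes
| table title currentDBSizeMB totalEventCount frozenTimePeriodInSecs
| sort - currentDBSizeMB

frozenTimePeriodInSecs is the retention period for each index, expressed in seconds.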

Interpret index and storage capacity results

This section describes how you can determine your retention usage, set an alert if you've exceeded your licensed usage, and investigate high-consumption indexes.

Determine retention usage and set an alert

To find your retention usage and set an alert, perform these steps:

  1. Take note of your Total Active Index Size (24h) panel. This number represents the total size of the uncompressed data in indexes that were active over the last 24 hours.
  2. Compare this value to your licensed entitlement amount to see if you need to update your license based on current usage. Go to Cloud Monitoring Console > License Usage to view the appropriate dashboard for your subscription type, either ingest-based or workload-based. If you do not know your licensed entitlement or subscription type, contact your Splunk account representative.
  3. Create a search like the following sample, and configure Splunk Cloud to generate an alert if the value exceeds your licensed usage. Run the search over the Last 24 hours time range.

The following sample query shows the search behind such an alert. Replace the license_gb value (130000 in this example) with your licensed daily data ingestion value in GB. You can alert on the GB used (ingest_gb) or on the percentage of the license consumed (utilization_pct).

(index=_telemetry host=*.*splunk*.* NOT host=sh*.*splunk*.* source=*license_usage_summary.log* type="RolloverSummary") 
| stats latest(b) AS b by peer, pool, _time 
| stats sum(b) AS ingest_bytes
| eval ingest_gb=round((ingest_bytes / 1024 / 1024 / 1024),3), license_gb=130000
| eval utilization_pct=(ingest_gb/license_gb)*100
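
To turn the sample search into an alert, you can append a threshold condition so that the search returns results only when usage crosses the level you care about. The following line is a sketch; adjust the threshold to give yourself whatever headroom you want.

| where utilization_pct >= 100

Save the search as an alert, run it over the Last 24 hours time range on a schedule, and set it to trigger when the number of results is greater than zero.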

For detailed instructions on creating alerts, see the Splunk Cloud Alerting Manual.

Check indexes for high consumption

Your licensed data retention capacity is based on two variables: the daily licensed ingestion rate (for example, 1 TB per day) and the amount of time Splunk Cloud is licensed to retain your data (for example, 30 days). To understand how your data retention compares to your licensed retention, it's a good idea to view details about your index storage.

When you configure data retention for an index, you also configure two variables: the size of the index and the number of days to retain the data. For example, you might set data retention to 10 TB or 90 days, whichever comes first. If your data is retained for less time than you configured, your ingestion rate is probably higher than expected: if you configured the index for 90 days or 10 TB and the data is only being retained for 10 days, you likely hit the 10 TB threshold much sooner than planned, which indicates a high ingestion rate. Conversely, a longer retention period than expected could indicate that your index settings are misconfigured and that you configured data retention for a period that exceeds your licensed retention.

To investigate high-consumption indexes, perform these steps:

  1. Check to see which active indexes are larger than others. You want to find which active index is consuming the most storage and why. To do this, check the index size, which shows the uncompressed data retained by the index.
  2. Look at the Active Indexes (Last 24 Hours) panel, which displays the Index Size (GB), Total Event Count, and Total Bucket Count columns in descending order. Click a column heading to change the sort order; for example, click the Index Size (GB) heading to sort the indexes by size.
  3. Click the name of a larger index to open the Index Detail dashboard.
  4. On the Index Detail dashboard, look for a spike or a higher-than-expected trend line for the index.

    A spike or a rising trend line is a clue that you may need to adjust index settings or investigate further to determine what's causing the increase.
    • If you see a spike or rise in data, split by source type or host to determine whether a specific source is causing the increase. You may then need to investigate that host or source to determine if there is an issue.
    • If you don't see spikes or a higher trend line, you don't have an ingestion issue.

A good way to determine whether your data usage is running higher than expected is to check the dates of the earliest and latest events and compare that time span to the retention setting for the individual index. For example, if the earliest event is 2020/01/25, the latest event is 2020/01/31, and the retention setting for the index is 90 days, the index reached its size limit long before the time-based retention setting was met, so data ingestion was greater than anticipated.
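
You can also check the earliest and latest event times for an index directly from the search bar. The following search is a minimal sketch; replace your_index with the index you are investigating.

| tstats earliest(_time) AS earliest_event latest(_time) AS latest_event WHERE index=your_index
| eval days_of_data=round((latest_event - earliest_event) / 86400, 1)
| convert ctime(earliest_event) ctime(latest_event)

If days_of_data is much lower than the retention period you configured, the index is probably reaching its size limit first, which points to a higher ingestion rate than you planned for.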

Check index detail

The CMC Index Detail dashboard provides Splunk Cloud administrators with a more granular view of the events in, and the performance of, a specific index. Use this dashboard to investigate individual indexes more thoroughly.

Review the Index Detail dashboard

This dashboard shows six panels of information for a specified index.

To investigate your panels, go to Cloud Monitoring Console > Indexing > Index Detail. Use the following table to understand the dashboard interface.

Panel or Filter Description
Index The selected index value affects all panels in this dashboard.

The indexes available to you are based on user access levels.

Overview Shows the uncompressed raw data size and total bucket count of the specified index.
Events Shows the total number of events in the specified index, and the timestamps of the earliest and latest events.
Throughput: Last 24 hours (GB) Shows the indexing rate in KB per second for the specified index over the last 24 hours.

You can split by host, source, or source type. This value is the y-axis in the graph.

If the Undefined Host value appears, see the Interpret index detail results section.

Interpret index detail results

Use the Index Detail dashboard to monitor the flow of data into the system by index. If there is an issue that affects one or more indexes, analyzing the metadata for each affected index can help you diagnose the underlying issue.

Use this dashboard along with the Indexes and Storage dashboard to check for indexes with high consumption rates. For more information, see Check index and storage retention capacity in this topic. Because consumption spikes may adversely affect your license limits for daily ingestion and data retention, be sure to investigate any unusual or abnormal spikes.

The value Undefined Host appears in the Throughput: Last 24 Hours (GB) chart when the CMC app encounters an index configuration issue and can't correctly parse the data. This issue generally indicates that the index host name is either not configured or incorrectly configured for a forwarder. For information about configuring the host for a forwarder, see the entry for hostname or host in Forward data from files and directories to Splunk Cloud.

Check the status of HTTP event collection

The CMC HTTP Event Collector dashboard provides Splunk Cloud administrators with the status of your HTTP Event Collector (HEC) functionality, if you use HEC tokens to securely transmit event and application data. Use this dashboard to view summarized and detailed information about your HEC token usage and performance.

See also Set up and use HTTP Event Collector in the Splunk Cloud Getting Data In manual.

Review the HTTP Event Collector dashboard

This dashboard contains a number of panels about your HEC token data.

Panels are grouped into one of three views, with a fourth view that combines the other three views so you can see all the data concurrently. You can also opt to see all your HEC token data in the results, or specify a particular token for analysis.

The Historical Data view contains two graphs with a variable in the panel title that you set with a filter option: <variable> Count and Data <variable>.

For a HEC token to display in this dashboard, it must meet either of the following conditions:

  • Be enabled and have received data within the last 7 days.
  • Be recently disabled but have received messages within the last 7 days, prior to being disabled.

To investigate your views, go to Cloud Monitoring Console > Indexing > HTTP Event Collector. Use the following table to understand the dashboard interface.

View or Filter Description
HEC Token Specify an option to see data for all HEC tokens or one specific token.

See the previous section for the conditions that a token must meet to display in this dashboard.

Select View Select Usage, Current Throughput, or Historical Data to see a specific view of the data, or select All to see a combined view.
Usage The HTTP Event Token Usage (Last 7 Days) panel shows a table that lists the token name, trend line, and count.
Current Throughput The Current Throughput panel shows information on the throughput of your requests and data, per second.

The Activity (Last 30 Minutes) graph shows the count of requests and data received (MB) over time.

Historical Data Set the time range for the historical data display.

The Request Overview panel shows the total, valid, and invalid request counts. This panel is associated with the <variable> Count graph. The title variable depends on the selected Activity Type option.

The Split by Token checkbox displays only for the Events and Valid Requests options.

The Data Overview panel shows the total MB received and indexed. This panel is associated with the Data <variable> graph. The title variable depends on the selected Data Type option. The Split by Token checkbox displays only for the Indexed and Valid Received options.

The Errors graph shows the count of all or only specific token errors over time. Select an error type from the Reason filter.

The Split by Token checkbox displays when you select one of the following error type options:

  • Authentication Errors
  • Requests to Disable Token
  • Requests to Incorrect URL
  • Parser Errors

Interpret HTTP event collection results

When interpreting your HTTP event collection results, note the following:

  • Use the Errors panel in the Historical Data view to identify HEC token processing issues that you must resolve, such as authentication failures, parser errors, and invalid requests.
  • A Data Received value that is greater than the Data Indexed value indicates that the Splunk platform couldn't process all of the received messages. This generally occurs because of parsing issues, such as missing timestamps. You can check these values in the Current Throughput and Historical Data views.
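
If the Errors panel points to failures that you want to investigate in more detail, the underlying messages are typically recorded in the splunkd logs. The following search is a sketch, assuming your role can search the _internal index; HttpInputDataHandler is the splunkd component that generally logs HEC request errors, such as invalid or disabled tokens.

index=_internal sourcetype=splunkd component=HttpInputDataHandler log_level=ERROR
| timechart span=1h count

Compare any spikes in this chart against the Errors graph in the Historical Data view to confirm when the problem started.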

See also Detecting scaling problems in the Splunk Cloud Getting Data In manual.

Verify data quality

The CMC Data Quality dashboard provides information to Splunk Cloud administrators on issues that prevented the Splunk platform from correctly parsing your incoming data. Use this dashboard to analyze and resolve common issues that happen during the ingestion process.

Data quality has a significant impact on both your system performance and your ability to get accurate results from your queries. Degraded data quality can slow down search performance and cause inaccurate search results, so be sure to check for and repair data quality issues regularly, before they become a problem.

Generally, data quality issues fall under three main categories:

  • Line breaking: When there are problems with line breaks, the Splunk platform can't reliably parse your data into the correct individual events that it uses for searching.
  • Timestamp parsing: When there are timestamp parsing issues, the Splunk platform can't reliably determine the correct timestamp to use for the event.
  • Aggregation: When there are problems with aggregation, the ability to break out fields correctly is affected.
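
The issue counts in this dashboard are based on warnings that splunkd writes while parsing incoming data. If you want to query those warnings directly, the following search is a rough sketch, assuming your role can search the _internal index; the three components correspond to line breaking, timestamp parsing, and aggregation issues, and the warning text itself typically names the affected source, host, and source type.

index=_internal sourcetype=splunkd component IN (LineBreakingProcessor, DateParserVerbose, AggregatorMiningProcessor)
| stats count BY component
| sort - count

Sorting by count puts the issue categories that generate the most warnings at the top, so you can start with the largest offenders.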

Review the Data Quality dashboard

The tables in this dashboard list the issues Splunk Cloud encountered when processing your events at both the source type and source levels. To help you better identify which of your data sources have quality issues, you can opt to exclude Splunk source types in the results.

This dashboard contains one panel with a variable in the title: Issues by source type <variable> by source.

To investigate your panels, go to Cloud Monitoring Console > Indexing > Data Quality. Use the following table to understand the dashboard interface.

Panel or Filter Description
Time Range Set the time range for the data display.
Include Splunk Source Types Specify whether to include or exclude Splunk source types from the results. Choose No to exclude Splunk source types and filter the results to only your source types.
Event Processing Issues by Source Types The results table lists the following information:
  • Sourcetype: Click to open the Issues by source type <variable> by source panel.
  • Total issues
  • Source count: Total number of individual sources contained in the source type.
  • Line breaking, timestamp parsing, and aggregation issues

When any cell shows a number greater than 0, click the cell to view the underlying search and related information. This data will help you resolve the issue.

Issues by source type <variable> by source The <variable> value depends on the selected sourcetype. The results table lists the following information:
  • Source: Click any source to open its related Event Line Count, Event Size, and Event Time Disparity panels.
  • Total issues
  • Line breaking, timestamp parsing, and aggregation issues

Interpret data quality results

This section discusses how to check the quality of your data and how to repair issues you may encounter. However, the concept of data quality depends on what factors you use to judge quality. For the purposes of this section, data quality means that the data is correctly parsed.

Guidelines

Finding and repairing data quality issues is unique to each environment. However, the following guidelines can help you improve your data quality:

  • It's a good idea to check your most important data sources first. Often, you can have the most impact by making a few changes to a critical data source.
  • Data quality issues may generate hundreds or thousands of errors due to one root cause. Sort by volume and work on repairing the source that generates the largest volume of errors first.
  • Repairing data quality issues is an iterative process. Repair your most critical data sources first, and then run queries against the source again to see what problems remain.
  • For your most critical source, resolve all data quality issues. This helps to ensure that your searches are effective and your performance is optimal.
  • Run these checks on a regular cadence to keep your system healthy.

For more information, see Resolve data quality issues in the Splunk Cloud Getting Data In manual.

Example

The following example shows the process of resolving a common data quality issue using information from the CMC Data Quality dashboard, specifically, resolving timestamp parsing issues in a source. The steps to resolve your particular data quality issues may differ, but you can use this example as a general template for resolving data quality issues.

  1. In the Data Quality dashboard, view the Event Processing Issues by Source Type panel. For this example, you are most concerned with timestamp errors in the syslog source type, so you need to drill down into that source type.

  2. Drill down into the syslog source type. You can see that the majority of issues are with the following source: /var/log/suricata/stats.log.

  3. Click the source to drill down further and see the searches against this source.

  4. From here, you can look at a specific event. You can see that the issue is that the Splunk platform was unable to parse the timestamp within the number of characters set by MAX_TIMESTAMP_LOOKAHEAD.

  5. To fix this, go to Settings and select Source types in the DATA section.

  6. In the filter, enter syslog for the source type.

  7. Select Actions > Edit. The Edit Source Type page opens.

  8. Click Timestamp > Advanced… to open the Timestamp page for editing. Ensure you are satisfied with the timestamp format and the Lookahead settings. In this case, you need to edit the Lookahead setting so that the Splunk platform can parse the timestamp correctly.

  9. Return to the main Edit Source Type page and go to the Advanced menu. From here you can make other changes if needed.