Reports
About reports
This topic describes each of the reports provided in this app. Searches saved as reports are listed on the Reports page. When you run a search you can save it as a report, an alert, a dashboard, or an event type. In each case, the format of the saved results determines where you can find the search in Splunk Web.
To get a list of the reports, click Reports in the app menu. You can use the default reports or you can modify the reports to generate specific results for your environment.
Connection problems in the last hour
Description
Use this search to gain insight into connection problems between this app and your NetApp filers. Connection issues can prevent data from coming into the app. When the Splunk App for NetApp Data ONTAP attempts to collect data from an ONTAP filer and experiences a connection problem, the search returns all of the events that resulted in a timeout or an unsuccessful login.
Search
index=_internal source=*hydra* OR source=*splunk_ta_ontap_api* ("*[Errno 8]*" OR "timed out" OR "Could not login")
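To check a sample of exported log lines against the same three strings outside Splunk, a minimal Python sketch follows. The function name and the sample lines are hypothetical, and plain case-insensitive substring matching only approximates how Splunk matches search terms.

```python
# The three strings the report searches for, lowercased for comparison.
CONNECTION_PROBLEM_STRINGS = ("[errno 8]", "timed out", "could not login")


def is_connection_problem(line):
    """Return True if the log line contains any of the report's strings."""
    lowered = line.lower()
    return any(text in lowered for text in CONNECTION_PROBLEM_STRINGS)


# Hypothetical log lines, for illustration only.
sample_lines = [
    "ERROR collection failed: [Errno 8] nodename nor servname provided",
    "INFO collection finished",
    "ERROR Could not login to filer",
]
problems = [line for line in sample_lines if is_connection_problem(line)]
```

With the sample lines above, `problems` keeps the first and third lines and drops the informational one.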
Unhealthy cluster nodes in the last hour
Description
This search queries the ONTAP data for an event containing the string "Node is not healthy". The search returns the name of the "unhealthy" node and a timestamp for when the message was sent. Healthy nodes in a cluster can communicate with each other. When nodes are unhealthy, the cluster loses the ability to perform cluster operations successfully and reliably.
Search
index=_internal (source="*hydra*" OR source="*splunk_ta_ontap_api*") "Node is not healthy" node=* | table _time,node
dispatch.earliest_time = -1h
Missing filer capability collection errors in the past hour
Description
The search returns a count of the API permissions errors. It queries all events containing errors that relate to having an incorrect set of capabilities to invoke the NetApp API. "Missing filer capability" is a specific type of collection error that indicates that a permissions error prevents the collection of data from the filers.
Search
index=_internal source=*hydra* "does not have capability" ERROR
dispatch.earliest_time = -1h
Volume Capacity Delta Table
Description
Use this search to be proactive about storage changes in your volumes. Volume events provide information about the status of your volumes so that you can monitor for potential storage problems. This search compares the storage used on each volume at two different points in time (prior storage used and posterior storage used) over the last 24 hours, and shows the change in capacity on the volumes. Prior storage for a particular volume is the event recorded earlier in time. Posterior storage for the volume is the event recorded later in time. The difference between the two shows the growth trend, expressed both as a change in percent used and as a change in storage used. For example, if your storage use is growing at 6% per day, you can estimate how long it will take before you use up all available capacity. The chart legend identifies the data points, which are computed in GB. A percent format is also provided.
Search
sourcetype=ontap:volume storage_used=* | eval name=if(isnull(name),$volume-id-attributes.name$,name) | table _time,host, storage_used,storage_used_percent,name | stats first(storage_used) as posterior_storage_used last(storage_used) as prior_storage_used first(storage_used_percent) as posterior_storage_used_percent last(storage_used_percent) as prior_storage_used_percent last(_time) as prior_time first(_time) as _time by host,name | eval percent_change=posterior_storage_used_percent-prior_storage_used_percent | eval capacity_change=posterior_storage_used-prior_storage_used | convert ctime(prior_time) | table prior_time,_time,host,name,prior_storage_used,posterior_storage_used,prior_storage_used_percent,posterior_storage_used_percent,percent_change,capacity_change
dispatch.earliest_time = -24h
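The arithmetic behind percent_change and capacity_change, and the "how long until the volume is full" estimate mentioned in the description, can be sketched in Python. This is an illustration under stated assumptions, not the app's implementation: the VolumeSample type, the function names, and the sample values are hypothetical, and the estimate assumes growth continues at a constant number of percentage points per day.

```python
from dataclasses import dataclass


@dataclass
class VolumeSample:
    time: float                  # event time, epoch seconds
    storage_used: float          # same unit as the storage_used field
    storage_used_percent: float  # percent of the volume in use


def capacity_delta(samples):
    """Compare the earliest (prior) and latest (posterior) sample of one volume."""
    ordered = sorted(samples, key=lambda s: s.time)
    prior, posterior = ordered[0], ordered[-1]
    return {
        "percent_change": posterior.storage_used_percent - prior.storage_used_percent,
        "capacity_change": posterior.storage_used - prior.storage_used,
    }


def days_until_full(current_percent, percent_points_per_day):
    """Estimate days until 100% used, assuming constant growth (an assumption)."""
    if percent_points_per_day <= 0:
        return None  # not growing, so no estimate
    return (100.0 - current_percent) / percent_points_per_day


# Hypothetical samples taken 24 hours apart for a single volume.
samples = [
    VolumeSample(time=0.0, storage_used=100.0, storage_used_percent=40.0),
    VolumeSample(time=86400.0, storage_used=130.0, storage_used_percent=52.0),
]
delta = capacity_delta(samples)
estimate = days_until_full(current_percent=70.0, percent_points_per_day=6.0)
```

In the search, first() supplies the posterior values and last() the prior values because Splunk normally returns events newest first; the sketch sorts by time explicitly instead of relying on event order.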
Total events in the past hour
Description
This search provides a total count of the number of syslog or Event Management System (EMS) events processed in the last hour. You can look at system logs to proactively monitor your environment for configuration or system changes. A dramatic increase in the number of syslog events coming in from a filer can indicate a problem. Use this report to check for problems with incoming syslog data.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" | stats count
dispatch.earliest_time = -1h
Total error events in the past hour
Description
This search returns a total count of the number of syslog or Event Management System (EMS) error events processed in the last hour. You can look at system logs to proactively monitor your environment for configuration or system changes. The search queries the ONTAP syslog data for the string "error". Monitor this count for spikes. A spike can point to error conditions or to filer states that are prone to errors. You can then drill down to find more details about the problem.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" error | stats count
dispatch.earliest_time = -1h
Total events by filer in the past hour
Description
This search returns a total count of the number of events, broken down by filer, processed in the last hour. You can see if one filer in particular is causing problems. You can use this search to proactively monitor your environment for configuration or system changes. You can see trends or investigate spikes in the data coming in.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" | stats count by host
dispatch.earliest_time = -1h
Total alert and critical events in the past hour
Description
The search queries the ONTAP syslog data for the strings "alert" or "critical". It reports the number of alert or critical events that occurred in the last hour for each host.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" (alert OR critical) | stats count by host
dispatch.earliest_time = -1h
Count of total disk and controller events by filer in the past hour
Description
This search returns a total count of disk and controller events. Use this information to determine the health of your system. Drill down on the chart or result table to get more detailed information. Spikes in the number of controller or disk events reported can indicate problems that need to be investigated.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" (controller OR disk) | stats count by host
dispatch.earliest_time = -1h
Count of disk events over time by filer
Description
Use this search to proactively monitor your environment. This search returns a count of the number of disk events on a particular filer, processed in the last hour. The search queries the syslog data for a string containing "disk". An increase in the number of events coming from a particular disk on a filer can indicate a problem. As an admin, you have an established baseline for normal behavior in your environment. Compare the numbers reported against the baseline activity for the filer to identify a potential problem. Look at the chart for spikes in events, which can indicate disk usage on a particular filer that is higher or lower than the normal range for your environment. You can also follow the trend in your data to manage your environment proactively. Click the chart or the table to drill down to the events.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" disk | timechart count by host
dispatch.earliest_time = -1h
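The "count over time by filer" reports all follow the same pattern: group events into time buckets per host, then compare the counts with a baseline. The Python sketch below illustrates that pattern outside Splunk. The 300-second bucket width, the baseline values, the spike factor, and the function names are assumptions made for this example; timechart chooses its own span unless you set one.

```python
from collections import Counter


def count_by_bucket_and_host(events, span_seconds=300):
    """Count events per (bucket start time, host), similar in spirit to timechart."""
    counts = Counter()
    for timestamp, host in events:
        bucket_start = int(timestamp // span_seconds) * span_seconds
        counts[(bucket_start, host)] += 1
    return dict(counts)


def find_spikes(counts, baseline_per_bucket, factor=2.0):
    """Return the (bucket, host) keys whose count exceeds factor times the baseline."""
    return sorted(
        key
        for key, count in counts.items()
        if count > factor * baseline_per_bucket.get(key[1], 0)
    )


# Hypothetical (epoch seconds, host) pairs, for illustration only.
events = [(0, "filer-a"), (10, "filer-a"), (299, "filer-a"),
          (300, "filer-a"), (301, "filer-b")]
counts = count_by_bucket_and_host(events)
spikes = find_spikes(counts, baseline_per_bucket={"filer-a": 1, "filer-b": 1})
```

Here filer-a has three events in its first bucket against a baseline of one, so that bucket is the only one flagged.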
Count of error events over time by filer
Description
This search returns a count of the number of error events, broken down by filer, processed in the last hour. Use this search to proactively monitor your environment. The search queries the syslog data for a string containing "error". Compare the numbers reported by the search against the baseline activity for the filer. Look at the chart to see spikes in error events. Drill down on the chart or values in the table to get to individual error events. Examine the error events for severity and the impact the problem has on your system. Look at the chart to see trends in your data and be proactive in managing your environment.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" error | timechart count by host
dispatch.earliest_time = -1h
Count of disk error events over time by filer
Description
Use this search to proactively monitor your environment. This search returns a count of the number of disk error events (such as disk failures, problems with disk assignment) on a particular filer, in the last hour. The search queries the syslog data for strings containing "disk" and "error". As an admin you have an established baseline for normal behavior in your environment. Compare the numbers reported against the baseline activity for the filer. Look at the chart to see spikes in error events. Drill down on the chart or values in the table to get to individual events. Examine the error events for severity and the impact the problem has on your system. Look at the chart to see trends in your data.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" error disk | timechart count by host
dispatch.earliest_time = -1h
Count of alert and critical events over time by filer
Description
Use this search to proactively monitor your environment. This search returns a count of the ONTAP syslog events with an alert or critical status that occurred in the last hour on a particular filer. Drill down on the chart or values in the table to examine individual events. Look at the chart to see trends in your data and be proactive in managing your environment.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" (alert OR critical) | timechart count by host
dispatch.earliest_time = -1h
Count of read error events on disks by filer
Description
Use this search to proactively monitor your environment. This search returns a count of the number of disk read error events on a particular filer, in the last hour. The search queries the ONTAP syslog data for strings containing "disk", "read", and "error". Look at the chart to see trends in your data and to investigate spikes in read error events, indicating problem areas. Drill down on the chart or values in the table to get to individual events. Examine the error events for severity and the impact on your system.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" disk read error | timechart count by host
dispatch.earliest_time = -1h
Count of aggregate events over time by filer
Description
Use this search to proactively monitor your environment for potential problems. The search returns a count of the number of events found that contain the term "aggregate". Aggregate events provide status information about the aggregates. Drill down on the chart or the results table to get more detailed information about the event, including host, source type, and severity details.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" aggregat* | timechart count by host
dispatch.earliest_time = -1h
Count of volume events over time by filer
Description
Use this search to proactively monitor your environment for potential problems. The search returns a count of the number of events found that contain the term "volume". Volume events provide status information about the volumes. Drill down on the chart or the results table to get more detailed information about the event, including host, source type, and severity details. See the NetApp documentation for a list of volume events.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" volume* | timechart count by host
dispatch.earliest_time = -1h
Count of snapshot events on aggregates over time by filer
Description
Use this search to proactively monitor your environment for potential problems. The search returns a count of the number of snapshot events found on aggregates over the time range specified. Drill down on the chart or the results table to get more detailed information about the event, including host, source type, and severity details. Snapshots require storage space on volumes. You can plan how much space to allocate for snapshots by watching the trend in the chart over time.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" snapshot* aggregat* | timechart count by host
dispatch.earliest_time = -1h
Count of error snapshot events over time by filer
Description
Use this search to proactively monitor your environment for potential problems. The search queries the data for "snapshot" and "error" and returns a count of the number of snapshot error events found per filer. Drill down on the chart or the results table to get more detailed information about the event, including host, source type, and severity details.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" snapshot error | timechart count by host
dispatch.earliest_time = -1h
Count of SnapMirror error events over time by filer
Description
Use this search to proactively monitor your environment for potential problems. The search returns a count of the number of SnapMirror error events found per filer over the time range specified. Drill down on the chart or the results table to get more detailed information about the event, including host, source type, and severity details.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" snapmirror error | timechart count by host
dispatch.earliest_time = -1h
Count of Monitoring and Host Configuration events over time by filer
Description
Use this search to proactively monitor your environment. This search returns a count of the monitoring and host configuration events that appear in your syslog data, per filer, over a one-hour timeframe. Compare these events against the expected normal behavior for your environment. An increase in configuration events can indicate that something unexpected is happening to some element of your environment that you need to investigate. Drill down on the chart or the results table to investigate individual events, respond to configuration issues, and prevent problems that can lead to degraded performance and system unavailability.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" (monitor* OR config*) | timechart count by host
dispatch.earliest_time = -1h
Count of Backup and Restore events over time by filer
Description
Use this search to proactively monitor your environment. This search returns a count of the backup and restore events that appear in your syslog data, per filer, over a one-hour timeframe. Look for spikes in the data and monitor the trend over time. An increase in events can indicate a problem in your environment that you need to investigate. Drill down on the chart or the results table to investigate individual events and proactively respond to the issue.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" (backup OR restor*) | timechart count by host
dispatch.earliest_time = -1h
Count of Optimization and Migration events over time by filer
Description
Use this search to proactively monitor your environment. This search returns a count of the optimization and migration events that appear in your syslog data, per filer, over a one-hour timeframe. Look for spikes in the data and monitor the trend over time. Drill down on the chart or the results table to investigate individual events and proactively respond to the issue.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" (optimiz* OR migrat*) | timechart count by host
dispatch.earliest_time = -1h
Count of Provisioning and Cloning events over time by filer
Description
Use this search to proactively monitor your environment. This search returns a count of the provisioning and cloning events that appear in your syslog data, per filer, over a one-hour timeframe. Look for spikes in the data and monitor the trend over time. Drill down on the chart or the results table to investigate individual events and proactively respond to the issue.
Search
sourcetype="ontap:syslog" OR sourcetype="ontap:ems" (provision* OR clon*) | timechart count by host
dispatch.earliest_time = -1h
NFS Volumes used by VMware
Description
This search enables the correlation of NetApp ONTAP Data with VMware data. You must have the Splunk App for VMware installed. Run the search to get a table displaying all of the NFS volumes used in your VMware environment. See the topic Correlate NetApp and VMware data for more information about data correlation in this app.
Search
sourcetype="ontap:volume" (source=volume-get-iter OR source=volume-list-info-iter-start) | rename volume-id-attributes.name as name | stats values(name) as volname by host | lookup dnslookup clienthost AS host OUTPUT clientip AS ip | mvexpand volname | table * | join type=inner ip, volname [search `VmwareNFSMounts`] | rename name as "Datastore name", path as "Path", volname as Volume, filer as "Filer (VMware data)", host as Filer, ip as IP, vcenter as VCenter
Aggregates with over 90% capacity used
Description
This report shows all of the aggregates that have had over 90% of their capacity used in the last 24 hours.
Search
sourcetype="ontap:aggr" (source="aggr-list-info" OR source="aggr-get-iter") | `CoalesceAggrFields` | search size-percentage-used > 90 | dedup name, host | eval "gb-total"=`BytesToGigaBytes(sz_total)` | eval "gb-free"=`BytesToGigaBytes(sz_free)` | table name, host, volume-count, size-percentage-used, "gb-total", "gb-free"
Disk block transfer rates by Filer and RPM
Description
This report shows the block transfer rates and RPM for all disks associated with a filer.
Search
index=ontap source="diskperfhandler" objname=* | stats avg(total_transfers_rate), avg(user_read_blocks_rate), avg(user_write_blocks_rate) by host, display_name, disk_speed | rename disk_speed AS rpm, display_name AS disk_name
Failed Disks
Description
This report shows the disks whose raid-state is "broken".
Search
sourcetype="ontap:disk" raid-state="broken" | rename physical-space as pspace | eval phys-space-gb=`BytesToGigaBytes(pspace)` | table host, serial-number, name, raid-state, raid-type, disk-type, firmware-revision, rpm, phys-space-gb, aggregate, shelf, bay, pool
Top 10 Busiest Filers - 7 mode and Cluster mode
Description
This report shows the top ten filers with highest total_ops_rate.
Search
sourcetype=ontap:perf source="SystemPerfHandler" | stats first(total_ops_rate) AS total_ops_rate, first(read_ops_rate) AS read_ops_rate, first(write_ops_rate) AS write_ops_rate, first(cpu_busy_percent) AS cpu_busy_percent by host | sort - total_ops_rate | head 10
Unhealthy cluster nodes in the past hour
Description
This report shows any node that returns a message of "Node is not healthy" in the specified time period.
Search
index=_internal (source="*hydra*" OR source="*splunk_ta_ontap_api*") "Node is not healthy" node=* | table _time,node
Volumes with latency higher than 25ms over 5% of the time
Description
This report shows all volumes whose latency is higher than 25ms more than 5% of the time.
Search
sourcetype=ontap:perf source=VolumePerfHandler objname="*" | eval ismatch=if(latency>25000, 1, 0) | stats count, sum(ismatch) AS matchCount, max(latency) AS maxLatency, avg(latency) AS avgLatency by host, objname | eval percentage=round(100*matchCount/count,0) | eval max_latency = maxLatency/1000 | eval avg_latency = avgLatency/1000 | search percentage > 5 | fields - count, matchCount, maxLatency, avgLatency | rename objname AS volume
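The search compares the latency field against 25000 and divides by 1000 for display, which suggests the raw values are in microseconds. The Python sketch below mirrors the per-volume arithmetic under that assumption. The function names and sample values are hypothetical, and rounding half up is an assumption about how the search's round() treats ties.

```python
import math


def latency_summary(latencies_us, threshold_us=25000):
    """Summarize one volume's latency samples (assumed to be in microseconds)."""
    count = len(latencies_us)
    if count == 0:
        return None  # no samples, nothing to report
    match_count = sum(1 for value in latencies_us if value > threshold_us)
    # Round to a whole percent, half up (assumed to match the search's round()).
    percentage = math.floor(100 * match_count / count + 0.5)
    return {
        "percentage": percentage,
        "max_latency": max(latencies_us) / 1000,          # milliseconds
        "avg_latency": sum(latencies_us) / count / 1000,  # milliseconds
    }


def is_reported(summary, percent_limit=5):
    """The report keeps a volume only when more than 5% of samples exceed 25ms."""
    return summary is not None and summary["percentage"] > percent_limit


# Hypothetical samples: 2 of 20 readings are above 25ms.
slow_volume = latency_summary([1000] * 18 + [30000] * 2)
# Hypothetical samples: 1 of 20 readings is above 25ms (exactly 5%).
borderline_volume = latency_summary([1000] * 19 + [30000])
```

The first volume is reported because 10% of its samples exceed the threshold. The second is not, because the comparison is strictly greater than 5%.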
Volumes with over 75% capacity used
Description
This report shows all Volumes that have over 75% capacity used, in the last 24 hours. The table displays the name, containing aggregate, percent used, GB-total, GB-used, and Snapshot-percent-reserved.
Search
sourcetype="ontap:volume" (source=volume-get-iter) OR (source=volume-list-info-iter-start) | `CoalesceVolumeFields` | search percentage-used >= 75 | dedup name | eval "gb-total"=`BytesToGigaBytes(sz_total)` | eval "gb-used"=`BytesToGigaBytes(sz_used)` | table host, name, containing-aggregate, percentage-used, "gb-total", "gb-used", snapshot-percent-reserved | sort - percentage-used
This documentation applies to the following versions of Splunk® App for NetApp Data ONTAP (Legacy): 2.1.4