Splunk® Enterprise

Alerting Manual


Alert examples

This topic presents a variety of alerting use cases that show you how alerts can help provide operational intelligence around network security, system administration, SLA compliance, and more.

Note: If you haven't read up on how alerting works in Splunk, start with "About alerts" in this manual.

Alerting when an event occurs - "File system full" error

Imagine that you're a system administrator who needs prompt notification when a "file system full" error occurs on any host, but who doesn't want to get more than one alert per 10-minute period.

Here's an example of the sort of events that you're on the lookout for:

Jan 27 03:21:00 MMPAD last message repeated 3 times
Jan 27 03:21:00 MMPAD ufs: [ID 845546 kern.notice] NOTICE: alloc: /:file system full
Jan 27 03:21:49 MMPAD ufs: [ID 845546 kern.notice] NOTICE: alloc: /:file system full
Jan 27 03:22:00 MMPAD last message repeated 3 times
Jan 27 03:22:00 MMPAD ufs: [ID 845546 kern.notice] NOTICE: alloc: /:file system full
Jan 27 03:22:56 MMPAD ufs: [ID 845546 kern.notice] NOTICE: alloc: /:file system full

Now, you can easily set up a simple search to find these events:

file system full

It's pretty easy to define a basic conditional alert that triggers when the search returns one or more "file system full" error events. When you create the alert, just set the Condition to If number of events is greater than 0.

[Screenshot: the alert Condition set to trigger when the number of events is greater than 0]
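If you manage alerts in configuration files rather than in Splunk Web, the same trigger condition corresponds roughly to the counttype, relation, and quantity settings in savedsearches.conf. The stanza below is only a sketch: the stanza name is arbitrary, and the scheduling and alert action settings (covered in the following sections) still need to be added.

[File system full alert]
search = file system full
# Trigger when the number of events returned is greater than 0
counttype = number of events
relation = greater than
quantity = 0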

However, you also need to determine whether the alert should be set up as a scheduled alert or a real-time alert. The deciding factor here is: how quickly do you need to know about the "file system full" error? Do you want to know the moment the error occurs, or is it OK if you are notified on a "timely" basis (in other words, when the alert is triggered by a search running on a particular schedule)?

Using a real-time search to catch the "file system full" error event

If you want to be alerted shortly after the first "file system full" error is received, you'll want to set up a real-time alert with a one-minute window (by giving it a Time range of rt-1m to rt). We suggest you use a one-minute window rather than a shorter window (such as 30 seconds) for alerts because the real-time search only finds events that occur within the real-time window. A 30-second window might be fine if there is no latency in your system, but all too often search loads and other issues can cause certain events to come in late and be "missed" by the real-time search because its window is too brief.

If you find that a one-minute window is too long for your purposes (because the alert has to be suppressed for at least that length of time), you have another option: you can keep the window short, but offset it by a minute. A Time range of rt-90s to rt-60s sets up a one-minute buffer that allows all events to come in before they are captured by the 30-second real-time search window.

But remember, you only want to get one alert every 10 minutes, and as you can see from the event listing excerpt above, you can get numerous "file system full" errors in the space of just a few minutes. To deal with that, set Throttling to 10 minutes. This ensures that once the first "file system full" alert is triggered, successive "file system full" events won't trigger another alert until 10 minutes have passed.

With a 10-minute Throttling setting, you might get an alert email shortly after you start the search, with information on the "file system full" event that triggered it, as well as any other events that were captured within the one-minute time window at the time the triggering event came in. But after that first alert is triggered, the 10-minute throttling period kicks in. During the throttling period, subsequent "file system full" events do not trigger the alert. Once the 10-minute throttling period is up, the alert can be triggered again when a new "file system full" event is received. When the alert is triggered, another email goes out, and the throttling period is enforced again for another 10 minutes. And so on. Each time the alert is triggered, the email that goes out only provides information on the one event that triggered it, plus any others that happen to fall within its real-time window.
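For reference, here is a rough savedsearches.conf sketch of the real-time version of this alert. The stanza name and email recipient are placeholders, and the exact settings that Splunk Web writes for a real-time alert may differ slightly from this in your deployment.

[File system full - real-time alert]
search = file system full
# One-minute real-time window (use rt-90s and rt-60s instead for the offset window described earlier)
dispatch.earliest_time = rt-1m
dispatch.latest_time = rt
enableSched = 1
# Trigger when one or more matching events are found
counttype = number of events
relation = greater than
quantity = 0
# Throttling: after the alert triggers, suppress further alerts for 10 minutes
alert.suppress = 1
alert.suppress.period = 10m
# Placeholder email action
action.email = 1
action.email.to = admin@example.com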

Using a scheduled search to catch the "file system full" error event

On the other hand, if you just want timely notification of a "file system full" error event, you can base the alert on a scheduled search that runs every 10 minutes over the past 10 minutes. Because the search runs on a 10-minute interval, you won't get an alert the moment a "file system full" event is received; instead, you'll get an alert after the search runs on its schedule. If an alert is triggered by this scheduled search, it returns a list of all the "file system full" events that were received during that 10-minute period (although only one event needs to be received to trigger the alert).

You don't need to set up a 10-minute throttling period for this search because the search already runs on a 10-minute interval. However, if you would prefer a longer interval between triggered alert actions, you could throttle subsequent alerts for whatever period beyond 10 minutes you deem appropriate. For example, you could set a throttling period of 60 minutes, 2 hours, or even 1 day. It's up to you.
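The scheduled version of the alert replaces the real-time window with a cron schedule and a matching 10-minute time range. As above, this is a sketch with placeholder names rather than the exact configuration Splunk Web would write.

[File system full - scheduled alert]
search = file system full
enableSched = 1
# Run every 10 minutes over the previous 10 minutes
cron_schedule = */10 * * * *
dispatch.earliest_time = -10m@m
dispatch.latest_time = now
counttype = number of events
relation = greater than
quantity = 0
# No throttling needed; the 10-minute schedule already spaces out the alerts
action.email = 1
action.email.to = admin@example.com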

Use a custom search to alert when a statistical threshold is reached

In this example, you want to get an alert if Splunk detects an sshd brute-force attack. This could be indicated by 5 or more failed ssh attempts within one minute, all from the same IP address and targeted against a single user account.

Here's an example of the events that you're on the lookout for:

Jun 26 22:31:04 victim sshd[15384]: Failed password    for jdean from ::ffff:192.0.2.12 port 30937 ssh2
Jun 26 22:31:06 victim sshd[15386]: klargo from ::ffff:198.51.100.3
Jun 26 22:31:06 victim sshd[15386]: error: Could not get shadow information for NOUSER 
Jun 26 22:31:06 victim sshd[15386]: Failed password    for klargo from ::ffff:198.51.100.3 port 30951 ssh2 
Jun 26 22:31:08 victim sshd[15388]: bmac from ::ffff:192.0.2.12 
Jun 26 22:31:08 victim sshd[15388]: error: Could not get shadow information for NOUSER 
Jun 26 22:31:08 victim sshd[15388]: Failed password    for bmac from ::ffff:192.0.2.12 port 30963 ssh2 
Jun 26 22:31:10 victim sshd[15390]: Failed password    for jdean from ::ffff:192.0.2.12 port 30980 ssh2 
Jun 26 22:31:11 victim sshd[15392]: Failed password    for jdean from ::ffff:198.51.100.3 port 30992 ssh2 
Jun 26 22:31:13 victim sshd[15394]: Failed password    for jdean from ::ffff:192.0.2.12 port 31007 ssh2 
Jun 26 22:31:15 victim sshd[15396]: Failed password    for jdean from ::ffff:192.0.2.12 port 31021 ssh2 
Jun 26 22:31:17 victim sshd[15398]: Failed password    for jdean from ::ffff:192.0.2.12 port 31031 ssh2 
Jun 26 22:31:19 victim sshd[15400]: Failed password    for jdean from ::ffff:192.0.2.12 port 31049 ssh2 
Jun 26 22:31:20 victim sshd[15403]: Failed password    for jdean from ::ffff:192.0.2.12 port 31062 ssh2 
Jun 26 22:31:22 victim sshd[15405]: Failed password    for jdean from ::ffff:192.0.2.12 port 31073 ssh2

How do you set up this alert? You might begin by basing it on a fairly simple search:

Failed password sshd

This search finds the failed ssh attempts, and if that was all you were looking for, you could set a basic Condition like greater than 4 to alert you when 5 such events come in within a given time window (if you base the alert on a real-time search) or search interval (if you base the alert on a scheduled search).

But this doesn't help you set up an alert that triggers only when 5 or more matching events, all from the same IP address and targeted against a single user account, come in within a one-minute window. To do this you need to define an advanced conditional alert that uses a secondary custom search to sort through the events returned by the base search. But before you design that, you should first refine the base search.

Base search - Refine to provide better, more useful results

The base search results are what get sent in the alert email or RSS feed update when the alert is triggered, so you should try to make those results as informative as possible.

The following search goes through the "Failed password sshd" events that it finds within a given one-minute window or interval, groups the events together according to their dest_name (the targeted user account) and src_ip (source IP address) combination, generates a count for each combination, and then arranges this information in an easy-to-read table.

Failed password sshd | stats count by src_ip,dest_name | table src_ip dest_name count

Note: The src_ip and dest_name fields are not among the set of fields that Splunk extracts by default. This example presumes that you have set these field extractions up beforehand.
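For example, if these events carried a hypothetical sourcetype such as sshd_syslog, an inline extraction in props.conf along the following lines could pull out the two fields. The sourcetype name and regular expression are illustrative only; adjust them to match your actual data.

[sshd_syslog]
# Extract the targeted account as dest_name and the source address as src_ip
EXTRACT-ssh_failed_login = Failed password\s+for\s+(?<dest_name>\S+)\s+from\s+::ffff:(?<src_ip>\d{1,3}(?:\.\d{1,3}){3})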

The table below gives you an idea of the results that this search would return. It shows that four different groups of events were found, each with a specific src_ip/dest_name combination, and it provides a count of the events in each group. Apparently, there were eight failed ssh password attempts from someone at 192.0.2.12 that targeted jdean, all within a one-minute timeframe.

src_ip        dest_name  count
192.0.2.12    bmac       3
192.0.2.12    jdean      8
198.51.100.3  jdean      2
198.51.100.3  klargo     3

Custom conditional search - Alert when 5 or more events with the same src_ip/dest_name combination are found

Then you set up an advanced conditional custom search that analyzes the results of the base search to find results where the count for any src_ip/dest_name combination reaches 5 or more within the 60-second timespan.

search count >= 5

This custom search would trigger an alert when the results described in the table above come in, due to that set of 8 events.

[Screenshot: the custom condition alert setup for this search]

Should it be a real-time or scheduled alert?

You'll probably want to set this up as a real-time search. Scheduled searches can cause performance issues when they run over intervals as short as 1 minute. They can also be a bit inaccurate in situations where, for example, 2 qualifying events come in towards the end of one interval and 4 more come in at the start of the subsequent interval.

To set the alert up as a real-time alert, you would give its base search a start time of rt-60s and an end time of rt.

You'll likely also want to set up a throttling period for this search to avoid being overwhelmed by alerts. The setting you choose largely depends on how often you want to get notified about alerts of this nature. For example, you might set it up so that once an alert is triggered, it can't be triggered again for a half hour.
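Putting these pieces together, the sshd brute force alert might look roughly like the following in savedsearches.conf. This is a sketch under the assumptions above: the src_ip and dest_name extractions exist, the stanza name and email recipient are placeholders, and the 30-minute throttling period is just an example.

[SSH brute force alert]
search = Failed password sshd | stats count by src_ip,dest_name | table src_ip dest_name count
# 60-second real-time window
dispatch.earliest_time = rt-60s
dispatch.latest_time = rt
enableSched = 1
# Custom condition: trigger only if some src_ip/dest_name combination has 5 or more failures
alert_condition = search count >= 5
# Throttle repeat alerts for 30 minutes
alert.suppress = 1
alert.suppress.period = 30m
action.email = 1
action.email.to = security@example.com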

Alerting when a set of IDS solutions report a network attack 20 or more times within a 10-minute period

In this example, you're using a set of intrusion detection systems (IDS), such as Snort, to identify attacks against your network. You want to get an alert when three or more of these IDS technologies each report an attack 20 or more times in 10 minutes. After you get the alert, you don't want another one until at least 10 minutes have passed.

Base search - Get the IDS types with 20 or more attack events in a given 10-minute span of time

To begin with, let's assume that you have set things up so that all of your IDS attack events are grouped together by the ids_violation source type and the ids_attack event type, and that the IDS type is indicated by the idstype field. You can then create a base search for the alert that builds a table matching each idstype value found with a count of its events. Finally, the search filters this table so that only IDS types with 20 or more events are displayed:

sourcetype=ids_violation eventtype=ids_attack | stats count by idstype | search count >= 20

The stats portion of this base search produces a table that could look something like this. (After the final search count >= 20 clause, only the IDS types with 20 or more events remain; in this example, snort would be filtered out.)

idstype     count
snort       12
ossec       32
fragrouter  27
BASE        26

You can set this base search up as a real-time search or a scheduled search, but either way it needs to capture data over a 10-minute interval or window. The Time Range for the real-time search would be rt-10m to rt. The Time Range for the scheduled search would be -10m@m to now, and its interval would be "every 10 minutes."

Custom conditional search - Alert when there are 3 or more IDS types with 20+ events in a given 10-minute span of time

Now that you've set up a base search that returns only those IDS types that have 20 or more attack events in a 10-minute span, you need to define a conditional search that goes further by ensuring that the alert is triggered only when the base search results include 3 or more IDS types.

To do this, set Condition to if custom condition is met and then enter this conditional search, which looks at the results returned by the base search and triggers an alert if it finds 3 or more IDS types with 20 or more events each:

stats count(idstype) as distinct | search distinct >= 3

This custom search would trigger an alert if it evaluated the results represented by the table above, because 3 IDS types (ossec, fragrouter, and BASE) reported 20 or more attacks in the 10-minute timespan covered by the base search.

If you're setting this up as a real-time search, set Throttling to 10 minutes to ensure that you don't get additional alerts until at least 10 minutes have passed since the last one. If you're running it as a scheduled search, you don't need to set a throttling value, since the search only runs on a 10-minute interval to begin with.
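For reference, the scheduled version of this alert might look roughly like the following in savedsearches.conf. As with the earlier sketches, the stanza name and email recipient are placeholders, and the search assumes the idstype field and ids_attack event type already exist in your deployment.

[IDS multi-source attack alert]
search = sourcetype=ids_violation eventtype=ids_attack | stats count by idstype | search count >= 20
enableSched = 1
# Run every 10 minutes over the previous 10 minutes
cron_schedule = */10 * * * *
dispatch.earliest_time = -10m@m
dispatch.latest_time = now
# Custom condition: trigger only when 3 or more IDS types each report 20 or more attacks
alert_condition = stats count(idstype) as distinct | search distinct >= 3
# For the real-time variant, use dispatch.earliest_time = rt-10m and dispatch.latest_time = rt,
# and add alert.suppress = 1 with alert.suppress.period = 10m
action.email = 1
action.email.to = security@example.com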
