This topic presents a variety of alerting use cases that show you how alerts can help provide operational intelligence around network security, system administration, SLA compliance, and more.
Note: If you haven't read up on how alerting works in Splunk, start with "About alerts," in this manual.
Alerting when an event occurs - "File system full" error
Imagine that you're a system administrator who needs prompt notification when a file system full error occurs on any host, but you don't want to get more than one alert per 10-minute period.
Here's an example of the sort of events that you're on the lookout for:
Jan 27 03:21:00 MMPAD last message repeated 3 times
Jan 27 03:21:00 MMPAD ufs: [ID 845546 kern.notice] NOTICE: alloc: /:file system full
Jan 27 03:21:49 MMPAD ufs: [ID 845546 kern.notice] NOTICE: alloc: /:file system full
Jan 27 03:22:00 MMPAD last message repeated 3 times
Jan 27 03:22:00 MMPAD ufs: [ID 845546 kern.notice] NOTICE: alloc: /:file system full
Jan 27 03:22:56 MMPAD ufs: [ID 845546 kern.notice] NOTICE: alloc: /:file system full
Now, you can easily set up a simple search to find these events:
file system full
You can define a basic conditional alert that triggers when the search returns one or more file system full error events. When you create the alert, just set Condition to If number of events is greater than 0.
However, you also need to decide whether to set this up as a scheduled alert or a real-time alert. The deciding factor: how quickly do you need to know about the "file system full" error? Do you want to know the moment the error occurs, or is it OK to be notified on a "timely" basis (in other words, when the alert is triggered by a search running on a particular schedule)?
Using a real-time search to catch the "file system full" error event
If you want to be alerted shortly after the first "file system full" error is received, set up a real-time alert with a one-minute window (by giving it a Time range of rt-1m to rt). We suggest a one-minute window rather than a shorter one (such as 30 seconds) because the real-time search only finds events that occur within the real-time window. A 30-second window might be fine if there is no latency in your system, but all too often search loads and other issues cause certain events to come in late and be "missed" by a real-time search whose window is too brief.
If you find that a one-minute window is too long for your purposes (because the alert has to be suppressed for at least that length of time), you have another option: keep the window short, but offset it by a minute. A Time range of rt-90s to rt-60s sets up a one-minute buffer that allows all events to come in before they are captured by the 30-second real-time search window.
But remember, you only want to get one alert every 10 minutes, and as you can see from the event listing excerpt above, you can get numerous "file system full" errors in the space of just a few minutes. To deal with that, set Throttling to 10 minutes. This ensures that once the first "file system full" error alert is triggered, successive "file system full" events won't trigger another alert until 10 minutes have passed.
With a 10-minute Throttling setting, you might get an alert email shortly after you start the search, with information on the "file system full" event that triggered it, as well as any other events captured within the one-minute time window when the triggering event came in. After that first alert is triggered, the 10-minute throttling period kicks in; during it, subsequent "file system full" events do not trigger the alert. Once the 10-minute throttling period is up, the alert can be triggered again when a new "file system full" event is received. When it is, another email goes out, and the throttling period is enforced for another 10 minutes, and so on. Each time the alert is triggered, the email only provides information on the event that triggered it, plus any others that happen to fit within its time window.
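To make the throttling behavior concrete, here is a minimal Python sketch (not Splunk code) of how a 10-minute suppression period gates successive triggers. The timestamps are hypothetical stand-ins for "file system full" event times:

```python
from datetime import datetime, timedelta

def triggered_alerts(event_times, throttle=timedelta(minutes=10)):
    # Fire on the first event, then suppress anything inside the
    # throttling period; fire again on the first event after it expires.
    fired = []
    last = None
    for t in sorted(event_times):
        if last is None or t - last >= throttle:
            fired.append(t)
            last = t
    return fired

# Hypothetical event times: a burst of errors, then another a while later
events = [
    datetime(2024, 1, 27, 3, 21),
    datetime(2024, 1, 27, 3, 22),
    datetime(2024, 1, 27, 3, 28),
    datetime(2024, 1, 27, 3, 32),
]
print(triggered_alerts(events))  # fires at 03:21 and again at 03:32
```

The 03:22 and 03:28 events fall inside the throttling period opened at 03:21, so only the 03:32 event triggers a second alert.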
Using a scheduled search to catch the "file system full" error event
On the other hand, if you just want timely notification of a "file system full" error event, you can base the alert on a scheduled search that runs every 10 minutes over the past 10 minutes. Because the search runs on a 10-minute interval, you won't get an alert the moment a "file system full" event is received; instead, you'll get one after the search runs on its schedule. If an alert is triggered by this scheduled search, it returns a list of all the "file system full" events received during that 10-minute period (although only one event needs to be received to trigger the alert).
You don't need to set up a 10-minute throttling period for this search because it already runs on a 10-minute interval. However, if you would prefer a longer interval between triggered alert actions, you can throttle subsequent alerts for whatever period over 10 minutes you deem appropriate. For example, you could set a throttling period of 60 minutes, 2 hours, or even 1 day. It's up to you.
Use a custom search to alert when a statistical threshold is reached
In this example, you want to get an alert if Splunk detects an sshd brute-force attack. This could be indicated by 5 or more failed ssh attempts within one minute, from the same IP address and targeted against a single user account.
Here's an example of the events that you're on the lookout for:
Jun 26 22:31:04 victim sshd: Failed password for jdean from ::ffff:192.0.2.12 port 30937 ssh2
Jun 26 22:31:06 victim sshd: klargo from ::ffff:198.51.100.3
Jun 26 22:31:06 victim sshd: error: Could not get shadow information for NOUSER
Jun 26 22:31:06 victim sshd: Failed password for klargo from ::ffff:198.51.100.3 port 30951 ssh2
Jun 26 22:31:08 victim sshd: bmac from ::ffff:192.0.2.12
Jun 26 22:31:08 victim sshd: error: Could not get shadow information for NOUSER
Jun 26 22:31:08 victim sshd: Failed password for bmac from ::ffff:192.0.2.12 port 30963 ssh2
Jun 26 22:31:10 victim sshd: Failed password for jdean from ::ffff:192.0.2.12 port 30980 ssh2
Jun 26 22:31:11 victim sshd: Failed password for jdean from ::ffff:198.51.100.3 port 30992 ssh2
Jun 26 22:31:13 victim sshd: Failed password for jdean from ::ffff:192.0.2.12 port 31007 ssh2
Jun 26 22:31:15 victim sshd: Failed password for jdean from ::ffff:192.0.2.12 port 31021 ssh2
Jun 26 22:31:17 victim sshd: Failed password for jdean from ::ffff:192.0.2.12 port 31031 ssh2
Jun 26 22:31:19 victim sshd: Failed password for jdean from ::ffff:192.0.2.12 port 31049 ssh2
Jun 26 22:31:20 victim sshd: Failed password for jdean from ::ffff:192.0.2.12 port 31062 ssh2
Jun 26 22:31:22 victim sshd: Failed password for jdean from ::ffff:192.0.2.12 port 31073 ssh2
How do you set up this alert? You might begin by basing it on a fairly simple search:
Failed password sshd
This search finds the failed ssh attempts. If that were all you were looking for, you could set a basic Condition like greater than 4 to alert you when 5 such events come in within a given time window (if you base the alert on a real-time search) or search interval (if you base it on a scheduled search).
But this doesn't help you set up an alert that triggers only when 5 or more matching events from the same IP address, targeted against a single user account, come in within a one-minute window. To do this you need to define an advanced conditional alert that uses a secondary custom search to sort through the events returned by the base search. But before you design that, you should first refine the base search.
Base search - Refine to provide better, more useful results
The base search results are what gets sent in the alert email or RSS feed update when the alert is triggered, so you should make those results as informative as possible.
The following search goes through the "failed password sshd" events that it finds within a given one-minute window or interval, groups the events together by their dest_name (destination name) and src_ip (source IP address) combination, generates a count for each combination, and then arranges this information in an easy-to-read table.
Failed password sshd | stats count by src_ip,dest_name | table src_ip dest_name count
Note: The src_ip and dest_name fields are not among the set of fields that Splunk extracts by default. This example presumes that you have set up these field extractions beforehand.
The results of this search give you an idea of what it would return: four different groups of events were found, each with a specific src_ip and dest_name combination, along with a count of the events in each group. Apparently, there were eight failed ssh password attempts from someone at 192.0.2.12 that targeted jdean, all within a one-minute timeframe.
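If it helps to see the grouping step outside of Splunk, this Python sketch (not Splunk code) reproduces the stats count by src_ip, dest_name logic on pairs taken from the log excerpt above:

```python
from collections import Counter

# (src_ip, dest_name) pairs parsed from the failed-password events in
# the sshd log excerpt above.
failed_attempts = (
    [("192.0.2.12", "jdean")] * 8
    + [("198.51.100.3", "klargo")]
    + [("192.0.2.12", "bmac")]
    + [("198.51.100.3", "jdean")]
)

# Equivalent of: ... | stats count by src_ip, dest_name
counts = Counter(failed_attempts)
for (src_ip, dest_name), count in counts.items():
    print(src_ip, dest_name, count)
```

As in the search results, four distinct groups come back, with the (192.0.2.12, jdean) combination counting eight events.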
Custom conditional search - Alert when 5 or more events with the same src_ip and dest_name combination are found
Then you set up an advanced conditional custom search that analyzes the results of the base search to find results where the count for any src_ip and dest_name combination reaches 5 within a 60-second timespan.
search count >= 5
This custom search would trigger an alert when results like those described above come in, thanks to that group of 8 events.
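The threshold check can be sketched the same way. This Python illustration (not Splunk code) applies the "5 or more" requirement from the scenario to the per-combination counts from the base search; the count values mirror the example results discussed above:

```python
from collections import Counter

# Counts per (src_ip, dest_name) pair, as produced by the base search
counts = Counter({
    ("192.0.2.12", "jdean"): 8,
    ("198.51.100.3", "klargo"): 1,
    ("192.0.2.12", "bmac"): 1,
    ("198.51.100.3", "jdean"): 1,
})

# Equivalent of the conditional search: keep rows meeting the
# five-or-more threshold stated in the scenario
offenders = {pair: n for pair, n in counts.items() if n >= 5}
alert = bool(offenders)  # the alert fires when any row survives the filter
print(offenders, alert)
```

Only the (192.0.2.12, jdean) row survives the filter, so the alert fires.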
Should it be a real-time or scheduled alert?
You'll probably want to set this up as a real-time search. Scheduled searches can cause performance issues when they run over intervals as short as 1 minute, and they can be inaccurate in situations where, for example, 2 qualifying events come in toward the end of one interval and 4 more come in at the start of the next.
To set the alert up as a real-time alert, you would give its base search a start time of rt-60s and an end time of rt.
You'll likely also want to set up a throttling period for this search to avoid being overwhelmed by alerts. The setting you choose largely depends on how often you want to get notified about alerts of this nature. For example, you might set it up so that once an alert is triggered, it can't be triggered again for a half hour.
Alerting when a set of IDS solutions report a network attack 20 or more times within a 10-minute period
In this example, you're using a set of intrusion detection systems (IDS), such as Snort, to identify attacks against your network. You want to get an alert when 3 or more of these IDS technologies each report an attack 20 or more times in 10 minutes. After you get the alert, you don't want another one until at least 10 minutes have passed.
Base search - Get the IDS types with 20 or more attack events in a given 10-minute span of time
To begin with, let's assume that you have set things up so that all of your IDS error events are grouped together by the ids_violation source type and the ids_attack event type, and that the IDS type is indicated by the idstype field. You can then create a base search for the alert that builds a table matching each value of idstype found with the actual count of events for that IDS type. Finally, it filters this table so that only IDS types with 20 or more events are displayed:
sourcetype=ids_violation eventtype=ids_attack | stats count by idstype | search count >= 20
This base search gives you a table listing each IDS type that reported 20 or more attack events, along with the event count for each.
You can set this base search up as a real-time search or a scheduled search, but either way it needs to capture data over a 10-minute interval or window. The Time range for the real-time search would be rt-10m to rt. The Time range for the scheduled search would be -10m@m to now, and its interval would be "every 10 minutes."
Custom conditional search - Alert when there are 3 or more IDS types with 20+ events in a given 10-minute span of time
Now that you've set up a base search that returns only those IDS types with 20 or more attack events in a 10-minute span, you need to define a conditional search that goes further by ensuring that the alert is only triggered when the base search results include 3 or more IDS types.
To do this, set Condition to if custom condition is met and then enter this conditional search, which looks at the results returned by the base search and triggers an alert if it finds 3 or more IDS types with 20 or more events each:
stats count(idstype) as distinct | search distinct >= 3
This custom search would trigger an alert if, for example, 3 IDS types each reported 20 or more attacks in the 10-minute timespan covered by the base search.
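The conditional step can be sketched the same way. This Python illustration (not Splunk code) counts the distinct IDS types that survived the base search's filter and checks the "3 or more" condition; the counts and names (other than Snort) are hypothetical:

```python
# Per-IDS-type attack counts after the base search's "count >= 20" filter
ids_counts = {"snort": 43, "ids_b": 27, "ids_c": 21}

# Equivalent of: | stats count(idstype) as distinct | search distinct >= 3
distinct = len(ids_counts)
alert = distinct >= 3
print(distinct, alert)
```

With three IDS types each over the 20-event threshold, the condition is met and the alert fires.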
If you're setting this up as a real-time search, set Throttling to 10 minutes to ensure that you don't get additional alerts until at least 10 minutes have passed since the last one. If you're running it as a scheduled search, you don't need to set a throttling value, since the search only runs on a 10-minute interval to begin with.
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18