Splunk® Success Framework

Splunk Success Framework Handbook

Service-level best practices for a Splunk deployment

Service-level definitions form a contract between a service provider and the organization it serves. The contract defines particular aspects of the service, such as quality, availability, and responsibilities. Service-level definitions consist of service-level objectives (SLOs), service-level agreements (SLAs), case priority levels, and incident response times. When Splunk is operated as a service offering, service-level definitions assure all teams and organizations that Splunk operations and response models meet their needs without impacting other areas.

Audience

  • Developer
  • Engineer
  • Search expert
  • Program manager
  • Project manager
  • User community

For more about these roles, see Roles best practices.

Key terms

Mission critical
An outage that affects revenue, the ability to meet an agreed SLA or OLA, or an identified mission-critical data source or app.
Core service
Indexing, Searching, or Alerting.
Routine change
A low-impact, low-risk change not requiring a change review.
Emergency change
A change required to resolve a P1/P2 condition.
Service request
A request to add new capacity of lower complexity than a project, for example, new inputs or new add-on installs.

Guidelines for implementing service-level definitions

There are many factors to consider when making a service-level commitment. The table below lists some guidelines to follow.

Don't over-commit
  • Consider which turnaround times would be too fast for your team to deliver reliably.
  • Don't make a commitment that's unreasonable or unfair to your team members.
Don't under-commit
  • The requester should never feel that the response time is unreasonably long.
  • Be as accommodating as possible when setting turnaround goals.
Consider the time it takes to gather the necessary information
  • Many types of requests require follow-up with the requester. Recognize that there may be a waiting period for this additional information.
  • Have reasonable expectations when requesting additional information. Make sure to communicate your expectations to the requester.
  • Make sure the requester knows that a slow reply to a request for information delays the expected turnaround time.
Think about the process for incoming requests
  • Think about your engagement model for incoming requests. Optimize the request process so teams can work together effectively.
  • Create a process that is straightforward and effective.

SLO templates

SLOs provide expectations for maintenance planning, release planning, and communication with business partners. You can divide SLOs into administrative tasks (day-to-day activities) and implementation tasks.

Use the following examples as they are, or update them as necessary.

Administrative SLOs (day-to-day activities) Target
Delete user 5 business days
Add new user 5 business days
Uplifting user 25 business days
Dashboard creation support 10 business days
Report generation support 5 business days
Alert creation and changes support 5 business days
Create a new Active Directory group for access (external dependencies) 25 business days
Create new role 15 business days

Implementation SLOs Target
First response to new support request 1 day
Data ingest (standard add-on) 5 days
Data ingest (custom add-on) 10 days
New app install 1 day
Universal forwarder deployment (does not include change control SLO) 10 business days
Data source monitor (HTTP, WMI, TCP/UDP) 2 days
Implement new global knowledge object 1 day
Upload data into Splunk (for example, static log, file, CSV) 5 business days
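Most of the targets above are stated in business days, so the due date for a request depends on weekends and holidays. The following is a minimal Python sketch, not part of any Splunk product, that assumes a Monday-to-Friday work week and a holiday list that you supply. The function names and the `ADMIN_SLO_BUSINESS_DAYS` mapping are hypothetical; the day counts are copied from the administrative SLO table.

```python
from datetime import date, timedelta

# Hypothetical mapping of a few administrative SLOs to their targets in
# business days, copied from the table above.
ADMIN_SLO_BUSINESS_DAYS = {
    "Add new user": 5,
    "Create new role": 15,
    "Create a new Active Directory group for access": 25,
}


def add_business_days(start: date, days: int, holidays: frozenset = frozenset()) -> date:
    """Return the date that falls `days` business days after `start`.

    Assumes a Monday-Friday work week and skips any date in `holidays`.
    The start date itself is not counted.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


def slo_due_date(task: str, opened: date, holidays: frozenset = frozenset()) -> date:
    """Return the SLO due date for a request opened on `opened`."""
    return add_business_days(opened, ADMIN_SLO_BUSINESS_DAYS[task], holidays)


# A request to add a user, opened on Friday, 1 March 2024.
print(slo_due_date("Add new user", date(2024, 3, 1)))  # 2024-03-08
```

Passing `frozenset({date(2024, 3, 4)})` as `holidays` moves the same due date to 2024-03-11.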

SLA templates

SLAs are key service definitions for platform availability and incident response.

Use the following examples as they are, or update them as necessary.

SLAs Target
Platform availability 99.9% uptime for all core services (at most 8.76 hours of unplanned downtime per year)
Incident first response Based on priority
Incident status update Based on priority
Restore loss of data feed (ingestion) 1 business day
Restore universal forwarder not reporting (standard) 5 business days
Restore universal forwarder not reporting (mission critical applications) 1 business day
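The downtime figure in the platform availability row follows from the uptime percentage: a year has 365 × 24 = 8,760 hours, and 0.1% of that is 8.76 hours. A minimal Python sketch of the same arithmetic, with hypothetical function names, that assumes a 365-day year unless you pass a different measurement period:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def downtime_budget_hours(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum unplanned downtime, in hours, that an availability target allows."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (100 - availability_pct) / 100


def measured_availability_pct(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability, as a percentage, given the unplanned downtime in a period."""
    return 100 * (period_hours - downtime_hours) / period_hours


print(round(downtime_budget_hours(99.9), 2))                # 8.76 hours per year
print(round(downtime_budget_hours(99.9, 30 * 24) * 60, 1))  # 43.2 minutes per 30-day month
```

As the table states, only unplanned downtime of core services counts against this budget.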

Case priorities

Case priorities are assigned based on the technical importance of the problem. The following case priorities are intended only as examples.

Use the examples provided below, or make any updates as necessary.

Case priority levels

Case priorities may vary by service or source. The following are general guidelines.

Case priority level Definition
P1 A mission critical outage for which there is no workaround. This may be a complete service outage of a core service.
P2 A mission critical outage for which a less than ideal workaround exists. This may be a partial service outage of a core service.
P3 An outage or issue impacting a single user.
P4 Standard service requests or routine changes, for example, access requests, data onboarding, and app installation.
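The priority definitions above can be read as a small decision procedure. The following Python sketch encodes them under stated assumptions: the `Case` fields and the `assign_priority` function are hypothetical, mission-critical status follows the Key terms definition, and any case the table does not cover (for example, a non-mission-critical outage affecting several users) returns `None` so that a person triages it manually.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Priority(Enum):
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


@dataclass
class Case:
    # Hypothetical fields for illustration only.
    service_request: bool       # standard service request or routine change
    mission_critical: bool      # see "Mission critical" under Key terms
    workaround_available: bool  # a workaround exists, even if less than ideal
    single_user: bool           # only one user is affected


def assign_priority(case: Case) -> Optional[Priority]:
    """Map a case to P1-P4 following the case priority table.

    Returns None for cases the table does not cover, so that they can be
    triaged manually.
    """
    if case.service_request:
        return Priority.P4
    if case.mission_critical:
        # P1 has no workaround; P2 has a less than ideal workaround.
        return Priority.P2 if case.workaround_available else Priority.P1
    if case.single_user:
        return Priority.P3
    return None


print(assign_priority(Case(service_request=False, mission_critical=True,
                           workaround_available=False, single_user=False)))  # Priority.P1
```

Returning `None` rather than a default priority keeps the sketch from silently guessing where the guidelines are silent.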

Incident response times

Incident response P1 P2 P3 P4
First response 1 hour 2 hours 4 hours 1 business day
Communicated updates Every 2 hours Every 4 hours Every business day Every 5 business days
Resolution time Within 4 hours Within 2 business days Within 3 business days By agreement with the customer
Business hours 24 hours / 7 days per week 8:00am to 5:00pm / 5 days per week, excluding holidays 8:00am to 5:00pm / 5 days per week, excluding holidays 8:00am to 5:00pm / 5 days per week, excluding holidays
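Because P1 incidents run on a 24-hour, 7-day clock while P2 through P4 run on business hours, the same kind of target can produce very different deadlines. The following Python sketch computes first-response deadlines from the table. It assumes naive local datetimes, business hours of 8:00am to 5:00pm Monday through Friday, a holiday list that you supply, and that "1 business day" for P4 means one full nine-hour business day. All names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

BUSINESS_START = time(8, 0)   # 8:00am
BUSINESS_END = time(17, 0)    # 5:00pm

# First-response targets from the incident response table. Assumption:
# "1 business day" (P4) is treated as one full 9-hour business day.
FIRST_RESPONSE = {
    "P1": timedelta(hours=1),
    "P2": timedelta(hours=2),
    "P3": timedelta(hours=4),
    "P4": timedelta(hours=9),
}


def is_business_day(day: date, holidays: frozenset) -> bool:
    """Monday-Friday, excluding holidays."""
    return day.weekday() < 5 and day not in holidays


def add_business_time(start: datetime, duration: timedelta,
                      holidays: frozenset = frozenset()) -> datetime:
    """Add `duration` of business time to `start`, skipping nights,
    weekends, and holidays."""
    current = start
    remaining = duration
    while True:
        day = current.date()
        day_end = datetime.combine(day, BUSINESS_END)
        if not is_business_day(day, holidays) or current >= day_end:
            # Outside business hours: move to the start of the next day.
            current = datetime.combine(day + timedelta(days=1), BUSINESS_START)
            continue
        # Before opening time, the clock starts at 8:00am.
        current = max(current, datetime.combine(day, BUSINESS_START))
        available = day_end - current
        if remaining <= available:
            return current + remaining
        remaining -= available
        current = datetime.combine(day + timedelta(days=1), BUSINESS_START)


def first_response_due(priority: str, opened: datetime,
                       holidays: frozenset = frozenset()) -> datetime:
    """Return the first-response deadline for an incident."""
    target = FIRST_RESPONSE[priority]
    if priority == "P1":
        return opened + target  # P1 is covered 24 hours / 7 days per week
    return add_business_time(opened, target, holidays)


opened = datetime(2024, 3, 1, 16, 0)  # Friday, 4:00pm
print(first_response_due("P1", opened))  # 2024-03-01 17:00:00
print(first_response_due("P2", opened))  # 2024-03-04 09:00:00
```

The same `add_business_time` helper could be reused for the intervals in the communicated-updates row.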

This documentation applies to the following versions of Splunk® Success Framework: ssf

