Transparent huge memory pages and Splunk performance
Some distributions of Linux (for example, Red Hat, CentOS, and Ubuntu) have an advanced memory management scheme called Transparent Huge Pages (THP). THP acts as an abstraction layer that lets the memory management units (MMUs) in a machine work with huge memory pages in the operating system. With THP, this work occurs without specific action on the part of the administrator or the software that runs on the machine.
Every CPU in a modern computer has an MMU. The MMU manages memory in pages, and huge pages are structures that let MMUs manage multiple gigabytes and terabytes of memory more efficiently.
THP has been associated with degradation of Splunk Enterprise performance in at least some Linux kernel versions. When THP is turned on, it can significantly degrade overall machine performance on systems that run Splunk Enterprise because of several issues:
- The implementation is too aggressive at coalescing memory pages for short-lived processes (such as many Splunk searches)
- It can prevent the jemalloc memory allocation implementation from releasing memory back to the operating system after use. The jemalloc implementation is a more scalable version of the malloc implementation and is used in newer distributions of Linux.
- For some workloads, it can cause I/O regressions surrounding swapping of huge pages.
Splunk has observed at least a 30% degradation in indexing and search performance on Linux systems where THP is active, with a similar percentage increase in latency. Where possible, turn off THP in the Linux configuration of every machine that runs Splunk software, unless that machine also runs an application that requires THP.
Some Linux administrators disable THP but leave direct memory compaction, also known as defrag, turned on, or vice versa. Leaving either THP or direct memory compaction turned on can severely degrade performance in Splunk Enterprise. Where practical, do not turn on any aspect of THP on a Linux machine that runs Splunk Enterprise.
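To confirm that both THP and direct memory compaction are off, you can inspect the kernel's sysfs settings. The following Python sketch is illustrative only and is not part of Splunk software. It assumes the standard sysfs directory /sys/kernel/mm/transparent_hugepage, where the kernel marks the active value with square brackets (for example, "[always] madvise never"). Some older Red Hat kernels use /sys/kernel/mm/redhat_transparent_hugepage instead. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: standard sysfs location on mainline kernels. Some older
# Red Hat kernels use /sys/kernel/mm/redhat_transparent_hugepage instead,
# so adjust THP_DIR for your distribution.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def parse_thp_setting(text):
    """Return the active value, which the kernel marks with square brackets."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_thp_settings(thp_dir=THP_DIR):
    """Read the active 'enabled' and 'defrag' values, or None if unreadable."""
    settings = {}
    for name in ("enabled", "defrag"):
        try:
            settings[name] = parse_thp_setting((Path(thp_dir) / name).read_text())
        except OSError:
            # File missing or unreadable: report the setting as unknown.
            settings[name] = None
    return settings


def thp_fully_off(settings):
    """True only when both THP and defrag report 'never'."""
    return all(settings.get(name) == "never" for name in ("enabled", "defrag"))


if __name__ == "__main__":
    current = read_thp_settings()
    print(current, "fully off:", thp_fully_off(current))
```

A value of None means the file could not be read, for example on a kernel built without THP support, so treat that result as unknown and not as a confirmation that THP is active.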
This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2, 9.4.0