
Manage data integrity
The Splunk Enterprise data integrity control feature provides a way to verify the integrity of indexed data.
When you enable data integrity control for an index, Splunk Enterprise computes SHA-256 hashes on every slice of data and stores those hashes so that you can go back later and verify the integrity of your data.
How it works
When you enable data integrity control, Splunk Enterprise computes hashes on every slice of newly indexed raw data and writes them to an l1Hashes file. When the bucket rolls from hot to warm, Splunk Enterprise computes a hash of the contents of the l1Hashes file and stores that hash in an l2Hash file. Both hash files are stored in the rawdata directory for that bucket.
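For example, after a bucket rolls to warm, the two hash files sit alongside the journal in that bucket's rawdata directory. The bucket path below is illustrative, and the exact hash file names on disk can carry additional suffixes:
$SPLUNK_DB/defaultdb/db/db_1389230491_1389230488_5/rawdata/
    journal.gz
    l1Hashes    (per-slice hashes of the raw data)
    l2Hash      (hash of the l1Hashes contents)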
Note that data integrity control hashes newly indexed data only. Data coming from a forwarder should be secured and encrypted with SSL. For more information, see About securing Splunk with SSL.
Check your hashes to validate your data
To check Splunk Enterprise data, run the following CLI command to verify the integrity of an index or bucket:
./splunk check-integrity -bucketPath [ bucket path ] [ -verbose ]
./splunk check-integrity -index [ index name ] [ -verbose ]
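For example, to verify an index named main (an example index name) with verbose output:
./splunk check-integrity -index main -verbose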
Configure data integrity control
To configure data integrity control, edit indexes.conf and enable the enableDataIntegrityControl attribute for each index. The default value for all indexes is false (off).
enableDataIntegrityControl = true
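For example, to enable data integrity control for an index named main (an example index name), add the attribute to that index's stanza in indexes.conf:
[main]
enableDataIntegrityControl = true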
Data integrity in clustered environments
In a clustered environment, the cluster master and all the peers must run Splunk Enterprise 6.3 or later to enable accurate index replication.
Optionally modify the size of your data slice
By default, data slices are set to 128KB, which means that a data slice is created and hashed every 128KB. You can optionally edit indexes.conf to specify the size of each slice.
rawChunkSizeBytes = 131072
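For example, to halve the slice size to 64KB for an index named main (an example index name and value), set the attribute in that index's stanza alongside the integrity control setting:
[main]
enableDataIntegrityControl = true
rawChunkSizeBytes = 65536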
Store and secure your data hashes
For optimal security, you can optionally store your hashes outside the system that hosts the data, such as on a different server. To avoid naming conflicts, store your secured hashes in separate directories.
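For example, you could copy a bucket's hash files into a bucket-specific directory on a separate host. The hostname, bucket path, and destination directory below are hypothetical:
scp $SPLUNK_DB/defaultdb/db/db_1389230491_1389230488_5/rawdata/l1Hashes* \
    $SPLUNK_DB/defaultdb/db/db_1389230491_1389230488_5/rawdata/l2Hash* \
    hashstore.example.com:/opt/splunk-hashes/defaultdb/db_1389230491_1389230488_5/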
Regenerate hashes
If you lose the hash files for a bucket, use the following CLI command to regenerate them for a bucket or index. This command extracts the hashes embedded in the journal:
./splunk generate-hash-files -bucketPath [ bucket path ] [ -verbose ]
./splunk generate-hash-files -index [ index name ] [ -verbose ]
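For example, to regenerate the hash files for an index named main (an example index name):
./splunk generate-hash-files -index main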