Filelog receiver
The Filelog receiver tails and parses logs from files. The supported pipeline type is logs. See Process your data with pipelines for more information.
Get started
Follow these steps to configure and activate the component:
1. Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.
2. Configure the Filelog receiver as described in the next section.
3. Restart the Collector.
Sample configuration
To activate the Filelog receiver, add filelog to the receivers section of your configuration file:

receivers:
  filelog:

To complete the configuration, include the receiver in the logs pipeline of the service section of your configuration file:

service:
  pipelines:
    logs:
      receivers: [filelog]
Configuration example
This example shows how to tail a simple JSON file:

receivers:
  filelog:
    include: [ /var/log/myservice/*.json ]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
This example shows how to tail a plaintext file:

receivers:
  filelog:
    include: [ /simple.log ]
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.sev
The receiver reads logs from the simple.log file, such as:
2023-06-19 05:20:50 ERROR This is a test error message
2023-06-20 12:50:00 DEBUG This is a test debug message
Use operators to format logs
The Filelog receiver uses operators to process logs into a desired format. Each operator fulfills a single responsibility, such as reading lines from a file or parsing JSON from a field. You chain operators together in a pipeline to achieve the result you want.
For instance, you can read lines from a file using the file_input operator, then send the results of this operation to a regex_parser operator that creates fields based on a regex pattern. Next, you can send the results to a file_output operator to write each line to a file on disk.
- All operators either create, modify, or consume entries.
- An entry is the base representation of log data as it moves through a pipeline.
- A field is used to reference values in an entry.
- A common expression syntax is used in several operators. For example, expressions can be used to filter or route entries.
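As a minimal sketch of chaining, the following configuration parses each line with a regex_parser operator and then drops DEBUG entries with a filter operator. The file path, regex, and field names are illustrative, not prescribed by the receiver:

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/app.log ]
    operators:
      # First operator: extract time, severity, and message fields from each line.
      - type: regex_parser
        regex: '^(?P<time>\S+) (?P<sev>[A-Z]+) (?P<msg>.*)$'
      # Second operator: consume the parsed entry and drop DEBUG lines.
      - type: filter
        expr: 'attributes.sev == "DEBUG"'
```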
Available operators
For a complete list of available operators, see What operators are available? in GitHub.
The following applies to operators:
- Each operator has a type.
- You can give a unique id to each operator. If you use the same type of operator more than once in a pipeline, you must specify an id. Otherwise, the id defaults to the value of type.
- An operator outputs to the next operator in the pipeline. The last operator in the pipeline emits from the receiver. Optionally, you can use the output parameter to specify the id of another operator to pass logs there directly.
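For example, the following sketch uses two operators of the same type, so each needs a unique id, and the output parameter on the first operator routes entries directly to a later operator. The ids, fields, and values here are illustrative:

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/app.log ]
    operators:
      - type: json_parser
        id: parse_outer
        # Skip parse_inner and send entries straight to add_source.
        output: add_source
      - type: json_parser
        id: parse_inner      # a second json_parser, so a unique id is required
        parse_from: attributes.nested
      - type: add
        id: add_source
        field: attributes.log.source
        value: filelog
```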
Parser operators
Use parser operators to isolate values from a string. There are two classes of parsers: simple and complex.
Parse header metadata
To turn on header metadata parsing, set the filelog.allowHeaderMetadataParsing feature gate, and set start_at to beginning. If set, the file input operator attempts to read a header from the start of the file.
The following applies:
- Each header line must match the header.pattern pattern.
- Each line is emitted into a pipeline defined by header.metadata_operators.
- Any attributes on the resultant entry from the embedded pipeline are merged with the attributes from previous lines. If attribute collisions happen, they are resolved with an upsert strategy.
- After all header lines are read, the final merged header attributes are present on every log line that is emitted for the file.
- The receiver does not emit header lines.
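Assuming comment-style header lines such as `# environment: production` at the top of each file, a configuration sketch might look like this. The header pattern and the regex are illustrative:

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.log ]
    start_at: beginning
    header:
      # Lines at the start of the file matching this pattern are header lines.
      pattern: '^#'
      # Each header line runs through this embedded pipeline; the resulting
      # attributes are merged onto every log line emitted for the file.
      metadata_operators:
        - type: regex_parser
          regex: '^# environment: (?P<environment>.*)'
```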
Parsers with embedded operations
You can configure many parsing operators to embed certain follow-up operations, such as timestamp and severity parsing.
For more information, see the GitHub entry on complex parsers at Parsers.
Multiline configuration
If set, the multiline configuration block instructs the file_input operator to split log entries on a pattern other than new lines.
The multiline configuration block must contain line_start_pattern or line_end_pattern. These are regular expression patterns that match either the beginning of a new log entry or the end of a log entry.
Supported encodings
The Filelog receiver supports the following encodings:

Key | Description
---|---
nop | No encoding validation. Treats the file as a stream of raw bytes.
utf-8 | UTF-8 encoding.
utf-16le | UTF-16 encoding with little-endian byte order.
utf-16be | UTF-16 encoding with big-endian byte order.
ascii | ASCII encoding.
big5 | The Big5 Chinese character encoding.

Other less common encodings are supported on a best-effort basis. See the list of available encodings at https://www.iana.org/assignments/character-sets/character-sets.xhtml .
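For example, to tail files written by a process that produces UTF-16 little-endian output, set the encoding key on the receiver. The path here is illustrative:

```yaml
receivers:
  filelog:
    include: [ /var/log/legacy/*.log ]
    # Decode the file contents as UTF-16 little-endian before parsing.
    encoding: utf-16le
```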
Advanced use cases
See a few use cases for the Filelog receiver in the following sections.
You can find more examples in the GitHub repository splunk-otel-collector/examples .
Send logs to Splunk Cloud
Use the following configuration to send logs to Splunk Cloud:

receivers:
  filelog:
    include: [ /output/file.log ]
    operators:
      - type: regex_parser
        regex: '(?P<before>.*)\d\d\d-\d\d\d-\d\d\d\d(?P<after>.*)'
        parse_to: body.parsed
        output: before_and_after
      - id: before_and_after
        type: add
        field: body
        value: EXPR(body.parsed.before + "XXX-XXX-XXXX" + body.parsed.after)
exporters:
  # Logs
  splunk_hec:
    token: "${SPLUNK_HEC_TOKEN}"
    endpoint: "${SPLUNK_HEC_URL}"
    source: "otel"
    sourcetype: "otel"
service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors:
        - memory_limiter
        - batch
        - resourcedetection
        #- resource/add_environment
      exporters: [splunk_hec]
Send truncated logs to Splunk Enterprise
Use the following configuration to truncate logs and send them to Splunk Enterprise:

receivers:
  filelog:
    include: [ /output/file.log ]
    operators:
      - type: regex_parser
        regex: '(?P<before>.*)\d\d\d-\d\d\d-\d\d\d\d(?P<after>.*)'
        parse_to: body.parsed
        output: before_and_after
      - id: before_and_after
        type: add
        field: body
        value: EXPR(body.parsed.before + "XXX-XXX-XXXX" + body.parsed.after)
exporters:
  splunk_hec/logs:
    # Splunk HTTP Event Collector token.
    token: "00000000-0000-0000-0000-000000000000"
    # URL to a Splunk instance to send data to.
    endpoint: "https://splunk:8088/services/collector"
    # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
    source: "output"
    # Splunk index, optional name of the Splunk index targeted.
    index: "logs"
    # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
    max_connections: 20
    # Whether to disable gzip compression over HTTP. Defaults to false.
    disable_compression: false
    # HTTP timeout when sending data. Defaults to 10s.
    timeout: 10s
    tls:
      # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
      # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
      insecure_skip_verify: true
processors:
  batch:
  transform:
    log_statements:
      - context: log
        statements:
          - set(body, Substring(body, 0, 10))
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
service:
  extensions: [ pprof, zpages, health_check ]
  pipelines:
    logs:
      receivers: [ filelog ]
      processors: [ batch, transform ]
      exporters: [ splunk_hec/logs ]
Send sanitized logs to Splunk Enterprise
Use the following configuration to sanitize logs and send them to Splunk Enterprise:

receivers:
  filelog:
    include: [ /output/file.log ]
    operators:
      - type: regex_parser
        regex: '(?P<before>.*)\d\d\d-\d\d\d-\d\d\d\d(?P<after>.*)'
        parse_to: body.parsed
        output: before_and_after
      - id: before_and_after
        type: add
        field: body
        value: EXPR(body.parsed.before + "XXX-XXX-XXXX" + body.parsed.after)
exporters:
  splunk_hec/logs:
    # Splunk HTTP Event Collector token.
    token: "00000000-0000-0000-0000-000000000000"
    # URL to a Splunk instance to send data to.
    endpoint: "https://splunk:8088/services/collector"
    # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
    source: "output"
    # Splunk index, optional name of the Splunk index targeted.
    index: "logs"
    # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
    max_connections: 20
    # Whether to disable gzip compression over HTTP. Defaults to false.
    disable_compression: false
    # HTTP timeout when sending data. Defaults to 10s.
    timeout: 10s
    tls:
      # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
      # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
      insecure_skip_verify: true
processors:
  batch:
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [splunk_hec/logs]
Route logs to different indexes
Use the following configuration to route logs to different Splunk indexes:

receivers:
  filelog:
    include: [ /output/file*.log ]
    start_at: beginning
    operators:
      - type: regex_parser
        regex: '(?P<logindex>log\d?)'
exporters:
  splunk_hec/logs:
    # Splunk HTTP Event Collector token.
    token: "00000000-0000-0000-0000-000000000000"
    # URL to a Splunk instance to send data to.
    endpoint: "https://splunk:8088/services/collector"
    # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
    source: "output"
    # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
    max_connections: 20
    # Whether to disable gzip compression over HTTP. Defaults to false.
    disable_compression: false
    # HTTP timeout when sending data. Defaults to 10s.
    timeout: 10s
    tls:
      # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
      # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
      insecure_skip_verify: true
processors:
  batch:
  attributes/log:
    include:
      match_type: strict
      attributes:
        - { key: logindex, value: 'log' }
    actions:
      - key: com.splunk.index
        action: upsert
        value: "logs"
      - key: logindex
        action: delete
  attributes/log2:
    include:
      match_type: strict
      attributes:
        - { key: logindex, value: 'log2' }
    actions:
      - key: com.splunk.index
        action: upsert
        value: "logs2"
      - key: logindex
        action: delete
  attributes/log3:
    include:
      match_type: strict
      attributes:
        - { key: logindex, value: 'log3' }
    actions:
      - key: com.splunk.index
        action: upsert
        value: "logs3"
      - key: logindex
        action: delete
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch, attributes/log, attributes/log2, attributes/log3]
      exporters: [splunk_hec/logs]
Associate log sources with source types
This example shows how the Collector collects data from files and sends it to Splunk Enterprise, associating each source with a different source type. The source type is a default field that identifies the structure of an event and determines how Splunk Enterprise formats the data during the indexing process.

processors:
  batch:
  resource/one:
    attributes:
      # Set the com.splunk.sourcetype log attribute key to sourcetype1.
      # com.splunk.sourcetype is the default key the HEC exporter uses to extract the source type of the record.
      # See https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/splunkhecexporter
      # under the configuration key `hec_metadata_to_otel_attrs/sourcetype`
      - key: com.splunk.sourcetype
        value: "sourcetype1"
        action: upsert
  resource/two:
    attributes:
      - key: com.splunk.sourcetype
        value: "sourcetype2"
        action: upsert
  resource/three:
    attributes:
      - key: com.splunk.sourcetype
        value: "sourcetype3"
        action: upsert
receivers:
  filelog/onefile:
    include: [ /output/file.log ]
  filelog/twofile:
    include: [ /output/file2.log ]
  filelog/threefolder:
    include: [ /output3/*.log ]
exporters:
  splunk_hec/logs:
    # Splunk HTTP Event Collector token.
    token: "00000000-0000-0000-0000-000000000000"
    # URL to a Splunk instance to send data to.
    endpoint: "https://splunk:8088/services/collector"
    # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
    source: "output"
    # Splunk index, optional name of the Splunk index targeted.
    index: "logs"
    # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
    max_connections: 20
    # Whether to disable gzip compression over HTTP. Defaults to false.
    disable_compression: false
    # HTTP timeout when sending data. Defaults to 10s.
    timeout: 10s
    tls:
      # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
      # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
      insecure_skip_verify: true
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs/one:
      receivers: [ filelog/onefile ]
      processors: [ batch, resource/one ]
      exporters: [ splunk_hec/logs ]
    logs/two:
      receivers: [ filelog/twofile ]
      processors: [ batch, resource/two ]
      exporters: [ splunk_hec/logs ]
    logs/three:
      receivers: [ filelog/threefolder ]
      processors: [ batch, resource/three ]
      exporters: [ splunk_hec/logs ]
Settings
Note
By default, the receiver doesn't read logs from a file that is not actively being written to, because start_at defaults to end.
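To also read content that already exists in a file when the Collector starts, set start_at to beginning, as in this sketch (the path is illustrative):

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.log ]
    # Read from the start of each file instead of only new lines
    # appended after the Collector starts watching it.
    start_at: beginning
```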
The following table shows the configuration options for the Filelog receiver:
Troubleshooting
If you are a Splunk Observability Cloud customer and are not able to see your data in Splunk Observability Cloud, you can get help in the following ways.
Available to Splunk Observability Cloud customers
Submit a case in the Splunk Support Portal .
Contact Splunk Support .
Available to prospective customers and free trial users
Ask a question and get answers through community support at Splunk Answers .
Join the Splunk #observability user group Slack channel to communicate with customers, partners, and Splunk employees worldwide. To join, see Chat groups in the Get Started with Splunk Community manual.