
Filelog receiver

The Filelog receiver tails and parses logs from files. The supported pipeline type is logs. See Process your data with pipelines for more information.

Get started

Follow these steps to configure and activate the component:

  1. Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.

  2. Configure the Filelog receiver as described in the next section.

  3. Restart the Collector.

Sample configuration

To activate the Filelog receiver, add filelog to the receivers section of your configuration file:

receivers:
  filelog:

To complete the configuration, include the receiver in the logs pipeline of the service section of your configuration file:

service:
  pipelines:
    logs:
      receivers: [filelog]
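Putting the two snippets together, a minimal working configuration might look like the following sketch. The receiver also needs an include setting that lists the files to tail. The path /var/log/myservice/*.log is a hypothetical example, and the splunk_hec exporter settings reuse the SPLUNK_HEC_TOKEN and SPLUNK_HEC_URL environment variables from the Send logs to Splunk Cloud example later on this page.

```yaml
receivers:
  filelog:
    # Files to tail. This path is a hypothetical example; replace it with your own.
    include: [ /var/log/myservice/*.log ]

exporters:
  splunk_hec:
    # Values are read from environment variables.
    token: "${SPLUNK_HEC_TOKEN}"
    endpoint: "${SPLUNK_HEC_URL}"

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [splunk_hec]
```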

Configuration example

This example shows how to tail a simple JSON file:

receivers:
  filelog:
    include: [ /var/log/myservice/*.json ]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'

This example shows how to tail a plaintext file:

receivers:
  filelog:
    include: [ /simple.log ]
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.sev

The receiver reads logs from the simple.log file, such as:

2023-06-19 05:20:50 ERROR This is a test error message

2023-06-20 12:50:00 DEBUG This is a test debug message

Use operators to format logs

The Filelog receiver uses operators to process logs into a desired format. Each operator fulfills a single responsibility, such as reading lines from a file or parsing JSON from a field. To get the result you want, chain operators together in a pipeline.

For instance, you can read lines from a file using the file_input operator. From there, you can send the results of this operation to a regex_parser operator that creates fields based on a regex pattern. Next, you can send the results to a file_output operator to write each line to a file on disk.

All operators either create, modify, or consume entries.

  • An entry is the base representation of log data as it moves through a pipeline.

  • A field is used to reference values in an entry.

  • A common expression syntax is used in several operators. For example, expressions can be used to filter or route entries.
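As a sketch of how entries, fields, and expressions fit together, the following configuration chains three operators over the simple.log file from the previous section. It assumes you want to discard DEBUG lines and keep only the message text in the body. The fields are referenced as attributes.sev and attributes.msg, and the filter operator uses an expression to decide which entries to drop.

```yaml
receivers:
  filelog:
    include: [ /simple.log ]
    operators:
      # Parse each line into the fields attributes.time, attributes.sev, and attributes.msg.
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
      # Drop every entry for which the expression evaluates to true.
      - type: filter
        expr: 'attributes.sev == "DEBUG"'
      # Replace the body with the value of the msg field.
      - type: move
        from: attributes.msg
        to: body
```

With the two sample lines shown earlier, only the ERROR entry leaves the receiver, and its body is This is a test error message.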

Available operators

For a complete list of available operators, see What operators are available? in GitHub.

The following applies to operators:

  • Each operator has a type.

  • You can give each operator a unique id.

    • If you use the same type of operator more than once in a pipeline, you must specify an id.

    • Otherwise, the id defaults to the value of type.

  • An operator outputs to the next operator in the pipeline.

    • The last operator in the pipeline emits from the receiver.

    • Optionally, you can use the output parameter to specify the id of another operator and send logs to it directly.
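For example, a router operator can send entries to different parsers, which is one case where explicit ids and the output parameter matter. In this sketch, the file path and the ids json_line, text_line, and done are hypothetical, and the assumption is that JSON lines start with {.

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.log ]
    operators:
      # No id is set, so this operator's id defaults to its type, router.
      - type: router
        routes:
          # Send lines that start with { to the JSON parser.
          - output: json_line
            expr: 'body matches "^[{]"'
        # Send all other lines to the regex parser.
        default: text_line
      - id: json_line
        type: json_parser
        # Skip text_line and go straight to the last operator.
        output: done
      - id: text_line
        type: regex_parser
        regex: '^(?P<sev>[A-Z]*) (?P<msg>.*)$'
        # No output is set, so entries continue to the next operator, done.
      - id: done
        # noop makes no changes; as the last operator, it emits entries from the receiver.
        type: noop
```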

Parser operators

Use parser operators to isolate values from a string. There are two classes of parsers: simple and complex.
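For example, the plaintext configuration in Configuration example embeds timestamp and severity parsing in regex_parser. A sketch of the same processing with separate simple parsers, assuming the time_parser and severity_parser operators, each of which performs a single operation, could look like this:

```yaml
receivers:
  filelog:
    include: [ /simple.log ]
    operators:
      # Extract the time, sev, and msg fields from each line.
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
      # Simple parser: sets the entry timestamp from attributes.time.
      - type: time_parser
        parse_from: attributes.time
        layout: '%Y-%m-%d %H:%M:%S'
      # Simple parser: sets the entry severity from attributes.sev.
      - type: severity_parser
        parse_from: attributes.sev
```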

Parse header metadata

To turn on header metadata parsing, enable the filelog.allowHeaderMetadataParsing feature gate and set start_at to beginning. When header parsing is configured, the file input operator attempts to read a header from the start of the file.

The following applies:

  • Each header line must match the header.pattern pattern.

  • Each line is emitted into a pipeline defined by header.metadata_operators.

  • Any attributes on the resultant entry from the embedded pipeline are merged with the attributes from previous lines. If attribute collisions happen, they are resolved with an upsert strategy.

  • After all header lines are read, the final merged header attributes are present on every log line that is emitted for the file.

The receiver does not emit header lines.
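The following sketch assumes that each file starts with a single header line such as # service: billing (a hypothetical format) and copies its value to a service attribute on every log line from that file. If your Collector version still requires it, also turn on the feature gate, for example by passing --feature-gates=filelog.allowHeaderMetadataParsing on the command line.

```yaml
receivers:
  filelog:
    # Hypothetical path.
    include: [ /var/log/myservice/*.log ]
    # Header parsing requires reading from the start of the file.
    start_at: beginning
    header:
      # Lines at the top of the file that start with # are treated as header lines.
      pattern: '^#'
      metadata_operators:
        # Turn a header line such as "# service: billing" into the attribute service=billing.
        - type: regex_parser
          regex: '^# service: (?P<service>.*)$'
```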

Parsers with embedded operations

You can configure many parsing operators to embed certain follow-up operations such as timestamp and severity parsing.

For more information, see the GitHub entry on complex parsers at Parsers.
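As an illustration of embedded operations, the following sketch assumes JSON logs with a ts field in RFC 3339 format and a lvl field that uses single-letter levels such as E and W. Both field names and values are hypothetical. The timestamp block uses the Go time layout format instead of strptime, and the severity block maps the custom values to standard severity levels.

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.json ]
    operators:
      - type: json_parser
        # Embedded timestamp parsing from the parsed ts field, using a Go time layout.
        timestamp:
          parse_from: attributes.ts
          layout_type: gotime
          layout: '2006-01-02T15:04:05Z07:00'
        # Embedded severity parsing: map the custom lvl values to severity levels.
        severity:
          parse_from: attributes.lvl
          mapping:
            error: E
            warn: W
            info: I
```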

Multiline configuration

If set, the multiline configuration block instructs the file_input operator to split log entries on a pattern other than new lines.

The multiline configuration block must contain exactly one of line_start_pattern or line_end_pattern. These are regular expressions that match either the beginning of a new log entry or the end of a log entry.
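For example, to keep multi-line stack traces together with the line that precedes them, a sketch could start a new entry only on lines that begin with a date. This assumes your log lines start with a YYYY-MM-DD date, like the simple.log sample earlier on this page, and uses a hypothetical path.

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.log ]
    multiline:
      # A new entry starts on every line that begins with a date such as 2023-06-19.
      # Lines that don't match, such as stack trace lines, are appended to the current entry.
      line_start_pattern: '^\d{4}-\d{2}-\d{2}'
```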

Supported encodings

The Filelog receiver supports the following encodings:

  • nop: No encoding validation. Treats the file as a stream of raw bytes.

  • utf-8: UTF-8 encoding.

  • utf-16le: UTF-16 encoding with little-endian byte order.

  • utf-16be: UTF-16 encoding with big-endian byte order.

  • ascii: ASCII encoding.

  • big5: The Big5 Chinese character encoding.

Other less common encodings are supported on a best-effort basis. See the IANA character sets registry for the list of available encodings: https://www.iana.org/assignments/character-sets/character-sets.xhtml
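To read files in a different encoding, set the encoding option on the receiver. This sketch assumes hypothetical files written in UTF-16 with little-endian byte order:

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.log ]
    # Decode the files as UTF-16 little-endian instead of the default UTF-8.
    encoding: utf-16le
```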

Advanced use cases

See a few use cases for the Filelog receiver in the following sections.

You can find more examples in the GitHub repository splunk-otel-collector/examples.

Send logs to Splunk Cloud

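Use the following configuration to send logs to Splunk Cloud. Before the logs are exported, the regex_parser and add operators replace a number pattern such as 1234-567-8901 in each line with XXX-XXX-XXXX: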

receivers:
  filelog:
    include: [ /output/file.log ]
    operators:
      - type: regex_parser
        regex: '(?P<before>.*)\d\d\d\d-\d\d\d-\d\d\d\d(?P<after>.*)'
        parse_to: body.parsed
        output: before_and_after
      - id: before_and_after
        type: add
        field: body
        value: EXPR(body.parsed.before + "XXX-XXX-XXXX" + body.parsed.after)

exporters:
  # Logs
  splunk_hec:
    token: "${SPLUNK_HEC_TOKEN}"
    endpoint: "${SPLUNK_HEC_URL}"
    source: "otel"
    sourcetype: "otel"

service:
  pipelines:
    logs:
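      # otlp, memory_limiter, batch, and resourcedetection aren't defined in this snippet.
      # Define them elsewhere in your configuration, as the default agent configuration does.
      receivers: [filelog, otlp]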
      processors:
      - memory_limiter
      - batch
      - resourcedetection
      #- resource/add_environment
      exporters: [splunk_hec]

Send truncated logs to Splunk Enterprise

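Use the following configuration to mask the same number pattern, truncate each log body to its first 10 characters with the transform processor, and send the logs to Splunk Enterprise: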

receivers:
  filelog:
    include: [ /output/file.log ]
    operators:
      - type: regex_parser
        regex: '(?P<before>.*)\d\d\d\d-\d\d\d-\d\d\d\d(?P<after>.*)'
        parse_to: body.parsed
        output: before_and_after
      - id: before_and_after
        type: add
        field: body
        value: EXPR(body.parsed.before + "XXX-XXX-XXXX" + body.parsed.after)


exporters:
  splunk_hec/logs:
    # Splunk HTTP Event Collector token.
    token: "00000000-0000-0000-0000-0000000000000"
    # URL to a Splunk instance to send data to.
    endpoint: "https://splunk:8088/services/collector"
    # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
    source: "output"
    # Splunk index, optional name of the Splunk index targeted.
    index: "logs"
    # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
    max_connections: 20
    # Whether to disable gzip compression over HTTP. Defaults to false.
    disable_compression: false
    # HTTP timeout when sending data. Defaults to 10s.
    timeout: 10s
    # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
    # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
    tls:
      insecure_skip_verify: true

processors:
  batch:
  transform:
    log_statements:
      - context: log
        statements:
          - set(body, Substring(body,0, 10))

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679

service:
  extensions: [ pprof, zpages, health_check ]
  pipelines:
    logs:
      receivers: [ filelog ]
      processors: [ batch, transform ]
      exporters: [ splunk_hec/logs ]

Send sanitized logs to Splunk Enterprise

Use the following configuration to sanitize logs and send them to Splunk Enterprise.

receivers:
    filelog:
      include: [ /output/file.log ]
      operators:
        - type: regex_parser
          regex: '(?P<before>.*)\d\d\d\d-\d\d\d-\d\d\d\d(?P<after>.*)'
          parse_to: body.parsed
          output: before_and_after
        - id: before_and_after
          type: add
          field: body
          value: EXPR(body.parsed.before + "XXX-XXX-XXXX" + body.parsed.after)


exporters:
    splunk_hec/logs:
        # Splunk HTTP Event Collector token.
        token: "00000000-0000-0000-0000-0000000000000"
        # URL to a Splunk instance to send data to.
        endpoint: "https://splunk:8088/services/collector"
        # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
        source: "output"
        # Splunk index, optional name of the Splunk index targeted.
        index: "logs"
        # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
        max_connections: 20
        # Whether to disable gzip compression over HTTP. Defaults to false.
        disable_compression: false
        # HTTP timeout when sending data. Defaults to 10s.
        timeout: 10s
        tls:
          # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
          # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
          insecure_skip_verify: true

processors:
    batch:

extensions:
    health_check:
      endpoint: 0.0.0.0:13133
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679

service:
    extensions: [pprof, zpages, health_check]
    pipelines:
      logs:
        receivers: [filelog]
        processors: [batch]
        exporters: [splunk_hec/logs]

Route logs to different indexes

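Use the following configuration to route logs to different Splunk indexes. The regex_parser operator extracts a logindex attribute (log, log2, or log3) from each line. Each attributes processor then matches one logindex value, sets the com.splunk.index attribute that the HEC exporter uses to select the index, and deletes logindex: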

receivers:
    filelog:
      include: [ /output/file*.log ]
      start_at: beginning
      operators:
        - type: regex_parser
          regex: '(?P<logindex>log\d?)'

exporters:
    splunk_hec/logs:
        # Splunk HTTP Event Collector token.
        token: "00000000-0000-0000-0000-0000000000000"
        # URL to a Splunk instance to send data to.
        endpoint: "https://splunk:8088/services/collector"
        # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
        source: "output"
        # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
        max_connections: 20
        # Whether to disable gzip compression over HTTP. Defaults to false.
        disable_compression: false
        # HTTP timeout when sending data. Defaults to 10s.
        timeout: 10s
        tls:
          # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
          # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
          insecure_skip_verify: true
processors:
    batch:
    attributes/log:
      include:
        match_type: strict
        attributes:
          - { key: logindex, value: 'log' }
      actions:
        - key: com.splunk.index
          action: upsert
          value: "logs"
        - key: logindex
          action: delete
    attributes/log2:
      include:
        match_type: strict
        attributes:
          - { key: logindex, value: 'log2' }
      actions:
        - key: com.splunk.index
          action: upsert
          value: "logs2"
        - key: logindex
          action: delete
    attributes/log3:
      include:
        match_type: strict
        attributes:
          - { key: logindex, value: 'log3' }
      actions:
        - key: com.splunk.index
          action: upsert
          value: "logs3"
        - key: logindex
          action: delete

extensions:
    health_check:
      endpoint: 0.0.0.0:13133
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679

service:
    extensions: [pprof, zpages, health_check]
    pipelines:
      logs:
        receivers: [filelog]
        processors: [batch, attributes/log, attributes/log2, attributes/log3]
        exporters: [splunk_hec/logs]

Associate log sources with source types

This example showcases how the Collector collects data from files and sends it to Splunk Enterprise, associating each source with a different source type. The source type is a default field that identifies the structure of an event, and determines how Splunk Enterprise formats the data during the indexing process.

processors:
  batch:
  resource/one:
    attributes:
      # Set the com.splunk.sourcetype resource attribute to sourcetype1.
      # com.splunk.sourcetype is the default key the HEC exporter will use to extract the source type of the record.
      # See https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/splunkhecexporter
      # under the configuration key `hec_metadata_to_otel_attrs/sourcetype`
      - key: com.splunk.sourcetype
        value: "sourcetype1"
        action: upsert
  resource/two:
    attributes:
      - key: com.splunk.sourcetype
        value: "sourcetype2"
        action: upsert
  resource/three:
    attributes:
      - key: com.splunk.sourcetype
        value: "sourcetype3"
        action: upsert

receivers:
    filelog/onefile:
      include: [ /output/file.log ]
    filelog/twofile:
      include: [ /output/file2.log ]
    filelog/threefolder:
      include: [ /output3/*.log ]

exporters:
    splunk_hec/logs:
        # Splunk HTTP Event Collector token.
        token: "00000000-0000-0000-0000-0000000000000"
        # URL to a Splunk instance to send data to.
        endpoint: "https://splunk:8088/services/collector"
        # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
        source: "output"
        # Splunk index, optional name of the Splunk index targeted.
        index: "logs"
        # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
        max_connections: 20
        # Whether to disable gzip compression over HTTP. Defaults to false.
        disable_compression: false
        # HTTP timeout when sending data. Defaults to 10s.
        timeout: 10s
        tls:
          # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
          # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
          insecure_skip_verify: true



extensions:
    health_check:
      endpoint: 0.0.0.0:13133
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679

service:
    extensions: [pprof, zpages, health_check]
    pipelines:
      logs/one:
        receivers: [ filelog/onefile ]
        processors: [ batch, resource/one ]
        exporters: [ splunk_hec/logs ]
      logs/two:
        receivers: [ filelog/twofile ]
        processors: [ batch, resource/two ]
        exporters: [ splunk_hec/logs ]
      logs/three:
        receivers: [ filelog/threefolder ]
        processors: [ batch, resource/three ]
        exporters: [ splunk_hec/logs ]

Settings

Note

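By default, start_at is set to end, so the receiver reads only lines that are written to a file after the receiver starts watching it. It doesn't read existing content, so a file that isn't actively being written to produces no logs.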
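To read files from their first line instead, set start_at to beginning, as the Route logs to different indexes example does. A minimal sketch, with a hypothetical path:

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.log ]
    # Read existing file content from the first line instead of only new lines.
    start_at: beginning
```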

The following table shows the configuration options for the Filelog receiver:

Troubleshooting

If you are a Splunk Observability Cloud customer and are not able to see your data in Splunk Observability Cloud, you can get help in the following ways.

Available to Splunk Observability Cloud customers

Available to prospective customers and free trial users

  • Ask a question and get answers through community support at Splunk Answers.

  • Join the Splunk #observability user group Slack channel to communicate with customers, partners, and Splunk employees worldwide. To join, see Chat groups in the Get Started with Splunk Community manual.

This page was last updated on Dec 12, 2024.