Splunk Stream

Installation and Configuration Manual

This documentation does not apply to the most recent version of Splunk Stream. For documentation on the most recent version, go to the latest release.

Configure Stream Forwarder

streamfwd.xml

The streamfwd.xml configuration file lets you define data capture parameters for the streamfwd binary. You can configure streamfwd.xml to listen on specific IP addresses and ports, enable SSL, redirect log files, collect network events, and specify network interfaces.

streamfwd.xml is included with Splunk_TA_stream and is installed in:

$SPLUNK_HOME/etc/apps/Splunk_TA_stream/default.

Important: Do not edit the streamfwd.xml file in the $SPLUNK_HOME/etc/apps/Splunk_TA_stream/default directory. This is a master copy of the configuration file. To edit the configuration, copy streamfwd.xml to the $SPLUNK_HOME/etc/apps/Splunk_TA_stream/local directory and make your edits there.

Basic configuration

streamfwd.xml is configured by default to listen for traffic on all available network interfaces.

<?xml version="1.0" encoding="UTF-8"?>
<CmConfig xmlns="http://purl.org/cloudmeter/config" version="6.0.0">
  <Port>8889</Port>
  <UIDirectory>../../ui</UIDirectory>
  <DataDirectory>../../data</DataDirectory>
  <LogConfig>streamfwdlog.conf</LogConfig>
</CmConfig>

The streamfwd.xml file accepts these basic configuration parameters:

<IPAddr> IP address that the Stream Forwarder listens on
<Port> TCP port that the Stream Forwarder listens on (use "0" to disable)
<SSLKey> To enable SSL for the Stream Forwarder, specify a PEM-encoded RSA private key file
<User> Name of user the streamfwd process runs as
<Group> Name of group the streamfwd process runs as
<LogConfig> Configuration file to use for logging
<DataDirectory> Location of failover and other data files
<UIDirectory> Location of user interface files (do not change)
<DefaultVocabularyPath> Location of default vocabulary files (do not change)
<LocalVocabularyPath> Location of custom vocabulary files (do not change)
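
As an illustration only, a local copy of streamfwd.xml that combines several of these basic parameters might look like the following sketch. The element names come from the list above. The IP address, key file path, and user and group names are hypothetical placeholder values, and the placement of the new elements inside <CmConfig> is an assumption based on the default file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CmConfig xmlns="http://purl.org/cloudmeter/config" version="6.0.0">
  <!-- Hypothetical values: replace with settings for your environment -->
  <IPAddr>127.0.0.1</IPAddr>
  <Port>8889</Port>
  <!-- Placeholder path to a PEM-encoded RSA private key file -->
  <SSLKey>/path/to/server.key</SSLKey>
  <!-- Placeholder account names for the streamfwd process -->
  <User>splunk</User>
  <Group>splunk</Group>
  <UIDirectory>../../ui</UIDirectory>
  <DataDirectory>../../data</DataDirectory>
  <LogConfig>streamfwdlog.conf</LogConfig>
</CmConfig>
```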

Advanced configuration

The streamfwd.xml file accepts these advanced configuration options.

Caution: Do not modify these options unless advised by your Splunk Support representative.

<ProcessingThreads> Number of threads to use for processing network traffic
<MaxPacketQueueSize> Maximum size of each processing thread's packet queue
<SessionKeyTimeout> Idle time in seconds before SSL session keys are expired
<TcpConnectionTimeout> Idle time in seconds before TCP connections are expired
<DuplicatePacketWindow> Set this to a value greater than zero to enable automatic de-duplication of network packets. The value indicates the number of packets cached in memory (using a rolling window) to detect duplicate packets.
<GenerateFlowEvents> Set to "false" to disable generation of TCP and UDP flow events
<UseGlobalSSLSessionKeyCache> Set to "true" to share SSL cache across processing threads
<HideCreditCardNumbers> Set to "false" to disable the automatic masking of credit card numbers
<QueueEventDelivery> Set this to "true" to enable the use of a separate thread for the processing of captured events
<MapSslServers> Set this to "false" to disable automatic caching of encrypted versus unencrypted services
<ClientIpSslHashBytes> Number of client IP octets to use for the SSL processor thread hash algorithm (if the global SSL session key cache is disabled)
<Protocol> Set to "http" to add support for more advanced content extraction, but only for HTTP traffic
<MaxRequestContentLength> Max number of bytes extracted from HTTP request content (requires <Protocol>http</Protocol>)
<MaxResponseContentLength> Max number of bytes extracted from HTTP response content (requires <Protocol>http</Protocol>)
<RawRequestHeaders> Set to "true" to enable extraction of raw HTTP request headers (requires <Protocol>http</Protocol>)
<RawResponseHeaders> Set to "true" to enable extraction of raw HTTP response headers (requires <Protocol>http</Protocol>)
<AllowUtf8Conversion> Set to "true" to enable UTF8 conversion of HTTP request/response content (requires <Protocol>http</Protocol>)
<AllowSearchingContentForCharset> Set to "true" to enable searching of content for charset (requires <Protocol>http</Protocol>)
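
The HTTP-related options above all depend on <Protocol>http</Protocol>. The following sketch shows how they might be combined with the default file. The byte limits are hypothetical values chosen for illustration, the placement of the elements inside <CmConfig> is an assumption, and the caution above still applies: change these options only when Splunk Support advises it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CmConfig xmlns="http://purl.org/cloudmeter/config" version="6.0.0">
  <Port>8889</Port>
  <UIDirectory>../../ui</UIDirectory>
  <DataDirectory>../../data</DataDirectory>
  <LogConfig>streamfwdlog.conf</LogConfig>
  <!-- Required by the HTTP content options that follow -->
  <Protocol>http</Protocol>
  <!-- Hypothetical limits, in bytes -->
  <MaxRequestContentLength>65536</MaxRequestContentLength>
  <MaxResponseContentLength>65536</MaxResponseContentLength>
  <!-- Extract the raw request and response headers -->
  <RawRequestHeaders>true</RawRequestHeaders>
  <RawResponseHeaders>true</RawResponseHeaders>
</CmConfig>
```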


Disable Stream Forwarder admin interface

By default, the Stream Forwarder admin interface is enabled, listening on TCP port 8889. To disable the interface:

1. Go to $SPLUNK_HOME/etc/apps/Splunk_TA_stream/local.

2. Open the streamfwd.xml configuration file and change <Port>8889</Port> to <Port>0</Port>.

3. Restart your instance of Splunk Enterprise for the change to take effect.
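
Assuming the local copy of streamfwd.xml otherwise matches the default file shown under "Basic configuration", the edited file would look like this sketch:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CmConfig xmlns="http://purl.org/cloudmeter/config" version="6.0.0">
  <!-- Port 0 disables the Stream Forwarder admin interface -->
  <Port>0</Port>
  <UIDirectory>../../ui</UIDirectory>
  <DataDirectory>../../data</DataDirectory>
  <LogConfig>streamfwdlog.conf</LogConfig>
</CmConfig>
```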

Use XML TcpServer element to specify TCP servers

Stream Forwarder automatically detects the client and server endpoints when it captures the beginning of a TCP connection. If it starts capturing traffic after a TCP connection has already been established, Stream Forwarder normally assumes that the sender of the first packet it sees is the client.

You can modify this behavior by inserting <TcpServer> clauses that define the endpoints of specific TCP servers. If the sender of a packet matches a defined endpoint, Stream Forwarder categorizes the packet as a server response.

Examples

Example 1: Single HTTP server endpoint

<TcpServer>
    <Address>192.168.1.102</Address>
    <Port>80</Port>
</TcpServer>

Example 2: Wildcard endpoint

<TcpServer>
    <Address>192.168.1.0</Address>
    <AddressWildCard>255.255.255.0</AddressWildCard>
    <Port>80</Port>
</TcpServer>
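
The examples above show each clause in isolation. As a sketch of how they might sit in a complete file, the following configuration combines both. It assumes that <TcpServer> clauses are direct children of <CmConfig> and that more than one clause can be defined, neither of which is stated explicitly here. The addresses are the same sample values used above, and port 8080 is a hypothetical value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CmConfig xmlns="http://purl.org/cloudmeter/config" version="6.0.0">
  <Port>8889</Port>
  <UIDirectory>../../ui</UIDirectory>
  <DataDirectory>../../data</DataDirectory>
  <LogConfig>streamfwdlog.conf</LogConfig>
  <!-- A single HTTP server endpoint -->
  <TcpServer>
    <Address>192.168.1.102</Address>
    <Port>80</Port>
  </TcpServer>
  <!-- Any host in 192.168.1.0/255.255.255.0 serving on a hypothetical port 8080 -->
  <TcpServer>
    <Address>192.168.1.0</Address>
    <AddressWildCard>255.255.255.0</AddressWildCard>
    <Port>8080</Port>
  </TcpServer>
</CmConfig>
```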

Use XML SSLServer element to specify SSL servers

Stream Forwarder automatically detects whether endpoints are encrypted, and attempts to decrypt SSL sessions using the available private keys. Optionally, you can explicitly define the traffic as encrypted or decrypted by inserting <SSLServer> clauses:

<SSLServer>
    <Address>192.168.1.102</Address>
    <Port>443</Port>
</SSLServer>

Use XML Capture element to specify network interfaces

The Splunk Stream Forwarder configuration file (streamfwd.xml) is configured by default to listen for traffic on all available network interfaces. To restrict data capture to specific network interfaces, insert an XML <Capture></Capture> clause that defines the network interfaces on which streamfwd listens.

<Capture>
    <Interface>eth0</Interface>
    <Offline>false</Offline>
    <Filter>tcp port 80</Filter>
</Capture>

Examples

Example 1: Configure streamfwd.xml to include local loopback capture

Stream Forwarder by default does not capture traffic that originates and terminates on the same machine. You can enable capture of this "local loopback" traffic using a Capture element in the configuration file:

<Capture>
    <InterfaceRegex>(en|eth|lo)[0-9]*</InterfaceRegex>
</Capture>

The <InterfaceRegex> element instructs streamfwd to enumerate the interfaces that are actually available on the host machine and to dynamically generate an internal configuration for each network interface that matches the regular expression.

Example 2: Configure streamfwd.xml for use across multiple systems

You might want to maintain a master copy of streamfwd.xml that you can reuse across multiple systems that have different network device names. The following streamfwd.xml configuration listens on all matching interfaces found.

<Capture>
    <InterfaceRegex>.*</InterfaceRegex>
</Capture>

Note that this configuration may generate startup warnings for any devices that do not support passive data capture.

Example 3: Capture data on specific network interfaces

In this example, on a system with eight network interfaces, streamfwd listens only for TCP port 80 traffic, and only on two of those interfaces (eth4 and eth5):

<Capture>
    <InterfaceRegex>eth[45]</InterfaceRegex>
    <Offline>false</Offline>
    <Filter>tcp port 80</Filter>
</Capture>

Example 4: Use pcap file instead of network interface

You can also use a previously generated pcap file instead of an actual network interface, using this variation of the <Capture> element.

<Capture>
    <Interface>/tmp/data.cap</Interface>
    <Offline>true</Offline>
    <Filter>tcp port 80</Filter>
    <Repeat>true</Repeat>
    <SysTime>true</SysTime>
    <BitsPerSecond>10000000</BitsPerSecond>
</Capture>

The elements in this example have the following meanings:

<Interface> Path to your pcap file
<Offline> Set to "true" to read from a pcap file; set to "false" if <Interface> is a network device name
<Repeat> Set to "true" to play back the pcap file repeatedly for continuous load
<SysTime> Set to "true" to use the system time for packet timestamps
<BitsPerSecond> Rate limiter; defaults to 10 Mbps if undefined and <Repeat> is "true"
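
For comparison, a single-pass replay might be sketched as follows. This variation is inferred from the element descriptions above rather than taken from a documented example: it assumes that setting <Repeat> to "false" plays the file once and that setting <SysTime> to "false" keeps the timestamps recorded in the pcap file. The rate that applies when <BitsPerSecond> is omitted and <Repeat> is "false" is not specified here.

```xml
<Capture>
    <!-- Same sample pcap path as in the example above -->
    <Interface>/tmp/data.cap</Interface>
    <Offline>true</Offline>
    <Filter>tcp port 80</Filter>
    <!-- Assumption: "false" plays the file once instead of looping -->
    <Repeat>false</Repeat>
    <!-- Assumption: "false" keeps the timestamps stored in the pcap file -->
    <SysTime>false</SysTime>
</Capture>
```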

Default data extractions

Important: Data extractions are currently only supported when using the HTTP-only protocol via <Protocol>http</Protocol>.

The basic streamfwd.xml configuration also sets up a number of default data extractions that are appropriate for most HTTP related purposes. To keep things tidy, the default extractions are held within the Stream Forwarder itself and do not need to be expressed in the streamfwd.xml file.

You can modify or augment the default behavior by adding appropriate extraction elements to streamfwd.xml. When fully expressed in streamfwd.xml the internal configuration looks like this:

<RawRequestHeaders>true</RawRequestHeaders>
<RawResponseHeaders>true</RawResponseHeaders>
<Extract term="clickstream.host">
    <Source>cs-header</Source>
    <Name>Host</Name>
</Extract>
<Extract term="clickstream.referer">
    <Source>cs-header</Source>
    <Name>Referer</Name>
</Extract>
<Extract term="clickstream.useragent">
    <Source>cs-header</Source>
    <Name>User-Agent</Name>
</Extract>
<Extract term="clickstream.cookie">
    <Source>cs-header</Source>
    <Name>Cookie</Name>
</Extract>
<Extract term="clickstream.set-cookie">
    <Source>sc-header</Source>
    <Name>Set-Cookie</Name>
</Extract>
<Extract term="clickstream.cs-content-type">
    <Source>cs-header</Source>
    <Name>Content-Type</Name>
</Extract>
<Extract term="clickstream.content-type">
    <Source>sc-header</Source>
    <Name>Content-Type</Name>
</Extract>
<Extract term="clickstream.location">
    <Source>sc-header</Source>
    <Name>Location</Name>
</Extract>
<Extract term="clickstream.page-title">
    <Source>sc-content</Source>
    <Match>(?i)&lt;TITLE&gt;\s*(.*?)\s*&lt;/TITLE&gt;</Match>
    <Format>$1</Format>
    <ContentType>(?i)^(text/html|application/xhtml)</ContentType>
    <MaxSize>10240</MaxSize>
</Extract>
<Extract term="clickstream.cs-content">
    <Source>cs-content</Source>
    <ContentType>(?i)^(application/x-www-form-urlencoded|multipart/form-data)</ContentType>
    <MaxSize>524288</MaxSize>
</Extract>
<Extract term="clickstream.sc-content">
    <Source>sc-content</Source>
    <ContentType>(?i)(^text/|json|application/.*\+xml)</ContentType>
    <MaxSize>524288</MaxSize>
</Extract>

The <RawRequestHeaders/> and <RawResponseHeaders/> elements specify whether or not the Stream Forwarder should send along the raw full HTTP headers to Stream.

The <Extract.../> entries are laid out like this:

<Extract term="VOCAB.TERM">
    <Source>SOURCEBUFFER</Source>
    <Name>NAME</Name>
    <ContentType>CONTENTTYPE</ContentType>
    <Match>REGEX</Match>
    <Format>REPL</Format>
    <MaxSize>SIZE</MaxSize>
    <MaxExtracts>ITER</MaxExtracts>
</Extract>
 
  • VOCAB.TERM: The destination term name in our standard format (for example, clickstream.c-ip, clickstream.host, or clickstream.uri-stem). This term receives the results of the extraction, if it exists.
  • SOURCEBUFFER: The name of the internal buffer from which to perform the extraction. The following table lists the available options.

      cs-header    Headers of the client browser request
      sc-header    Headers of the server response
      cs-content   Any request content submitted by the client browser
      sc-content   Any response content from the server
      query        The uri-query component of the requested URL
      cs-cookie    All cookies sent in the request
      sc-cookie    All set-cookies sent by the server
      cookie       Both cs-cookie and sc-cookie

  • NAME: The name of the key in the source buffer. This is only used for the cs-header, sc-header, query, cs-cookie, sc-cookie, and cookie buffers, because they are internally represented as name/value structures. For example, to extract the user agent from a request header, the correct NAME to use would be "User-Agent".
  • REGEX: An optional regular expression match. If there is a match, the extraction is kept and the <Format/> is applied.
  • REPL: An optional regular expression replacement that works with REGEX. If there is a REGEX match, the REPL defines what ends up in the term.
  • CONTENTTYPE: An optional regular expression used to match the content type of the cs-content and sc-content source buffers. It has no effect on the other buffers. This lets extractions be selective and run only on the content types that make sense (for example, only on textual payloads and not on images).
  • SIZE: Defines an optional data truncation threshold.
  • ITER: Defines an optional limit on the number of times the <Match> / <Format> is applied to the source buffer. Leaving <MaxExtracts> empty or undefined is the same as setting ITER = 1. Setting ITER = 0 causes the extraction to occur repeatedly until the source buffer is exhausted. This option only has an effect when there is also a <Match> / <Format> combination.

Note: If you modify the extractions directly in a text editor, the REGEX, REPL, and CONTENTTYPE regular expression strings must be properly escaped with the following substitutions:

char   entity
<      &lt;
>      &gt;
"      &quot;
&      &amp;
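
Putting the fields described above together, two custom extractions might be sketched as follows. The term names clickstream.search-term and clickstream.heading, the query parameter name q, and the size limits are hypothetical and chosen only for illustration; a real term name may need to exist in a vocabulary file. The first entry reads a named value from the query buffer. The second applies an XML-escaped regular expression to HTML response content and sets <MaxExtracts> to 0 so that the match repeats until the buffer is exhausted.

```xml
<!-- Hypothetical term: value of the "q" parameter in the request query string -->
<Extract term="clickstream.search-term">
    <Source>query</Source>
    <Name>q</Name>
    <MaxSize>256</MaxSize>
</Extract>
<!-- Hypothetical term: text of every H1 heading in HTML responses -->
<Extract term="clickstream.heading">
    <Source>sc-content</Source>
    <ContentType>(?i)^text/html</ContentType>
    <!-- The angle brackets in the regular expression are escaped as entities -->
    <Match>(?i)&lt;h1&gt;\s*(.*?)\s*&lt;/h1&gt;</Match>
    <Format>$1</Format>
    <MaxSize>1024</MaxSize>
    <MaxExtracts>0</MaxExtracts>
</Extract>
```
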
Last modified on 11 November, 2016

This documentation applies to the following versions of Splunk Stream: 6.0, 6.0.1

