Splunk Cloud Platform

Use Edge Processors

Enrich data with lookups using an Edge Processor

You can enrich your data by adding relevant information using a lookup. A lookup is a knowledge object that matches the field-value combinations in your event data with field-value combinations in a lookup table, and then adds the relevant information from the lookup table to your events. By creating and applying a pipeline that uses a lookup, you can configure an Edge Processor to add more information to the received data before sending that data to a destination.

For example, assume that the following events represent purchases from the fictitious Buttercup Games store:

date        ip_address      product_id
2023-11-22  107.3.146.207   WC-SH-G04
2023-11-22  128.241.220.82  DC-SG-G02
2023-11-23  194.215.205.19  FS-SG-G03

These events contain the date when the purchase happened, the IP address of the customer who made the purchase, and the product ID of the item that was purchased. If you have a lookup table that maps the product IDs to product names, you can use it in an Edge Processor pipeline to add the corresponding product names to these events.

For more information about lookups, see About lookups in the Splunk Cloud Platform Knowledge Manager Manual.
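As a rough illustration of what a lookup does, the following Python sketch models the enrichment described above. It is not how an Edge Processor is implemented: the `enrich` function and the in-memory table are hypothetical, and the three product names are the ones shown in the preview results on this page.

```python
# Illustrative model of lookup enrichment; not Edge Processor internals.
# Lookup rows keyed by product ID (values taken from this page's example).
LOOKUP_TABLE = {
    "WC-SH-G04": {"product_name": "World of Cheese"},
    "DC-SG-G02": {"product_name": "Dream Crusher"},
    "FS-SG-G03": {"product_name": "Final Sequel"},
}


def enrich(event, table=LOOKUP_TABLE, event_field="product_id"):
    """Return a copy of the event with the matching lookup fields added."""
    enriched = dict(event)
    row = table.get(event.get(event_field))
    if row is not None:
        enriched.update(row)
    return enriched


event = {
    "date": "2023-11-22",
    "ip_address": "107.3.146.207",
    "product_id": "WC-SH-G04",
}
print(enrich(event)["product_name"])  # World of Cheese
```

An event whose product ID has no row in the table passes through this sketch unchanged.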

Limitations

Edge Processors support CSV and KV Store lookups only. They do not support geospatial or external lookups.

Be aware that Edge Processors parse all lookup data as strings. In order for you to match a field from your events with a field from a lookup table, the event field must be a string field. For example, an event field like http_status=200 is an integer field and cannot be matched with lookup fields. However, http_status="200" is a string field that can be matched with lookup fields. For more information about data types in SPL2, see Built-in data types in the SPL2 Search Reference manual.
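The string-only matching rule can be pictured with a small Python model. This is a simplified sketch of the limitation described above, not the Edge Processor's actual matching code; the `match_status` function and the status descriptions in the table are hypothetical.

```python
# Simplified model: all lookup data is parsed as strings, so only string
# event fields can match. Hypothetical lookup rows for illustration.
STATUS_LOOKUP = {"200": "OK", "404": "Not Found"}


def match_status(value):
    """Return the lookup description, or None when the value cannot match."""
    if not isinstance(value, str):
        return None  # non-string fields never match string lookup data
    return STATUS_LOOKUP.get(value)


print(match_status(200))       # None: integer field, no match
print(match_status("200"))     # OK: string field matches
print(match_status(str(200)))  # OK: converted to a string before matching
```

In this model, converting the field to a string before the lookup is what makes the match possible.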

Prerequisites

Before starting to configure lookups for Edge Processors, you must have the following:

  • An Edge Processor that was created on or after December 8, 2023.

    Edge Processors that were created earlier than this date cannot support lookups.

  • A lookup table stored in one of the following ways:
    • In a CSV file. Make sure that the file meets the restrictions described in About the CSV files in the Splunk Cloud Platform Knowledge Manager Manual.
    • In a KV Store collection on the Splunk Cloud Platform deployment that is pair-connected with your Edge Processor tenant. Make sure that the KV Store collection meets the configuration requirements described in Special KV Store collection configuration for federated search in the Splunk Cloud Platform Knowledge Manager Manual.

      The pair-connected Splunk Cloud Platform deployment is the deployment that was connected to the Edge Processor service during the first-time set up process for the Edge Processor solution. For more information, see First-time setup instructions for the Edge Processor solution.

The instructions on this page refer to an example scenario using a lookup dataset named prices.csv. If you'd like to follow along with these example configurations, then complete these steps to get the prices.csv file:

  1. Download the Prices.csv.zip file.
  2. Uncompress the Prices.csv.zip file. There is only one file in the ZIP file, prices.csv.
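Before uploading a lookup file, it can help to confirm that it contains the columns you plan to match and output. The following Python sketch checks only the two prices.csv fields that this page refers to (productId and product_name); the `missing_columns` helper is hypothetical, and the check does not cover the file restrictions described in the Knowledge Manager Manual.

```python
import csv
import io

# Fields from prices.csv that the examples on this page use.
REQUIRED_COLUMNS = {"productId", "product_name"}


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return sorted(set(required) - {name.strip() for name in header})


# Stand-in content; for the real file, read prices.csv from disk instead.
sample = "productId,product_name\nWC-SH-G04,World of Cheese\n"
print(missing_columns(sample))  # []
```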

After meeting these prerequisites, perform the following steps to configure an Edge Processor to enrich the incoming event data using lookups:

  1. Create a lookup in the connected Splunk Cloud Platform deployment.
  2. Confirm the availability of the lookup dataset.
  3. Create a pipeline.
  4. Configure your pipeline to enrich event data using a lookup.
  5. Save and apply your pipeline.

Create a lookup in the connected Splunk Cloud Platform deployment

Start by creating your lookup in the Splunk Cloud Platform deployment that is pair-connected with your Edge Processor tenant. Doing this makes your lookup available as a lookup dataset in the tenant, which you can then import and use in Edge Processor pipelines.

The Edge Processor solution supports CSV lookups and KV Store lookups. To follow along with the example scenario described on this page, create a CSV lookup using the prices.csv file.

Creating CSV lookups for Edge Processors

For detailed instructions on creating a CSV lookup, see Define a CSV lookup in Splunk Web in the Splunk Cloud Platform Knowledge Manager Manual. When creating a CSV lookup for use in an Edge Processor, do the following:

  1. In Splunk Cloud Platform, upload the .csv or .csv.gz file containing your lookup table.
  2. (Optional) If you don't want other users to be able to see all of the contents of your lookup table, you can create a restricted view of the table by creating a lookup definition.
  3. Update the permissions associated with the CSV file or the lookup definition. The file or definition must be available to all apps, your Splunk Cloud Platform user account, and the service account used in the scpbridge connection that connects the Splunk Cloud Platform deployment with the Edge Processor tenant.
    • Set the Object should appear in option to All apps (system).
    • Make sure that Read permission for the file or definition is available to a role that is associated with your Splunk Cloud Platform user account.
    • Make sure that Read permission is also available to the role used by the service account. Typically, the name of this role is scp_user, if you used the role name suggested in Create a role for the service account during the initial setup of the Edge Processor solution.
  4. Make sure that a role that is associated with your user account and the role used by the service account both have Read permission for the Destination app that is associated with the CSV file or lookup definition.
    1. Select Apps, then select Manage Apps.
    2. Find the app that your CSV file or lookup definition is associated with, and then select Permissions.
    3. Select Read permission for the necessary roles, and then select Save.

Creating KV Store lookups for Edge Processors

For detailed instructions on creating a KV Store lookup, see Define a KV Store lookup in Splunk Web in the Splunk Cloud Platform Knowledge Manager Manual. When creating a KV Store lookup for use in an Edge Processor, do the following:

  1. Create a lookup definition for your KV Store collection.
  2. Update the permissions associated with the lookup definition. The definition must be available to all apps, your Splunk Cloud Platform user account, and the service account used in the scpbridge connection that connects the Splunk Cloud Platform deployment with the Edge Processor tenant.
    • Set the Object should appear in option to All apps (system).
    • Make sure that Read permission for this definition is available to a role that is associated with your Splunk Cloud Platform user account.
    • Make sure that Read permission is also available to the role used by the service account. Typically, the name of this role is scp_user, if you used the role name suggested in Create a role for the service account during the initial setup of the Edge Processor solution.
  3. Make sure that a role that is associated with your user account and the role used by the service account both have Read permission for the Destination app that is associated with the lookup definition.
    1. Select Apps, then select Manage Apps.
    2. Find the app that your lookup definition is associated with, and then select Permissions.
    3. Select Read permission for the necessary roles, and then select Save.

Confirm the availability of the lookup dataset

After creating your lookup in the connected Splunk Cloud Platform deployment, confirm that the lookup is available as a dataset in your Edge Processor tenant.

  1. Log in to your Edge Processor tenant.
  2. Refresh the scpbridge connection by doing the following:
    1. Select the Settings icon and then select System connections.
    2. On the scpbridge connection, select the Refresh icon (two curved arrows going in a circle).
  3. Navigate to the Datasets page and find your lookup dataset. The dataset has the same name as the CSV file or the lookup definition.
    If the Datasets page does not show your lookup dataset, then there might be a permissions error preventing the scpbridge connection from accessing the dataset. Verify that the role used by the service account for the scpbridge connection has read permission for your lookup table or definition. See Create a lookup in the connected Splunk Cloud Platform deployment on this page for more information.
  4. Select your lookup dataset, and then select the Open in Search icon (an arrow pointing diagonally out of a box).
  5. In the Search page, select the Run icon (a right-pointing triangle, similar to a Play button).
    If the search results pane displays information from your lookup dataset, then the lookup dataset is available in your tenant and ready to be used in pipelines. If the search results pane displays an error or 0 results, then there might be a permissions error preventing you from fully accessing the lookup dataset. In Splunk Cloud Platform, verify that your user account has read permission for your lookup table or definition. See Create a lookup in the connected Splunk Cloud Platform deployment on this page for more information.

You now have a lookup dataset that you can use to enrich events in an Edge Processor pipeline. Next, start creating the pipeline.

Create a pipeline

  1. Navigate to the Pipelines page and then select New pipeline.
  2. Select Blank pipeline and then select Next.
  3. Specify a subset of the data received by the Edge Processor for this pipeline to process. If you want to use the sample data given in step 4 so that you can follow along with the example configurations described in later sections of this page, skip this step. To define a partition, complete these steps:
    1. Select the plus icon next to Partition, or select the option in the Suggestions section that matches how you would like to create your partition.
    2. In the Field field, specify the event field that you want the partitioning condition to be based on.
    3. To specify whether the pipeline includes or excludes the data that meets the criteria, select Keep or Remove.
    4. In the Operator field, select an operator for the partitioning condition.
    5. In the Value field, enter the value that your partition should filter by to create the subset.
    6. Select Apply.
    7. (Optional) To create more conditions for the partition, select the plus icon.
    8. After you have defined your partition, select Next.
  4. (Optional) Enter or upload sample data for generating previews that show how your pipeline processes data.

    The sample data must be in the same format as the actual data that you want to process. See Getting sample data for previewing data transformations for more information.

    If you want to follow the configuration examples in the next section, then select CSV and then enter the following sample events, which represent three fictitious purchases made from a store website:

    date,ip_address,product_id
    2023-11-22,107.3.146.207,WC-SH-G04
    2023-11-22,128.241.220.82,DC-SG-G02
    2023-11-23,194.215.205.19,FS-SG-G03
    
  5. Select Next to confirm your sample data.
  6. Select the name of the destination that you want to send data to. Then, do one of the following:
    • If you selected a Splunk platform S2S or Splunk platform HEC destination, select Next.
    • If you selected another type of destination, select Done and skip the next step.
  7. (Optional) If you're sending data to a Splunk platform deployment, you can specify a target index:
    1. In the Index name field, select the name of the index that you want to send your data to.
    2. (Optional) In some cases, incoming data already specifies a target index. If you want your Index name selection to override previous target index settings, then select the Overwrite previously specified target index check box.
    3. Select Done.

    Be aware that the destination index is determined by a precedence order of configurations. See How does an Edge Processor know which index to send data to? for more information.

You now have a simple pipeline that receives data and sends that data to a destination. In the next section, you'll configure this pipeline to enrich your data using information from your lookup dataset.

Configure your pipeline to enrich event data using a lookup

Add an Enrich events with lookup action to your pipeline. Configure this action to specify how the pipeline matches field-value combinations in the incoming event data with field-value combinations in a lookup dataset and adds information from that dataset to the events.

  1. (Optional) Select the Preview Pipeline icon to generate a preview that shows what the sample data looks like when it passes through the pipeline.
  2. If you generated a preview, review the preview results panel. If you used the sample events described in the previous section, then the panel displays the following:

    date        ip_address      product_id
    2023-11-22  107.3.146.207   WC-SH-G04
    2023-11-22  128.241.220.82  DC-SG-G02
    2023-11-23  194.215.205.19  FS-SG-G03
  3. Select the plus icon in the Actions section, and then select Enrich events with lookup.
  4. Open the Lookup dataset menu, select the lookup dataset that you want to use, and then select Select lookup dataset.
    For example, if you created a lookup dataset using the prices.csv file, then select prices.csv.
    The Enrich events with lookups dialog box loads the information from your selected lookup dataset.
  5. In the Match fields area, define one or more pairs of lookup fields and event fields that you want to match. When these matched fields contain identical values, the pipeline adds data from the lookup dataset to the event.

    For example, the product_id field from the sample data and the productId field from the prices.csv lookup dataset both contain product ID values used by the fictitious Buttercup Games store. If you match these fields, then whenever the pipeline receives an event that has WC-SH-G04 as a product_id value, the pipeline will update that event to include data from the lookup dataset row that has WC-SH-G04 as a productId value. To match these fields, configure the following settings in the Match fields area:
    Option name    Enter or select the following
    Lookup field   productId
    Event field    product_id
  6. In the Output fields area, select the fields from the lookup dataset that you want to add to your events. You can choose to add all the fields from the dataset by selecting Output all fields.
  7. For example, to add product names from the prices.csv lookup dataset to the sample events, select product_name.

  8. (Optional) Specify the action to take when an incoming event already contains the selected output fields:
    • To replace the data in the event with the data from the lookup dataset, select Overwrite existing values in events. This setting is selected by default.
    • To leave the existing data in the event unchanged, and only fill in empty or missing fields using data from the lookup dataset, deselect Overwrite existing values in events.

    For example, suppose that the pipeline receives an event that erroneously has WC-SH-G04 as a product_id value and Pony Run as a product_name value. According to the prices.csv dataset, the WC-SH-G04 product ID corresponds to the World of Cheese product name. When Overwrite existing values in events is selected, the pipeline changes the product_name value in the event from Pony Run to World of Cheese. When that option is not selected, the pipeline does not change the product_name value in the event.

  9. To confirm the configuration of your lookups, select Apply.
  10. Review the import statement and the lookup command that the pipeline editor adds to your pipeline.

    • The import statement imports the lookup dataset into the pipeline so that the pipeline can use it. See Importing datasets into Edge Processor pipelines for more information.
    • The lookup command matches fields from the lookup dataset with fields from the incoming events, and then enriches the events by adding the specified output fields to the event. See lookup command overview in the SPL2 Search Manual for more information.

    For example, the pipeline shown after this list does the following:

    • Imports the prices.csv lookup dataset.
    • Matches the product_id field in incoming events with the productId field in the prices.csv dataset.
    • Adds the corresponding product_name values from the prices.csv dataset into incoming events, overwriting any product_name values that might already exist in the events.

    import 'prices.csv' from /envs.splunk.buttercup.lookups
    
    $pipeline = | from $source | lookup 'prices.csv' productId AS product_id OUTPUT product_name | into $destination;
    

    If you preview your pipeline again, the preview results panel displays the following:

    date        ip_address      product_id  product_name
    2023-11-22  107.3.146.207   WC-SH-G04   World of Cheese
    2023-11-22  128.241.220.82  DC-SG-G02   Dream Crusher
    2023-11-23  194.215.205.19  FS-SG-G03   Final Sequel

You now have a pipeline that enriches the incoming data with additional information from a lookup dataset. In the next section, you'll save this pipeline and apply it to an Edge Processor.
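The effect of the Overwrite existing values in events option described in this section can be modeled in a few lines of Python. This is a sketch of the documented behavior only; the `apply_lookup` function and its `overwrite` parameter are hypothetical names, and the Pony Run event is the erroneous example from this section.

```python
# Illustrative model of the "Overwrite existing values in events" option.
def apply_lookup(event, lookup_row, overwrite=True):
    """Merge lookup output fields into an event, optionally keeping existing values."""
    result = dict(event)
    for field, value in lookup_row.items():
        existing = result.get(field)
        # Always fill empty or missing fields; replace existing ones only on overwrite.
        if overwrite or existing is None or existing == "":
            result[field] = value
    return result


event = {"product_id": "WC-SH-G04", "product_name": "Pony Run"}
row = {"product_name": "World of Cheese"}

print(apply_lookup(event, row, overwrite=True)["product_name"])   # World of Cheese
print(apply_lookup(event, row, overwrite=False)["product_name"])  # Pony Run
```

With `overwrite=False`, the sketch still fills in a product_name that is empty or missing, which matches the description of the deselected option.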

Save and apply your pipeline

Be aware that when you apply a pipeline that uses a lookup or add a lookup to an applied pipeline, it can take some time for the Edge Processor to download and start using your lookup table. For example, a 200 MB lookup table takes approximately 10 minutes to download.

This download time does not disrupt data processing when you're adding a lookup to a pipeline that is already applied to an Edge Processor. However, when you initially apply a lookup pipeline to an Edge Processor, that pipeline does not start receiving or processing data until after the download is complete.
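For rough planning, the single data point above (a 200 MB table in approximately 10 minutes) can be scaled to other table sizes. The Python sketch below assumes that download time grows linearly with file size, which this page does not state; treat the result as a ballpark figure, and note that `estimate_download_minutes` is a hypothetical helper.

```python
# Rough planning estimate only. The reference point (200 MB in about
# 10 minutes) comes from this page; linear scaling is an assumption.
REFERENCE_SIZE_MB = 200
REFERENCE_MINUTES = 10


def estimate_download_minutes(size_mb):
    """Scale the documented reference download time linearly by table size."""
    if size_mb < 0:
        raise ValueError("size_mb must be non-negative")
    return size_mb * REFERENCE_MINUTES / REFERENCE_SIZE_MB


print(estimate_download_minutes(200))  # 10.0
print(estimate_download_minutes(50))   # 2.5
```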

  1. To save your pipeline, do the following:
    1. Select Save pipeline.
    2. In the Name field, enter a name for your pipeline.
    3. (Optional) In the Description field, enter a description for your pipeline.
    4. Select Save.

    If your pipeline is valid, the Edge Processor service prompts you to apply it to an Edge Processor.

  2. To apply this pipeline to an Edge Processor, do the following:
    1. In the Apply pipeline prompt, select Yes, apply.
    2. Select the Edge Processors that you want to apply the pipeline to, and then select Save.

    You can only apply pipelines to Edge Processors that are in the Healthy status. Additionally, you must select an Edge Processor that was created on or after December 8, 2023 when applying a pipeline that contains a lookup.

    It can take a few minutes for the Edge Processor service to finish applying your pipeline to an Edge Processor. During this time, the affected Edge Processors enter the Pending status. To confirm that the process completed successfully, do the following:

    • Navigate to the Edge Processors page. Then, verify that the Instance health column for the affected Edge Processors shows that all instances are back in the Healthy status.
    • Navigate to the Pipelines page. Then, verify that the Applied column for the pipeline contains the "The pipeline is applied" icon.

    If the Edge Processor service returns this error message, then there might be a permissions problem:

    Failed to apply <pipeline_name>. Pipeline must contain valid SPL2 and have valid source and destination datasets.
    

    If you see this message, then verify that your Splunk Cloud Platform user account has read permission for your lookup table or definition. See Create a lookup in the connected Splunk Cloud Platform deployment on this page for more information.

  3. (Optional) To confirm that the Edge Processor has finished downloading the lookup table and is using your pipeline to receive and process data, do the following:
    1. Navigate to the Edge Processors page.
    2. In the row that lists your Edge Processor, select the Actions icon and select View debug logs. The Search page opens.
    3. Select a time range for your search.
    4. Select the Run icon (a right-pointing triangle) to search for Edge Processor log entries.
    5. Confirm that the search results include a recent log containing the message "finished download of lookup tables, edge processor updates should start". If you do not see this log, wait a few minutes, and then run the search again.
    6. Confirm that, after the log described in the previous step, the search results include an additional log containing the message "otel-collector started successfully". If you do not see this log, wait a few minutes, and then run the search again.

    If you don't see any of these logs, then contact your Splunk representative for assistance.
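The order check on the two log messages can be sketched in Python as follows. The two message strings come from the steps above; everything else is an assumption for illustration, including the `lookup_pipeline_ready` helper and the idea that you have the log messages as a list of strings in chronological order (oldest first).

```python
DOWNLOAD_DONE = "finished download of lookup tables, edge processor updates should start"
COLLECTOR_STARTED = "otel-collector started successfully"


def lookup_pipeline_ready(log_messages):
    """Return True if the collector-start message appears after the download message."""
    download_seen = False
    for message in log_messages:
        if DOWNLOAD_DONE in message:
            download_seen = True
        elif download_seen and COLLECTOR_STARTED in message:
            return True
    return False


# Hypothetical log lines in chronological order (oldest first).
logs = [
    "otel-collector started successfully",
    "finished download of lookup tables, edge processor updates should start",
    "otel-collector started successfully",
]
print(lookup_pipeline_ready(logs))      # True
print(lookup_pipeline_ready(logs[:2]))  # False
```

A collector-start message that appears only before the download message does not count, which is why the second call returns False.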

The Edge Processor that you applied the pipeline to can now enrich the event data that it receives by adding information from the lookup dataset. For information on how to confirm that your data is being processed and routed as expected, see Verify your Edge Processor and pipeline configurations.

Update lookup datasets

The information in the lookup datasets in the Edge Processor tenant comes from the pair-connected Splunk Cloud Platform deployment. To update the information in the tenant, you must manually update the information in Splunk Cloud Platform and then refresh the scpbridge connection in the Edge Processor tenant.

Additionally, if that lookup dataset is already being used by an Edge Processor, there is an additional step that must happen before the updated information in the tenant reaches the Edge Processor. The Edge Processor service automatically deploys updated lookup information from the tenant to your Edge Processors every 4 hours or whenever a change is made to the lookup configuration in the applied pipeline.

To update the information in a lookup dataset that's already being used by an Edge Processor, complete the following steps:

  1. In the pair-connected Splunk Cloud Platform deployment, update the information in the lookup table or definition corresponding to the lookup dataset.
  2. Send this updated information from the Splunk Cloud Platform deployment to the Edge Processor service by refreshing the scpbridge connection:
    1. Select the Settings icon and then select System connections.
    2. On the scpbridge connection, select the Refresh icon (two curved arrows going in a circle).
  3. Wait for the Edge Processor service to send the updated information to all the Edge Processors that are configured to use the lookup dataset. The service automatically updates the lookup datasets used by Edge Processors every 4 hours. Additionally, if you modify the lookup configuration in the pipeline, then the Edge Processors using that pipeline download the updated lookup dataset.
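To judge how long to wait after refreshing the scpbridge connection, you can compute an upper bound from the 4-hour interval. The Python sketch below assumes that the automatic deployment runs on a fixed 4-hour cycle and ignores the time needed to download the table; both are assumptions, and `latest_expected_deploy` is a hypothetical helper.

```python
from datetime import datetime, timedelta

DEPLOY_INTERVAL = timedelta(hours=4)  # interval stated on this page


def latest_expected_deploy(refresh_time):
    """Upper bound on when the next automatic deployment should begin."""
    return refresh_time + DEPLOY_INTERVAL


refreshed = datetime(2024, 4, 24, 9, 30)
print(latest_expected_deploy(refreshed))  # 2024-04-24 13:30:00
```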

Performance benchmarks for lookups in Edge Processors

Splunk tested the performance of lookups in an Edge Processor using these configurations:

  • A pipeline that uses the rex command to extract 1 field, and then uses the lookup command to match 1 existing field and add 1 new field.
  • A lookup table that contains 2 columns and 1 million rows. The file size of this table is 200 MB.

    This table was used in performance testing specifically. Edge Processors have also been tested using lookup tables that contain up to 20 columns.

With these configurations, the Edge Processor can perform as follows:

  • Process and send out 30,000 events per second at a rate of 550 KB per second.
  • Use 104% CPU, where 100% is 1 core.
  • Use 2.4 GB of RAM.

When you apply a pipeline that uses a lookup or add a lookup to an applied pipeline, it can take some time for the Edge Processor to download and start using the lookup table. It takes approximately 10 minutes to download a 200 MB lookup table.

This download time does not disrupt data processing when you're adding a lookup to a pipeline that is already applied to an Edge Processor. However, when you initially apply a lookup pipeline to an Edge Processor, that pipeline does not start receiving or processing data until after the download is complete.

These are benchmarks for expected performance but are not guaranteed.

Last modified on 24 April, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308 (latest FedRAMP release), 9.1.2312

