Splunk® Enterprise

Python 3 Migration

Python Code Compatibility

Check this manual often for updated information about the Splunk platform Python 3 migration. The content is subject to change.

To revise apps for compatibility with Splunk Enterprise version 8.0, Python code should generally be made compatible with both Python 2 and Python 3. To accomplish this, developers should use Python compatibility libraries and tools as needed, in this order of recommended use:

  • Six
  • Python-future
  • 2to3

See the documentation for these libraries to learn how they can help make your Python scripts compatible with both Python 2 and Python 3. However, these libraries might not produce fully compatible results, so the following tips cover updating scripts by hand to complete the transition to Python 2/3 compatible code.

  • Fundamental tips
  • Strings
  • Integers
  • Dictionaries and collections
  • Sorting
  • Modules
  • Other tips

Fundamentals

Current Python version

Many patterns work in both Python 2 and Python 3. However, to write code that differs between Python 2 and Python 3, determine which version of the interpreter is being used with this code:

import sys
if sys.version_info >= (3, 0):
   print("version 3.x")
else:
   print("version 2.x")

Also, check the stanza-level Python-version settings of your scripts, combined with the admin-level Python-version setting in server.conf. For more information about stanza- and global-level Python version settings, see Changes to Splunk Enterprise.

Indentation

Python 3 is stricter about indentation than Python 2: inconsistent mixing of tabs and spaces within a block raises a TabError. For best cross-compatibility, follow PEP 8 and indent with spaces only. For more information, see the PEP 8 Style Guide.

Print output

In Python 3, print() is a function instead of a statement. In code that must also run on Python 2, do not use print statements. You can enable the print function in Python 2 by including the following as one of the first imports:

from __future__ import print_function

In Python 2, you could suppress the trailing newline by putting a trailing comma on the argument list. For Python 2/3 compatible code, use sys.stdout.write() instead. In Python 2, you could use print >>handle, "foo" but for Python 2/3 compatible code, use handle.write("foo\n") instead.
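As a minimal sketch of the portable print patterns, the following assumes a hypothetical Collector class that stands in for any file-like object with a write() method. Once the print function is enabled, the end= and file= keyword arguments are another portable way to get the effect of the old trailing comma and the old >> redirection.

```python
from __future__ import print_function  # must come before other imports


class Collector(object):
    """Hypothetical file-like object that records everything written to it."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


collector = Collector()

# end="" suppresses the newline, like the Python 2 trailing comma did
print("progress: ", end="", file=collector)
# file= redirects output, like the Python 2 ">>handle" form did
print("done", file=collector)
# A direct write() call is equally portable
collector.write("foo\n")

output = "".join(collector.chunks)
```

In real code the handle would usually be sys.stdout, sys.stderr, or an open file.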

Strings

Handling strings properly is important when writing cross-compatible Python code. For the purpose of this discussion, in Python there are three types of string:

  • native strings (default) – for example, "abc"
  • binary strings – for example, b"abc"
  • unicode strings – for example, u"abc"

In Python 2, a native, default string is a binary or "bytes" string. In Python 3, a native string is instead a unicode string. Python 3 is strict about not mixing different types of strings. Often, an explicit conversion is needed or a runtime error is produced.

In Python 2, default and binary strings shared the same type, but in Python 3 you must keep the two string types distinctly separated. Binary data should not be stored in a default string, since Python 3 default strings are unicode, not binary.

Using strings effectively in Python 3 is a matter of knowing which type of data is contained in a string, and when strings must be converted between types.

Strategies for strings

In small, self-contained scripts, you might be able to avoid native strings, but that's unlikely in large, extensible projects. You might be tempted to use Python 2's unicode() everywhere, or to use from builtins import str from python-future, which can cause problems in Python 2.7. Changing the normal behavior of native strings might work inside a single script, but is unlikely to work when interfacing with larger systems.

For both Python 2 and 3, the native default string is by far the most common. The most effective strategy is usually to explicitly reference native strings and binary/bytes strings. Then, only when running on Python 3, use explicit conversions where needed.

For more tips about Python strings, including use of io.StringIO, see Other Tips later in this topic.

Different string types

Here are examples of default strings:

  • "normal strings" including with r"raw escaping"
  • I/O done with files like open("filename.ext", "r")
  • I/O streams from StringIO

Here are examples of binary/bytes strings:

  • b"bytes strings" including with br"raw escaping". Note, Python 3 also accepts rb"xxx" but only br"xxx" is portable to Python 2.
  • I/O done with files like open("filename.ext", "rb")
  • I/O streams from io.BytesIO
  • data passed via stdin/stdout/stderr using subprocess.PIPE when running external processes
  • data passed to APIs like hashlib.sha1(bytes)
  • data accepted and returned by base64.b64encode(bytes)
  • data produced by pickle, pack(), and similar functions

Most string operations are available on either type. However, the result of an operation will be the same type as the starting string:

>>> "a,b,c".split(',')
['a', 'b', 'c']

>>> b"a,b,c".split(b',')
[b'a', b'b', b'c']

This applies to regular expressions; if you compile a regex pattern for a bytes string, the result is a pattern which can be used on bytes input.
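A short illustration of the regular expression point, using only the standard re module. The sample input values are made up for the example.

```python
import re

# A pattern compiled from a bytes literal works on bytes input
bytes_pattern = re.compile(br"(\w+)=(\d+)")
bytes_match = bytes_pattern.search(b"count=42")

# A pattern compiled from a native string works on native string input
text_pattern = re.compile(r"(\w+)=(\d+)")
text_match = text_pattern.search("count=42")

bytes_value = bytes_match.group(2)  # same type as the input: bytes
text_value = text_match.group(2)    # same type as the input: native string
```

In Python 3, applying bytes_pattern to a native string (or the reverse) raises a TypeError, just as concatenating the two types does.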

You cannot directly mix the types of strings:

>>> "a" + b"b"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "bytes") to str

Converting between string types

When a string must transition between default string and a bytes string, use decode() and encode(). decode() takes a bytes string and returns a native string by decoding the bytes, which determines which unicode values the stream of bytes represents.

>>> b"\x61\xc3\xa4\x61".decode()
'aäa'

encode() takes a default string and returns a bytes string. It encodes individual unicode characters into the bytes needed to represent them. In Python 3, both of these methods can take an encoding parameter, but default to UTF-8, which is usually appropriate.
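The reverse direction can be sketched as follows. The encoding is passed explicitly here as a precaution, so that the call does not depend on the interpreter's default.

```python
text = u"a\u00e4a"                 # the three characters a, ä, a
data = text.encode("utf-8")        # "ä" needs two bytes in UTF-8
round_trip = data.decode("utf-8")  # back to the original characters
```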

When writing cross-compatible code, typically perform these conversions when specifically in Python 3. In Python 2, conversion methods exist, but convert between default strings and unicode strings. Typically in Python 2, default strings are used to hold both types of data, so it's best to keep them unconverted. An example of typical cross-compatible code for conversions is:

process = subprocess.Popen(['cal'], stdout=subprocess.PIPE)
out, _ = process.communicate()
if sys.version_info >= (3, 0):
   out = out.decode()
print("result: " + out)

In short, always determine which data should be in strings or bytes, and only convert at the minimal boundaries between the two types.
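One way to keep such conversions at the boundary is a small helper. The following is only a sketch: to_native and PY3 are hypothetical names that are not part of any Splunk or standard library API.

```python
import sys

PY3 = sys.version_info >= (3, 0)


def to_native(data, encoding="utf-8"):
    """Return data as the native string type (hypothetical helper).

    On Python 3, bytes are decoded. On Python 2, the value is returned
    unchanged, because native strings already hold bytes there.
    """
    if PY3 and isinstance(data, bytes):
        return data.decode(encoding)
    return data


from_pipe = to_native(b"result line\n")  # for example, output of communicate()
already_native = to_native("result line\n")
```

Calling the helper once, where data enters the program, avoids scattering decode() calls through the rest of the code.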

Errors

When converting existing Python 2 code to cross-compatibility with Python 3, it's common to find a string operation fail on a type error. A common mistake when converting is to add encode() or decode() calls until it "works." This can quickly become a source of hidden bugs. When you have an error caused by incorrect mixing of string and binary data, ask two questions:

  • Which type should the result be?
  • Where does the value that isn't of the right type come from – and why is it in the other type?

Often, a single conversion of data to the correct type can avoid many later conversions. For example, change the mode of opening a binary file from "r" to "rb" (which is also accepted by Python 2), or convert a string to the correct type early rather than at the point of each TypeError exception.

In Python 3, a native string can't hold arbitrary binary data; only a bytes string can do so. If you don't have binary data in the right type of string to start, it will cause errors even if you use encode() every time you use the string.

Finally, in Python 2, "abc" and b"abc" are equivalent. If the error you are getting in Python 3 involves a string constant operand being the wrong type, the solution is often simply to mark the string constant as a b"bytes string".
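As an illustration of fixing the type at its source, this sketch hashes the contents of a file. The temporary file and its contents exist only to make the example self-contained.

```python
import hashlib
import os
import tempfile

# Create a small sample file so the sketch can run on its own
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"sample payload\n")

# Opening with "rb" yields bytes on both Python 2 and Python 3,
# so hashlib receives the type it expects and no encode() is needed.
with open(path, "rb") as fh:
    payload = fh.read()
digest = hashlib.sha1(payload).hexdigest()

os.remove(path)
```

Had the file been opened with "r", Python 3 would return a native string and hashlib.sha1() would raise a TypeError at a point far from the real cause.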

Additional tips

  • Don't test for types against isinstance(x, basestring). If you must check that something is a "normal" string, use isinstance(x, splunk.util.string_type) instead.
  • If Python 2 code is specifically expecting a u"unicode" string, use splunk.util.unicode() as a replacement.
s = unicode("abc")  # BEFORE
s = splunk.util.unicode("abc")  # AFTER
  • json.loads() can accept either type of string. However, XML APIs like safe_lxml.fromstring() should use bytes string input in Python 3. They can accept simple XML in either format, but if the XML document contains an <?xml ... encoding= header, then the XML layer will try to perform unicode decoding and will fail if the input is already unicode.

Integers

Integer division

In Python 2, normal integer division returned the integer part of the result. For example, 7/2 == 3 was true. Expressions like x[0:len(x)/2] were valid. In Python 3, this is no longer true: the / operator returns a float that includes the fractional part, so 7/2 == 3.5, and a float can't be used as a slice index.

One way to return just the integer result is to use the // operator, explicitly an integer division. This operator is also available in Python 2.7. Another option is to convert the result into an int, such as int(x/2).
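The two options are not identical for negative numbers, which the following sketch shows. The results in the comments on the last line assume Python 3.

```python
x = list(range(7))

# // keeps the result an integer, so it is valid as a slice index
half = x[0:len(x) // 2]

# // rounds toward negative infinity
floor_neg = -7 // 2

# int() truncates toward zero: -3 on Python 3, where -7 / 2 is -3.5
trunc_neg = int(-7 / 2)
```

For non-negative operands the two approaches give the same result, but // avoids the intermediate float.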

floor() and ceil()

In Python 2, math.floor() and math.ceil() both returned a float type result. In Python 3, they return an int type result.
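If later code depends on one specific type, an explicit conversion gives the same result under both versions. A minimal sketch:

```python
import math

value = 7.5

# int() is a no-op on Python 3 and converts the float on Python 2
floor_int = int(math.floor(value))
ceil_int = int(math.ceil(value))
```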

No long type

In Python 2, there were two integer types: int and long, which were used depending on the size of the integer. In Python 3, only int exists and it handles either numeric range automatically. Numeric constants that are explicitly long, such as 123L, must be revised to simply 123.

Octal constants

In Python 3, an octal integer constant can no longer be expressed in the form 0123. The more explicit form 0o123 works in both Python 2.7 and Python 3.
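For example, a file permission mode is a common place where octal constants appear. The variable names here are illustrative only.

```python
mode = 0o755             # portable octal literal
parsed = int("755", 8)   # parsing an octal string gives the same value
```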

Python exceptions

An important difference is that some older exception syntax that was accepted by Python 2.7 now produces an error in Python 3.

Replace any statements like:

raise MyException, "msg" 

With the Python 3 form:

raise MyException("msg")

Also, replace:

except MyException, ex: 

With:

except MyException as ex:

ex.message

In Python 3, exception objects no longer have a .message attribute. One way to get the message component of an exception is with str(ex).

However, for the exact object that ex.message provided in Python 2, use ex.args[0].

This works in both Python 2.7 and Python 3.x.
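Putting the portable exception forms together, a minimal sketch might look like this. MyException and risky() are placeholder names.

```python
class MyException(Exception):
    pass


def risky():
    # Call the exception class instead of using "raise MyException, msg"
    raise MyException("msg")


try:
    risky()
except MyException as ex:     # "as" replaces the Python 2 comma form
    message_text = str(ex)    # the message as a string
    first_arg = ex.args[0]    # the exact object passed to the constructor
```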

StopIteration

Generator functions should not explicitly raise StopIteration. Rather, they should return when they are finished, instead of yielding. In Python 3.7 and later, a StopIteration raised inside a generator is converted to a RuntimeError.
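A sketch of a generator that ends early with return. The function name and input values are illustrative.

```python
def take_until_negative(values):
    """Yield values until the first negative number is seen."""
    for value in values:
        if value < 0:
            return  # ends the generator; do not raise StopIteration here
        yield value


result = list(take_until_negative([3, 1, -1, 5]))
```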

Dictionaries and other collections

The following changes apply to dict and other collection types:

  • dict no longer has the has_key() method. Replace:
if d.has_key("x"): 

With:

if "x" in d:  # works in 2.x and 3.x


  • In Python 3, dict.keys() and dict.values() no longer directly return lists. Instead, they return objects of type dict_keys and dict_values, respectively. These objects can still be iterated normally, but code that depends on a list sometimes requires explicit conversion. Replace:
l = d.keys() + [ "extra" ]  # BEFORE

With:

l = list(d.keys()) + [ "extra" ]  # AFTER
  • Don't use .iteritems(), .iterkeys(), or .itervalues(). Rather, iterate on .items(), .keys(), or .values() directly.
  • Instead of calling iter.next() on an iterator, use the next(iter) function. The method to define on your own classes to support this function is __next__(self) in Python 3 and next(self) in Python 2.
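The collection changes above can be sketched together as follows. Countdown is a hypothetical iterator class, and aliasing next to __next__ is one common way to support both interpreter versions.

```python
class Countdown(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # correct here: this is not a generator
        self.current -= 1
        return self.current + 1

    next = __next__  # Python 2 looks for next() instead of __next__()


d = {"a": 1, "b": 2}
has_a = "a" in d                               # replaces d.has_key("a")
pairs = sorted(d.items())                      # replaces d.iteritems()
keys_plus = sorted(list(d.keys()) + ["extra"])  # explicit list conversion

counter = Countdown(3)
first = next(counter)  # replaces counter.next()
rest = list(counter)
```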

Sorting

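The in-place lst.sort() method still exists on lists in Python 3, but it no longer works on the results of dict.keys() and similar calls, which are not lists. The portable form lst = sorted(lst) accepts any iterable. Use it at least where support for older versions of Python 2 isn't required; sorted() works fine on Python 2.7.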

sorted() (and sort()) no longer take a cmp parameter. Instead, if you are just using cmp= to sort on an attribute of each element, it's usually simple to convert to the portable key parameter. Replace:

x.sort(cmp = lambda a, b: cmp(a.meth(), b.meth()))  # BEFORE

With:

x = sorted(x, key = lambda e: e.meth())  # AFTER

This form works in Python version 2.4 and later.
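A runnable version of this conversion, assuming a hypothetical Job class whose meth() method returns the value to sort on:

```python
class Job(object):
    """Hypothetical element type with a method used as the sort key."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

    def meth(self):
        return self.priority


jobs = [Job("c", 3), Job("a", 1), Job("b", 2)]

# key= is called once per element and replaces the pairwise cmp= callback
ordered = sorted(jobs, key=lambda e: e.meth())
names = [job.name for job in ordered]
```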

If the cmp parameter is more complicated, use the cmp_to_key() function from the functools module. Replace:

x = sorted(x, cmp = my_fancy_compare_function)  # BEFORE

With:

x = sorted(x, key = functools.cmp_to_key(my_fancy_compare_function))  # AFTER

Also convert any "hidden" cmp parameters. They are not always passed as named parameters, as in x.sort(my_cmp_function).

The top-level cmp(a, b) function isn't available in Python 3. Either rewrite your code not to require it, or consider using the replacement function from the splunk.util package (if your code only must function in Splunk 8.0 and later).
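If neither option fits, a local stand-in is short to write. The sketch below defines a hypothetical compare() function using a well-known idiom (it is not the splunk.util implementation) and uses it in a comparison function passed through functools.cmp_to_key().

```python
import functools


def compare(a, b):
    """Local stand-in for Python 2's cmp(): returns -1, 0, or 1."""
    return (a > b) - (a < b)


def by_length_then_alpha(a, b):
    # Compare by length first; fall back to alphabetical order on a tie
    return compare(len(a), len(b)) or compare(a, b)


words = ["pear", "fig", "apple", "kiwi"]
ordered = sorted(words, key=functools.cmp_to_key(by_length_then_alpha))
```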

Classes that define a custom __cmp__() method in Python 2 should instead define both __eq__() and __lt__() methods. They can use the @total_ordering decorator from functools:

from functools import total_ordering

@total_ordering
class MyObject(object):
    # [...]
    def __eq__(self, other):
        return self._i == other._i
    def __lt__(self, other):
        return self._i < other._i

Module-specific advice

Many common Python modules have been reorganized in Python 3. Often, common functionality can be accessed in ways that are portable to both Python 2 and Python 3. Also, don't use the syntax from package import *. Instead, specify the subpackages needed.

StringIO and cStringIO

In Python 3, this module has moved to io.StringIO. Python 2 also has an io.StringIO library, which forces u"unicode" strings. For native/"default" strings under both Python 2 and Python 3, this often causes problems. One fix:

import sys
if sys.version_info >= (3, 0):
   from io import StringIO
else:
   from StringIO import StringIO

As covered in Strings, in Python 3, normal strings and bytes/binary strings aren't the same. For binary data, use the io.BytesIO object instead:

import io
import sys

msg = b"Bytes I want to treat like a file"
if sys.version_info >= (3, 0):
   fh = io.BytesIO(msg)
else:
   from StringIO import StringIO
   fh = StringIO(msg)
cStringIO is gone in Python 3. Use StringIO as above. This is a bit slower in Python 2, but faster in Python 3.

httplib

httplib is http.client in Python 3:

try:
   import http.client as httplib
except ImportError:
   import httplib

urlparse

urlparse is now urllib.parse in Python 3. From Six, you can use six.moves.urllib:

from six.moves.urllib import parse as urllib_parse

Or, from python-future, you can use future.moves:

from future.moves.urllib import parse as urllib_parse

Then call urllib_parse.urlparse(), .parse_qs(), .urlsplit(), .urlunsplit(), and so on.

If you need just a small number of methods, you can probe directly:

try:
   from urllib.parse import urlparse
except ImportError:
   from urlparse import urlparse

urllib2

urllib2 is gone in Python 3, with most of its functionality moved to submodules of urllib, such as urllib.request and urllib.error. Usually, convert using the six or python-future equivalents. For example, from future.moves:

import sys

from future.moves.urllib.request import urlopen, Request
from future.moves.urllib.error import HTTPError

try:
   req = Request(url)  # url is assumed to be defined elsewhere
   res = urlopen(req)
except HTTPError as e:
   sys.stderr.write("nope!\n")

You can also import full packages from future.moves:

from future.moves.urllib import error as urllib_error
from future.moves.urllib import request as urllib_request

Cookie

Cookie is now http.cookies in Python 3.

try:
   from http.cookies import SimpleCookie
except ImportError:
   from Cookie import SimpleCookie

cPickle

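cPickle is removed from Python 3. Use pickle instead, which in Python 3 automatically uses the optimized C implementation when it is available.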
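A minimal sketch of a portable import, following the same probing pattern used for other modules in this topic. Protocol 2 is chosen here on the assumption that both Python 2 and Python 3 must read the data.

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle             # Python 3: cPickle no longer exists

# Protocol 2 is the highest protocol that Python 2 can read
payload = pickle.dumps({"a": 1}, protocol=2)
restored = pickle.loads(payload)
```

Pickled data is binary, so write it to files opened in "wb" mode and read it back with "rb".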

string

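The string module still exists in Python 3, but its function versions of the str methods, such as string.strip() and string.split(), are removed. Use the equivalent methods on str instead: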

x = string.strip(y)  # BEFORE
x = y.strip()  # AFTER
lst = string.split(y, ',')  # BEFORE
lst = y.split(',')  # AFTER

This style works in both Python 2 and Python 3.

queue

Queue is renamed to queue in Python 3. Most Python 2 functionality is retained with:

try:
   import queue
except ImportError:
   import Queue as queue

thread

thread has been removed in Python 3. Some internals are available as _thread:

try:
   import _thread as thread
except ImportError:
   import thread

UserDict

In Python 3, UserDict is now part of collections:

try:
   from collections import UserDict
except ImportError:
   from UserDict import UserDict

contextlib.nested

contextlib.nested() is removed from Python 3. Instead, list multiple context managers, separated by commas, in a single with statement:

with contextlib.nested(open("a"), open("b")) as (fh_a, fh_b):  # BEFORE
with open("a") as fh_a, open("b") as fh_b:  # AFTER

ConfigParser

In Python 3, ConfigParser has been renamed configparser. However, this is not a simple rename, but an overhaul of ConfigParser. One of the biggest differences is the default key/value delimiters, which are = and :. ConfigParser defaulted to =. Many Splunk .cfg and .conf files assume : is not a delimiter.

Duplicate handling is another default of configparser that differs from ConfigParser. ConfigParser allowed duplicates by default, but configparser raises exceptions on duplicates by default. Many .cfg and .conf files contain duplicates. The parameter that controls this behavior is strict.

For compatibility with Python 2 and 3 when using configparser, create instances with the following:

configparser.ConfigParser(delimiters=('=',), strict=False)
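The effect of both parameters can be seen in a small sketch. It assumes Python 3 (or the configparser backport on Python 2), and the section and key names are invented for the example.

```python
import configparser  # Python 3; on Python 2 this needs the configparser backport

# Invented sample: the key contains ":" and appears twice
SAMPLE = u"""
[example]
host:port = 8089
host:port = 9997
"""

# Only "=" separates keys from values, and duplicates are tolerated
parser = configparser.ConfigParser(delimiters=("=",), strict=False)
parser.read_string(SAMPLE)
value = parser.get("example", "host:port")  # the last duplicate wins

# With the default delimiters, the key is split at the first ":"
default_parser = configparser.ConfigParser(strict=False)
default_parser.read_string(SAMPLE)
default_keys = default_parser.options("example")
```

With strict left at its default of True, the duplicate key in this sample would raise DuplicateOptionError instead.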

Other changes

The following additional changes also apply to Python code to be made compatible with both Python 2 and 3:

  • In Python 3, xrange() is removed. If the range is not too large, use range(). In Python 2, range() is available but builds the full list in memory, so it is slower than xrange() for large ranges. In Python 3, range() is lazy, like Python 2's xrange().
  • In Python 3, file() is no longer callable as a function. Use the open() function instead with the same parameters. open() is available in Python 2, also.
  • In Python 3, os.path.walk() is removed. Convert code to use the portable os.walk() instead.
  • In Python 3, classes should not directly assign __metaclass__, because the assignment has no effect:
# BEFORE:
class MyClass(object):
    __metaclass__ = SomeOtherObject
# AFTER:
from future.utils import with_metaclass
class MyClass(with_metaclass(SomeOtherObject, object)):
    pass
  • In Python 3, raw_input() is removed. Use input() instead. If your code needs to work the same in Python 2 and Python 3, use:
if sys.version_info >= (3, 0):
    response = input("Prompt: ")
else:
    response = raw_input("Prompt: ")
  • In Python 3, execfile() is removed. Use exec(open(file).read()) instead.
  • In Python 3, exec "code" in global_map is removed. Python 3 and Python 2 both accept exec("code", global_map) instead.
  • In Python 3, reduce() has been moved to the functools module:
import functools
functools.reduce(lambda x, y: x + y, [47, 11, 42, 13])
Last modified on 22 October, 2019

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 8.0.0, 8.0.1, 8.0.2, 8.0.3

