String Changes

The next "gotcha" that current Python users face is that strings are now Unicode by default. Actually, this change couldn't come soon enough. Every day, numerous Python developers run into problems like this when dealing with Unicode and regular ASCII strings:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 0: ordinal not in range(128)

These types of errors will no longer be an everyday occurrence in 3.x.
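As a rough sketch of why this class of error recedes (assuming a Python 3 interpreter; the variable names are illustrative only), the snippet below shows that Python 3 never encodes text implicitly. Mixing text and binary data fails at once with a TypeError, and an encoding error appears only when the code explicitly asks for a codec that cannot represent the character.

```python
text = "\xae 2009"  # text starting with the registered sign (U+00AE)

# Python 3 refuses to mix text and binary data rather than guessing a codec.
try:
    text + b" bytes"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Encoding is always an explicit step; ASCII still cannot represent U+00AE.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# A codec that covers the character works as expected.
utf8_data = text.encode("utf-8")

print(mixed_ok, type(ascii_error).__name__, utf8_data)
```

The error itself has not disappeared; it can simply no longer be triggered by a silent, implicit conversion.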

With the model adopted by the new version of Python, users shouldn't even use the terms Unicode strings and ASCII/non-Unicode strings anymore. "What's New in Python 3.0" sums up this new model pretty explicitly:

Python 3.0 uses the concepts of text and (binary) data instead of Unicode strings and 8-bit strings. All text is Unicode; however, encoded Unicode is represented as binary data. The type used to hold text is str, the type used to hold data is bytes.

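As far as syntax goes, since Unicode is now the default, the leading u or U on string literals is no longer needed. Python 3.0 through 3.2 reject it as a syntax error, while Python 3.3 and later accept it again purely for compatibility with 2.x code (PEP 414). Conversely, literals for the new bytes object require a leading b or B. For more information, see PEP 3112.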
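The text/data split and the literal syntax can be illustrated with a short sketch, assuming Python 3; the sample values are arbitrary and chosen only for demonstration.

```python
text = "caf\u00e9"           # str: Unicode text, no prefix needed
data = text.encode("utf-8")  # bytes: the encoded form of that text
raw = b"abc"                 # bytes literal, written with the b (or B) prefix

print(type(text).__name__, type(data).__name__)  # the two distinct types
print(len(text), len(data))  # characters versus encoded bytes
print(raw[0])                # indexing bytes yields an integer
print(data.decode("utf-8") == text)  # decoding recovers the original text
```

Note that the lengths differ: the single character U+00E9 occupies two bytes in UTF-8, which is exactly the distinction between text and its encoded representation.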

Table 1 shows a comparison of the string types and how they will change from Python 2.x to 3.x, including a mention of the new mutable bytearray type.

Table 1. String Types in Python 2 and 3

Python 2.x       Python 3.x (see note)    Mutable?
str ("")         bytes (b"")              No
unicode (u"")    str ("")                 No
N/A              bytearray                Yes
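The Mutable? column can be checked with a brief sketch, assuming Python 3 (bytearray is also available in Python 2.6 and later); the names buf and frozen are illustrative only.

```python
buf = bytearray(b"spam")  # mutable sequence of bytes
buf[0] = ord("S")         # in-place item assignment is allowed
buf.extend(b"!")          # and so is growing the buffer

frozen = bytes(buf)       # immutable snapshot of the current contents
try:
    frozen[0] = ord("s")
    bytes_mutable = True
except TypeError:
    bytes_mutable = False  # bytes objects reject item assignment

print(buf, frozen, bytes_mutable)
```

A bytearray is a reasonable choice when binary data has to be built up or patched in place; converting to bytes afterward gives an immutable value.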
