An Introduction to Regular Expressions

Alan Gauld, author of Learn to Program Using Python (Addison-Wesley, 2001), discusses how regular expressions are constructed and shows basic usage in Python. There are marked similarities between regular expression constructs and programming constructs: regular expressions can be built from more than simple sequences of characters, anchors, and wildcards, and constructs such as branching and repetition are also possible.

Definition

Regular expressions are groups of characters that describe a larger group of characters: a pattern for which we can search in a body of text. They are very similar to the wildcards used in file naming on most operating systems, whereby an asterisk (*) can represent any sequence of characters in a file name. So, *.py means any file ending in .py. In fact, filename wildcards provide only a very small part of what regular expressions can express.
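To make the comparison concrete, here is a minimal sketch that matches the same file names once with a shell-style wildcard and once with a regular expression. The file names are invented for illustration, and the standard fnmatch module stands in for the operating system's wildcard matching. Note that the asterisk behaves differently in the two notations: in the regular expression it is the combination ".*" that means "any sequence of characters".

```python
import fnmatch
import re

# Hypothetical file names, made up for this example.
filenames = ["spam.py", "eggs.txt", "setup.py", "notes.py.bak"]

# Shell-style wildcard: * stands for any sequence of characters.
by_wildcard = [name for name in filenames
               if fnmatch.fnmatchcase(name, "*.py")]

# A regular expression with the same intent: ".*" is any sequence of
# characters, "\." is a literal dot, and "$" anchors the end of the name.
pattern = re.compile(r".*\.py$")
by_regex = [name for name in filenames if pattern.match(name)]

print(by_wildcard)
print(by_regex)
```

Both lists contain only spam.py and setup.py; notes.py.bak is rejected because it does not end in .py.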

Regular expressions are extremely powerful tools, and most modern programming languages either have built-in support for them or have libraries or modules that you can use to search for and replace text based on regular expressions. A full description of them is outside the scope of this article; indeed, there is at least one whole book dedicated to regular expressions.
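As a rough illustration of searching and replacing, the sketch below uses two functions from Python's re module, re.search and re.sub. The sample sentence is made up, and the pattern is just a plain sequence of characters.

```python
import re

# Sample text invented for this example.
text = "The cat sat on the mat."

# re.search scans the text and returns a match object,
# or None if the pattern does not occur anywhere.
match = re.search("cat", text)
if match:
    found = match.group()      # the text that matched
    position = match.start()   # index where the match begins
    print(found, position)

# re.sub returns a new string with every match replaced.
replaced = re.sub("cat", "dog", text)
print(replaced)
```

The original string is left untouched; re.sub builds and returns a new one.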

One interesting feature of regular expressions is that they are structured much like programs. Regular expressions are patterns constructed from smaller units. These units are as follows:

  • Single characters

  • Wildcard characters

  • Character ranges or sets

  • Groups that are surrounded by parentheses

Note that because groups are a unit, you can have groups of groups and so on to an arbitrary level of complexity. We can combine these units in ways reminiscent of a programming language, using sequences, repetitions, or conditional operators. In this article, we will consider only sequences. I'll be providing examples using Python, although if you know another language, it shouldn't be hard to translate. If you are using Python, you will need to import the re module and use its functions. For convenience, I will assume that you have already imported re in most of the examples shown.
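The four kinds of unit listed above can each be tried out in a short sequence pattern. What follows is only a sketch: the sample sentence is invented, and re.findall is used simply because it returns every match as a list, which makes the effect of each unit easy to see.

```python
import re

# Sample text invented for this example.
text = "The bat, the cat and the rat sat on the mat."

# Single characters in sequence: matches the literal text "cat".
single = re.findall("cat", text)

# Wildcard: "." stands for any one character (except a newline).
wildcard = re.findall(".at", text)

# Character set: "[bcr]" matches exactly one of b, c, or r.
char_set = re.findall("[bcr]at", text)

# Group: the parentheses turn "[bcr]at" into a unit of its own.
# With one group in the pattern, findall returns only the group's text.
grouped = re.findall("the ([bcr]at)", text)

print(single)
print(wildcard)
print(char_set)
print(grouped)
```

Each pattern is a sequence: the units must appear one after another in the text, in the order written. Matching is case-sensitive by default, so the last pattern skips "The bat" at the start of the sentence.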
