Home > Articles > Open Source > Python

  • Print
  • + Share This
From the author of

Sequences

As ever, the simplest construct is a sequence and the simplest regular expression is just a sequence of characters:

red

This will match, or find, any occurrence of the three letters r, e, and d, in that order in a string.

Thus the words red, lettered, and credible would all be picked up because they contain "red" within them.

Anchors

To provide greater control over the outcome of matches, we can supply some special characters (known as metacharacters to limit the scope of the search (see Table 1):

Table 1

Metacharacters Used in Sequences

Example

Meaning

^red

Only at the start of a line
(as in "red ribbons are good")

red$

Only at the end of a line
(as in, "I love red")

/Wred

Only at the start of a word
(as in, "it's redirected by post"

red/W

Only at the end of a word
(as in, "you covered it already")

These metacharacters are often called anchors because they fix, or anchor, the position of the re within a sentence or word. There are several other anchors defined in the re module documentation, which we don't cover in this article.

Wildcards

Sequences can also contain wildcard characters that can substitute for any character. The most general wildcard character is a period. Try this in Python:

>>> import re
>>> re.match('be.t', 'best')
<re.MatchObject instance at 864460>
>>> re.match('be.t', 'bess')

The message in angle brackets tells us that the regular expression 'be.t', passed as the first argument, matches the string 'best' passed as the second argument. 'be.t' will also match 'beat', 'bent', 'belt', and so on. The second example did not match because 'bess' didn't end in t, so no MatchObject was created. Try out a few more matches to see how this works.

Ranges or Sets

The next type of wildcard is a range or set. This consists of a collection of letters enclosed in square brackets; the regular expression will search for any one of the enclosed letters.

>>> re.match('s[pwl]am', 'spam')
<re.MatchObject instance at 7cab40>

By using a "^" sign as the first element of the group, we can say that it should look for any character except those listed. Thus, in this example

>>> re.match('[^f]ool', 'cool')
<re.MatchObject instance at 864890>
>>> re.match('[^f]ool','fool')

we can match 'cool' and 'pool', but we will not match 'fool'.

Finally, we can group sequences of characters, or other units, together by enclosing them in parentheses. Although this is not particularly useful in isolation, it is useful when combined with the repetition and conditional capabilities.

  • + Share This
  • 🔖 Save To Your Account