Repetition

Regular expressions can match repeated sequences of characters. The metacharacters in Table 2 specify how many times the preceding character, or group of characters, may repeat.

Table 2

Metacharacters Used in Repetition

Character   Meaning                                     Example
---------   -----------------------------------------   ---------------------------------------
?           Matches zero or one of the preceding        pythonl?y matches pythony and pythonly
            character. Note the "zero" part, because
            it can trip you up if you aren't careful.

*           Matches zero or more of the preceding       pythonl*y matches both of the above,
            character.                                  plus pythonlly, pythonllly, and so on

+           Matches one or more of the preceding        pythonl+y matches pythonly, pythonlly,
            character.                                  pythonllly, and so on

{m,n}       Matches from m to n repetitions of the      fo{1,2} matches fo or foo
            preceding character.
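
The rows of Table 2 can be checked with a short script. This is a minimal sketch, assuming Python 3.4 or later (for re.fullmatch); the helper fully_matches and the variable names are hypothetical and chosen only for illustration. It uses whole-string matching so that each pattern is tested against exactly the words listed in the table.

```python
import re

def fully_matches(pattern, text):
    """Return True when the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

candidates = ['pythony', 'pythonly', 'pythonlly', 'pythonllly']
patterns = [r'pythonl?y', r'pythonl*y', r'pythonl+y']

# For each pattern, collect the candidate words it matches in full.
table_results = {
    pattern: [word for word in candidates if fully_matches(pattern, word)]
    for pattern in patterns
}
for pattern, matched in table_results.items():
    print(pattern, 'matches', matched)

# The counted form: between one and two o's after the f.
brace_results = [word for word in ('f', 'fo', 'foo', 'fooo')
                 if fully_matches(r'fo{1,2}', word)]
print('fo{1,2} matches', brace_results)
```

Only pythonl?y and pythonl*y accept pythony, which is the "zero" case that the table warns about.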

All of these repetition characters can be applied to groups of characters, too. Thus:

>>> import re   # Python's regular expression module; assumed imported in all later examples
>>> re.match(r'(.an){1,2}s', 'cans')
<re.Match object; span=(0, 4), match='cans'>

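The same pattern also matches 'cancans', 'pans', and 'canpans', but not 'bananas': after 'ban', the next three characters ('ana') do not fit .an, and the character that follows 'ban' is not an s.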
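
The claims about the grouped pattern can be verified directly. The following is a small sketch; the names group_pattern, group_results, and last_group are illustrative and not part of the original example.

```python
import re

# One or two repetitions of "any character followed by an", then an s.
group_pattern = re.compile(r'(.an){1,2}s')

words = ['cans', 'cancans', 'pans', 'canpans', 'bananas']

# re.match only tries to match at the start of each word.
group_results = {word: group_pattern.match(word) is not None for word in words}
for word, matched in group_results.items():
    print(word, matched)

# When a group repeats, the group keeps only its last repetition.
last_group = group_pattern.match('canpans').group(1)
print(last_group)
```

The final line illustrates a side effect of repeating a group: for 'canpans' the group holds 'pan', the last repetition, rather than 'can'.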

There is one caveat with the {m,n} form of repetition: it does not stop the pattern from matching text that contains more than n repetitions. In the example in Table 2, fo{1,2} successfully matches fooo because it matches the foo at the beginning of fooo. To limit how many characters are matched, follow the repetition with an anchor or a negated character class. In our case, fo{1,2}[^o] prevents fooo from matching because it says to match one or two o's followed by any character other than an o. Note that [^o] must itself match a character, so this pattern also rejects a bare foo at the end of the string; the anchored form fo{1,2}$ accepts it.
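
The caveat is easy to see interactively. This is a brief sketch using only re.match; the variable names are hypothetical and exist only to make the results easy to inspect.

```python
import re

# Without a limit, the pattern matches the first three characters of fooo.
unbounded = re.match(r'fo{1,2}', 'fooo')
print(unbounded.group())

# A negated character class after the repetition rejects fooo...
negated_fooo = re.match(r'fo{1,2}[^o]', 'fooo')
print(negated_fooo)

# ...accepts foot, consuming the t as part of the match...
negated_foot = re.match(r'fo{1,2}[^o]', 'foot')
print(negated_foot.group())

# ...but also rejects a bare foo, because [^o] needs a character to match.
negated_foo = re.match(r'fo{1,2}[^o]', 'foo')
print(negated_foo)

# An end-of-string anchor rejects fooo and accepts foo.
anchored_fooo = re.match(r'fo{1,2}$', 'fooo')
anchored_foo = re.match(r'fo{1,2}$', 'foo')
print(anchored_fooo, anchored_foo.group())
```

Which form is appropriate depends on whether the repetition is expected at the end of the text (use the anchor) or in the middle of it (use the negated class).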
