Greedy Expressions

The repetition operators in regular expressions are said to be greedy. That means the matching and searching functions match as much of the string as possible rather than stopping at the first complete match. Normally this doesn't matter much, but when you combine wildcards with repetition operators, you can wind up grabbing more than you expect.

Consider the following example. A regular expression such as 'a.*b' says that we want an a followed by any number of characters up to a b. The match will run from the first a to the last b. That is, if the searched string contains more than one b, all but the last one are absorbed by the '.*' part of the expression. Thus in this example:

>>> import re
>>> re.match('a.*b', 'abracadabra')
<re.MatchObject instance at 444310>

the MatchObject has matched all of 'abracadab', not just the first 'ab'. Overlooking this greedy matching behavior is one of the most common errors made by new users of regular expressions.
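
The printed form of the match object above does not show which text was matched, so a quick way to check is to call group() on it. This is a minimal sketch, assuming a current Python 3 interpreter; the variable names are illustrative only.

```python
import re

# Greedy: '.*' first runs to the end of the string, then backs off
# one character at a time until a 'b' can follow it (the last 'b').
greedy = re.match('a.*b', 'abracadabra')

# group() with no arguments returns the whole matched substring.
greedy_text = greedy.group()
print(greedy_text)  # abracadab
```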

To prevent this greedy behavior, simply add a ? after the repetition operator, like so:

>>> re.match('a.*?b', 'abracadabra')
<re.MatchObject instance at 869310>

This will now match only 'ab'.
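
To see the two behaviors side by side, the sketch below runs both patterns over the same string and inspects the matched text. It assumes a current Python 3 interpreter. The variable names are illustrative, and the use of re.findall is an addition for comparison rather than part of the original example.

```python
import re

text = 'abracadabra'

# Non-greedy: '.*?' takes as few characters as possible,
# so the match stops at the nearest 'b'.
lazy = re.match('a.*?b', text)
lazy_text = lazy.group()
print(lazy_text)    # ab
print(lazy.span())  # (0, 2)

# findall shows the contrast across the whole string.
greedy_all = re.findall('a.*b', text)
lazy_all = re.findall('a.*?b', text)
print(greedy_all)   # ['abracadab']
print(lazy_all)     # ['ab', 'acadab']
```

With the greedy pattern the whole stretch up to the last b is one match; with the non-greedy pattern each match ends at the nearest b, so the same string yields two shorter matches.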
