Working with Sets of Characters in Regular Expressions

Learn how to work with sets of characters in Regular Expressions. Unlike the ., which matches any single character (as you learned in the previous lesson), sets enable you to match specific characters and character ranges.

This chapter is from the book

In this lesson you’ll learn how to work with sets of characters. Unlike the ., which matches any single character (as you learned in the previous lesson), sets enable you to match specific characters and character ranges.

Matching One of Several Characters

As you learned in the previous lesson, . matches any one character (as does any literal character). In the final example in that lesson, .a was used to match both na and sa; the . matched both the n and the s. But what if there were also a file (containing Canadian sales data) named ca1.xls, and you still wanted to match only na and sa? The . would also match c, and so that filename would be matched as well.

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

RegEx

.a.\.xls

Result

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

Matched: na1.xls, na2.xls, sa1.xls, ca1.xls

To find n or s, you would not want to match any character; you would want to match just those two characters. In regular expressions, a set of characters is defined using the metacharacters [ and ]. Everything between them is part of the set, and exactly one character from the set must match at that position (any one member will do, not all of them).

Here is a revised version of that example from the previous lesson:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

RegEx

[ns]a.\.xls

Result

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

Matched: na1.xls, na2.xls, sa1.xls

Analysis

The regular expression used here starts with [ns]; this matches either n or s (but not c or any other character). The opening [ and closing ] do not match any characters—they define the set. The literal a matches a, . matches any character, \. matches the ., and the literal xls matches xls. When you use this pattern, only the three desired filenames are matched.
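The difference between the two patterns can be checked in code. The following is a minimal sketch using Python's standard re module (the choice of Python and the variable names are assumptions made for illustration; any regex engine that supports character sets should behave the same way for these patterns):

```python
import re

# The file names from the example text above, one per line.
filenames = """sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls"""

# . matches any single character, so the c in ca1.xls is accepted too.
any_char = re.findall(r".a.\.xls", filenames)

# [ns] matches exactly one character, and it must be n or s.
char_set = re.findall(r"[ns]a.\.xls", filenames)

print(any_char)  # ['na1.xls', 'na2.xls', 'sa1.xls', 'ca1.xls']
print(char_set)  # ['na1.xls', 'na2.xls', 'sa1.xls']
```

Only the first character of the pattern changed, yet ca1.xls drops out of the results because c is not a member of the set.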

Character sets are frequently used to make searches (or specific parts of them) case insensitive. For example:

Text

The phrase "regular expression" is often
abbreviated as RegEx or regex.

RegEx

[Rr]eg[Ee]x

Result

The phrase "regular expression" is often
abbreviated as RegEx or regex.

Matched: RegEx, regex

Analysis

The pattern used here contains two character sets: [Rr] matches R and r, and [Ee] matches E and e. This way, RegEx and regex are both matched. REGEX, however, would not match.
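To see which capitalizations this pattern accepts, here is a small sketch, again assuming Python's re module. The candidates list is a hypothetical set of test words, not part of the original example:

```python
import re

pattern = re.compile(r"[Rr]eg[Ee]x")

text = 'The phrase "regular expression" is often\nabbreviated as RegEx or regex.'

# Both spellings in the example text are found.
found = pattern.findall(text)

# Hypothetical test words: only the first and fourth letters may vary in
# case, because only those positions are written as character sets.
candidates = ["RegEx", "regex", "Regex", "regEx", "REGEX"]
accepted = [word for word in candidates if pattern.fullmatch(word)]

print(found)     # ['RegEx', 'regex']
print(accepted)  # ['RegEx', 'regex', 'Regex', 'regEx']
```

REGEX is rejected because the second letter of the pattern is the literal e, which does not match an uppercase E.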
