
Regular Expression Solutions to Common Problems

Ben Forta offers a collection of useful regular expressions, along with detailed explanations of each. He uses practical real-world examples and gives you a leg up by providing commonly needed patterns that you can use.
This chapter is from the book

NOTE

The examples presented here are not the ultimate solutions to the problems presented. By now it should be clear that there rarely is an ultimate solution. More often, multiple solutions exist with varying degrees of tolerance for the unpredictable, and there is always a trade-off between performance of a pattern and its capability to handle any and all scenarios thrown at it. With that understanding, feel free to use the patterns presented here (and if needed, tweak them as suits you best).

North American Phone Numbers

The North American Numbering Plan defines how North American telephone numbers are formatted. As per the plan, telephone numbers (in the U.S.A., Canada, much of the Caribbean, and several other locations) are made up of a three-digit area code (technically, the NPA or numbering plan area) and then a seven-digit number (which is formatted as a three-digit prefix followed by a hyphen and a four-digit line number). Any digits may be used in a phone number with two exceptions: The first digit of the area code and the first digit of the prefix may not be 0 or 1. The area code is often enclosed within parentheses, and is often separated from the actual phone number by a hyphen. Matching one of (555) 555-5555 or (555)555-5555 or 555-555-5555 is easy; matching any of them (assuming that that is what you need) is a bit trickier.

J. Doe: 248-555-1234
B. Smith: (313) 555-1234
A. Lee: (810)555-1234
\(?[2-9]\d\d\)?[ -]?[2-9]\d\d-\d{4}
J. Doe: 248-555-1234
B. Smith: (313) 555-1234
A. Lee: (810)555-1234

The pattern begins with the curious-looking \(?. Parentheses are optional; \( matches (, and ? matches 0 or 1 instance of that (. [2-9]\d\d matches a three-digit area code (the first digit must be 2 through 9). \)? matches the optional closing parenthesis. [ -]? matches a single space or a hyphen, if either is present. [2-9]\d\d-\d{4} matches the rest of the phone number: the three-digit prefix (the first digit of which must be 2 through 9), followed by a hyphen and four more digits.
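As a quick check of the analysis above, the pattern can be run against the sample text. The following is a minimal sketch that assumes Python's standard re module (the chapter itself is not tied to any particular language); the names PHONE, text, and matches are illustrative only.

```python
import re

# The pattern from the text: optional "(", three-digit area code,
# optional ")", optional space or hyphen, three-digit prefix,
# a hyphen, and a four-digit line number.
PHONE = re.compile(r"\(?[2-9]\d\d\)?[ -]?[2-9]\d\d-\d{4}")

text = """J. Doe: 248-555-1234
B. Smith: (313) 555-1234
A. Lee: (810)555-1234"""

# findall returns the full text of every non-overlapping match.
matches = PHONE.findall(text)
print(matches)

# An area code starting with 0 or 1 is rejected.
print(PHONE.search("148-555-1234"))
```

Because each parenthesis is optional on its own, the pattern also accepts unbalanced input such as (248-555-1234. That is one of the tolerance trade-offs mentioned in the note at the start of the chapter.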

This pattern could easily be modified to handle other presentation formats. For example, 555.555.5555:

J. Doe: 248-555-1234
B. Smith: (313) 555-1234
A. Lee: (810)555-1234
M. Jones: 734.555.9999
[\(.]?[2-9]\d\d[\).]?[ -]?[2-9]\d\d[-.]\d{4}
J. Doe: 248-555-1234
B. Smith: (313) 555-1234
A. Lee: (810)555-1234
M. Jones: 734.555.9999

The opening match now tests for ( or . as an optional set, using the pattern [\(.]?. Similarly, [\).]? tests for ) or . (also optional), and [-.] tests for - or .. Other phone number formats could be added just as easily.
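The extended pattern can be exercised the same way. This sketch again assumes Python's re module; PHONE_DOTS, lines, and found are hypothetical names chosen for the example.

```python
import re

# The extended pattern from the text: the opening set accepts "(" or ".",
# the closing set accepts ")" or ".", and the final separator may be
# a hyphen or a period.
PHONE_DOTS = re.compile(r"[\(.]?[2-9]\d\d[\).]?[ -]?[2-9]\d\d[-.]\d{4}")

lines = [
    "J. Doe: 248-555-1234",
    "B. Smith: (313) 555-1234",
    "A. Lee: (810)555-1234",
    "M. Jones: 734.555.9999",
]

# Collect the first match on each line (None if a line has no match).
found = []
for line in lines:
    m = PHONE_DOTS.search(line)
    found.append(m.group(0) if m else None)

print(found)
```

Note that the separators are matched independently of one another, so a mixed form such as 734.555-9999 is accepted as well. If only consistent formats should match, a stricter pattern would be needed.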
