This chapter is from the book

20.2. Dealing with Escape Sequences (\)

Escape sequences are a little tricky in C++ regular expressions, because they occur in two contexts.

  • C++ assigns special meaning to the backslash within a string literal and requires it to be escaped to be read as an actual backslash: To represent a single backslash, it’s necessary to place double backslashes (\\) in the source code. (Exception: Raw literals, supported by C++11, remove the need to escape characters.)
  • The regular-expression interpreter also recognizes a backslash as the escape character. To render a special character literally, you must precede it with a backslash (\).

Consequently, to make the regular-expression interpreter treat a special character literally, you must precede that character with two backslashes, not just one, when you write the pattern as an ordinary C++ string literal.

For example, suppose you want to specify a pattern that matches an actual plus sign (+). The pattern is specified in source code this way:

std::regex  reg("\\+");

When the C++ compiler reads the literal, “\\+”, it interprets \\ as an escape sequence that represents a single backslash. The actual string data that gets stored in memory is therefore:

\+

This is the string read by the regular-expression interpreter. It interprets “\+” as an actual plus sign (+).
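The two layers of escaping can be checked directly. The following is a minimal sketch, not code from the book; the names plus_pattern and is_literal_plus are hypothetical and chosen only for illustration. It assumes the default ECMAScript grammar used by std::regex.

```cpp
#include <regex>
#include <string>

// The source text "\\+" is stored as two characters: a backslash and a plus.
const std::string plus_pattern = "\\+";

// Hypothetical helper: returns true if the whole of text is a single
// literal plus sign.
bool is_literal_plus(const std::string& text) {
    static const std::regex reg(plus_pattern);
    return std::regex_match(text, reg);
}
```

Here plus_pattern.size() is 2, which confirms that the compiler has already collapsed the double backslash before the regular-expression interpreter sees the string. A call such as is_literal_plus("+") returns true, while is_literal_plus("a") returns false.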

Consider the following regular-expression pattern:

std::regex  reg("\\++");

Notice what’s going on here: The first three characters (\\+) represent a literal plus sign (+). The fourth character (+) has its usual—and special—meaning; this second plus sign modifies the overall pattern to mean, “Match one or more copies of the preceding expression.” The string as a whole therefore matches any of the following:

+
++
+++++
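As a rough illustration of the pattern above, the sketch below wraps it in a function that tests whether an entire string is made of plus signs. The function name is_all_plus_signs is an assumption made for this example, and std::regex_match is used because it requires the whole target string to match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if text consists entirely of one or
// more plus signs.
bool is_all_plus_signs(const std::string& text) {
    // Stored pattern is \++ : an escaped (literal) plus sign followed by
    // the "one or more" quantifier.
    static const std::regex reg("\\++");
    return std::regex_match(text, reg);
}
```

With this helper, the strings "+", "++", and "+++++" are all accepted, while an empty string or a string such as "+a" is rejected.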

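How do you represent a literal backslash, should you ever need to do that? That is, what is the regular-expression pattern that matches a target string consisting of one or more backslashes? The answer is that you need four backslashes in the source code: the compiler reduces them to two, and the regular-expression interpreter reads those two as one literal backslash.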

using std::regex;
regex  reg("\\\\+");   // Matches one or more backslashes.

This regular-expression object, reg, would match any of the following:

char str1[] = "\\";         // Represents "\".
char str2[] = "\\\\";       // Represents "\\".
char str3[] = "\\\\\\";     // Represents "\\\".
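The four-backslash pattern can be exercised with a short sketch like the one below. The names backslash_pattern and is_all_backslashes are hypothetical, introduced only for this example.

```cpp
#include <regex>
#include <string>

// The source text "\\\\+" is stored as three characters: two backslashes
// and a plus sign.
const std::string backslash_pattern = "\\\\+";

// Hypothetical helper: returns true if text consists entirely of one or
// more backslashes.
bool is_all_backslashes(const std::string& text) {
    static const std::regex reg(backslash_pattern);
    return std::regex_match(text, reg);
}
```

Note that the target strings need escaping in source code too: the argument "\\" passes a single backslash to is_all_backslashes, and "\\\\" passes two. Both are accepted, while an empty string or a forward slash is rejected.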

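Note that if you use raw string literals, supported by the C++11 specification, you don’t have to deal with C literal-string escape conventions. A raw literal has the form R"(text)"; the parentheses are part of the delimiter, not part of the string. The regular-expression interpreter still needs its own backslash escape, so this example would be coded as:

regex  reg(R"(\\+)");     // Matches one or more backslashes.
char str1[] = R"(\)";     // Represents "\".
char str2[] = R"(\\)";    // Represents "\\".
char str3[] = R"(\\\)";   // Represents "\\\".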

The use of R does not change the type of the literal (it is still an array of const char, which converts to const char*); it merely changes how the text inside the literal is interpreted.
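To make this concrete, the sketch below compares an escaped literal with its raw counterpart. A raw literal is written R"(text)", with the parentheses acting as delimiters rather than as string content. The names escaped_pattern, raw_pattern, and same_stored_text are assumptions made for this illustration.

```cpp
#include <string>
#include <type_traits>

// Both literals have the same type: an array of two const char
// (one backslash plus the terminating null character).
static_assert(std::is_same<decltype("\\"), decltype(R"(\)")>::value,
              "escaped and raw literals share one type");

// The same regular-expression pattern spelled two ways in source code.
const std::string escaped_pattern = "\\\\+";   // Ordinary literal.
const std::string raw_pattern = R"(\\+)";      // Raw literal.

// Hypothetical helper: returns true if both spellings store the same text.
bool same_stored_text() {
    return escaped_pattern == raw_pattern;
}
```

The static_assert passes at compile time, and same_stored_text() returns true, since both strings hold the three characters \\+ that the regular-expression interpreter reads.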
