This chapter is from the book

15.5 Escape Characters

A number of significant symbols have now been introduced, such as '*' and '{'. These symbols cannot be used as atoms within an expression, because they would be misinterpreted as significant pattern markup. It is therefore necessary to escape these characters when they are needed to match characters in a value (just as '&' and '<' must be escaped in XML documents when they are part of the document text).

The '\' symbol is used in a pattern to escape the character that follows it. This can be seen as similar to the function of the '&' character in XML entity references. But in this case, only the following single character is escaped, so there is no need for another symbol to end an escape sequence (as ';' is needed in the case of entity references).

Just as the sequence '&amp;' is needed in XML documents to allow the '&' character to be included as a data character, so must the '\' symbol also be escaped in patterns. In this case, the '\' escape symbol is placed before the '\' data character, giving '\\' (this should be familiar to C and Java software developers). The following pattern matches the text 'a\b':

a\\b 
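As a rough illustration, the sketch below checks this pattern with Python's `re` module. Python is an assumption of this example and is not the pattern language described here, although the '\\' escape behaves the same way in both. `re.fullmatch` is used because a pattern must match the whole value.

```python
import re

# Raw string: the pattern text is exactly a\\b (an escaped backslash between a and b).
pattern = r"a\\b"

# In an ordinary Python string literal, "a\\b" is the three-character value a\b.
value = "a\\b"

# The escaped backslash in the pattern matches the single backslash in the value.
matches_value = re.fullmatch(pattern, value) is not None

# A value without the backslash is rejected.
matches_plain = re.fullmatch(pattern, "ab") is not None

print(matches_value, matches_plain)
```

The raw-string prefix matters only because Python string literals have their own use for '\'. The pattern itself still contains two backslashes.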

The other escape sequences for single characters are: '\|', '\.', '\?', '\*', '\+', '\{', '\}', '\(', '\)', '\[', and '\]':

\| (not a branch separator)
\. (not the wildcard that matches any character except a line end)
\? (not an optional indicator)
\* (not an optional and repeatable indicator)
\+ (not a required and repeatable indicator)
\{ (not a quantity start)
\} (not a quantity end)
\( (not a subgroup start)
\) (not a subgroup end)
\[ (not a character class start)
\] (not a character class end)

The following pattern matches the value '{a+b}':

\{a\+b\} 
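The effect of leaving out an escape can be sketched as follows, again using Python's `re` module as a stand-in (an assumption of this example; the variable names are illustrative only). With the '+' left unescaped, it becomes a quantifier on 'a' instead of a literal character.

```python
import re

value = "{a+b}"

# Fully escaped: '{', '+', and '}' are all literal characters.
escaped = r"\{a\+b\}"
literal_match = re.fullmatch(escaped, value) is not None

# '+' left unescaped: it now means "one or more 'a'", so the literal '+'
# in the value has nothing to match against.
unescaped_plus = r"\{a+b\}"
quantifier_matches_value = re.fullmatch(unescaped_plus, value) is not None
quantifier_matches_aab = re.fullmatch(unescaped_plus, "{aab}") is not None

print(literal_match, quantifier_matches_value, quantifier_matches_aab)
```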

Within a character class, where they have special meaning, the '-' and '^' characters must also be escaped ('\-' and '\^'):

\- (not a character class subtractor or range separator)
\^ (not a negative group indicator)
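A minimal sketch of these two escapes inside a character class, assuming Python's `re` module behaves like the pattern language for these cases. Character class subtraction is not shown because Python's `re` does not support it. The helper `ok` is hypothetical and exists only for this example.

```python
import re

def ok(pattern: str, value: str) -> bool:
    """Return True when the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None

# Unescaped '-' between two characters is a range separator: 'a' through 'c'.
range_class = r"[a-c]"
# Escaped '-' is a literal: the class contains only 'a', '-', and 'c'.
literal_hyphen = r"[a\-c]"

# Unescaped '^' at the start negates the class: anything except 'a'.
negated = r"[^a]"
# Escaped '^' is a literal: the class contains only '^' and 'a'.
literal_caret = r"[\^a]"

print(ok(range_class, "b"), ok(literal_hyphen, "b"), ok(literal_hyphen, "-"))
print(ok(negated, "a"), ok(literal_caret, "a"), ok(literal_caret, "^"))
```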

In addition, escape sequences are used to include whitespace characters that would otherwise be difficult or impossible to enter from a keyboard, including '\n', '\r', and '\t':

\n (newline)
\r (return)
\t (tab)
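The whitespace escapes can be tried out in the same way. This sketch assumes Python's `re` module, which also recognizes '\t' and '\n' inside a pattern. The raw string ensures that the pattern engine, and not the Python string parser, interprets the escapes.

```python
import re

# Pattern text: a, tab escape, b, newline escape.
pattern = r"a\tb\n"

# The value holds a real tab and a real newline character.
tabbed = "a\tb\n"
# Same letters, but a space instead of the tab.
spaced = "a b\n"

matches_tabbed = re.fullmatch(pattern, tabbed) is not None
matches_spaced = re.fullmatch(pattern, spaced) is not None

print(matches_tabbed, matches_spaced)
```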

Note that an atom can be an escape sequence, rather than a single character, and it is possible to quantify such a sequence. For example, '\++' states that the '+' character is required and repeatable.
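The quantified escape sequence can be checked with a short sketch, assuming Python's `re` module, where '\++' is read in the same way: the atom '\+' followed by the '+' quantifier.

```python
import re

# Atom '\+' (a literal plus sign) followed by the quantifier '+' (one or more).
pattern = r"\++"

one_plus = re.fullmatch(pattern, "+") is not None
three_plus = re.fullmatch(pattern, "+++") is not None
# At least one '+' is required, so the empty value is rejected.
empty = re.fullmatch(pattern, "") is not None
# Any other character is rejected as well.
mixed = re.fullmatch(pattern, "+a+") is not None

print(one_plus, three_plus, empty, mixed)
```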
