
Bypassing Parsing with CDATA

There are times when you may want to include data in your document that contains markup, but which you do not want to be parsed. For example, if you were authoring a tutorial on HTML, and storing it in an XML file, you might have the following:

<instruction>
Titles can be <I>italicized</I> using the <I> tag.
</instruction>

If this instruction element were used in an XML document as is, it would cause an error: the parser would treat each <I> as the start of a new element, and the final <I> is never closed, so the document would not be well-formed. To denote that the content should not be parsed, you can use a CDATA Section.
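The failure described above can be reproduced with a short script. This is a minimal sketch that assumes Python's standard xml.etree.ElementTree parser; the try_parse helper is a hypothetical name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# The tutorial text exactly as written above, with no escaping.
raw = (
    "<instruction>\n"
    "Titles can be <I>italicized</I> using the <I> tag.\n"
    "</instruction>"
)


def try_parse(document):
    """Return the parsed root element, or a message string if parsing fails."""
    try:
        return ET.fromstring(document)
    except ET.ParseError as err:
        # The second <I> is never closed, so the document is not well-formed.
        return f"not well-formed: {err}"


result = try_parse(raw)
print(result)
```

The parser reaches </instruction> while the second <I> element is still open, which is why it reports an error instead of returning an element.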

CDATA Sections can occur anywhere character data can occur. They are used to escape blocks of text containing characters that would otherwise be recognized as markup. CDATA Sections begin with the string <![CDATA[ and end with the string ]]>.

This means that you can enclose text inside these CDATA markers, and the parser will pass it through as literal character data instead of interpreting it as markup. So, let's take another look at our example:

<instruction>
<![CDATA[Titles can be <I>italicized</I> using the <I> tag. ]]>
</instruction>

Now the XML parser will treat everything that follows the <![CDATA[ delimiter as literal text until it encounters the ]]> delimiter. This allows you to include almost any text you like in that section; the one exception is the string ]]> itself, which would end the section early.
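To see the effect, the CDATA version of the example can be fed to a parser. The sketch below again assumes Python's standard xml.etree.ElementTree module, and it uses the <![CDATA[ and ]]> delimiters exactly as defined earlier in this section.

```python
import xml.etree.ElementTree as ET

doc = (
    "<instruction>\n"
    "<![CDATA[Titles can be <I>italicized</I> using the <I> tag. ]]>\n"
    "</instruction>"
)

root = ET.fromstring(doc)

# No child elements were created: the <I> tags were not treated as markup.
child_count = len(root)

# The text of the element contains the tags as ordinary characters.
text = root.text

print(child_count)
print(text.strip())
```

Note that ElementTree does not expose the CDATA markers themselves. It simply hands back the enclosed characters as part of the element's text.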

Keep in mind, though, that nothing inside a CDATA Section is parsed. Therefore, if you were to include entity references, they would not be expanded. So, &lt;I&gt; would remain &lt;I&gt; if it were contained inside a CDATA Section.
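The difference is easy to check by parsing the same entity references inside and outside a CDATA Section. This is a small sketch assuming Python's standard xml.etree.ElementTree module; the element name p is arbitrary.

```python
import xml.etree.ElementTree as ET

# Entity references inside a CDATA Section are left exactly as typed.
inside = ET.fromstring("<p><![CDATA[&lt;I&gt;]]></p>")

# The same references in ordinary character data are expanded.
outside = ET.fromstring("<p>&lt;I&gt;</p>")

print(inside.text)
print(outside.text)
```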

A CDATA Section can be used anywhere PCDATA occurs, such as in element content. Attribute values, however, are always parsed. Declaring an attribute as type CDATA in a DTD only means that its value is a plain string, and it has nothing to do with CDATA Sections. So, you cannot include a CDATA Section in an attribute value.
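The restriction on attribute values can be demonstrated as follows. This is a sketch assuming Python's standard xml.etree.ElementTree module; the note element and its title attribute are hypothetical names chosen for the illustration.

```python
import xml.etree.ElementTree as ET

# A CDATA Section inside an attribute value: the "<" is illegal there.
bad = '<note title="<![CDATA[a < b]]>"/>'

try:
    ET.fromstring(bad)
    bad_parsed = True
except ET.ParseError:
    bad_parsed = False

# In an attribute value, markup characters must be escaped with entity
# references instead, and the parser expands them.
good = '<note title="a &lt; b"/>'
title = ET.fromstring(good).get("title")

print(bad_parsed)
print(title)
```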

NOTE

Some users of XML have raised the idea of including text-encoded binary data in CDATA Sections. Because the text in a CDATA Section isn't parsed, this seems like a reasonable idea. However, to do so, you would need to ensure that the encoded text did not include the string ]]>. XML Schemas provide binary datatypes, such as base64Binary and hexBinary, that are a much better mechanism for including binary data as element content.
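As an illustration of the idea in this note, the sketch below assumes Base64 as the text encoding (the note itself does not name one) and Python's standard base64 and xml.etree.ElementTree modules. The blob element name is hypothetical. The Base64 alphabet contains neither ] nor >, so the encoded text cannot contain the ]]> terminator.

```python
import base64
import xml.etree.ElementTree as ET

# Sample binary payload covering every possible byte value.
payload = bytes(range(256))

# Text-encode the bytes. Base64 is an assumption made for this sketch.
encoded = base64.b64encode(payload).decode("ascii")

# The check the note calls for: the encoded text must not end the section.
contains_terminator = "]]>" in encoded

doc = f"<blob><![CDATA[{encoded}]]></blob>"

# Round trip: parse the document and decode the element text back to bytes.
decoded = base64.b64decode(ET.fromstring(doc).text)

print(contains_terminator)
print(decoded == payload)
```

An encoding whose alphabet does include ] and > would need an explicit check like the one above before being wrapped in a CDATA Section.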

By default, the text content of XML documents is PCDATA, and you will not encounter the PCDATA keyword until we discuss valid XML with DTDs and Schemas. CDATA Sections, by contrast, can be used in any well-formed XML document to escape large sections of text, and the CDATA keyword also appears in DTDs and Schemas. We will discuss the use of DTDs and valid XML later in Chapter 4, "Structuring XML Documents with DTDs," and XML Schemas in Chapter 5, "Defining XML Document Structures with XML Schemas."
