Sams Teach Yourself XML in 21 Days

By Steven Holzner

Creating CDATA Sections

When an XML processor parses an XML document, it interprets the markup in that document and replaces entity references with whatever they refer to. For example, the built-in general entity reference &quot; is replaced with a double quotation mark ("). Sometimes, however, you might not want text data parsed. What if your text contains many < and & characters? When parsed, those characters are interpreted as markup unless you convert them to &lt; and &amp;, which is called escaping them. To avoid that work, you can tell the XML processor not to parse part of your text data by placing it in a CDATA section. CDATA stands for character data, as opposed to PCDATA, which stands for parsed character data.
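To make the escaping step concrete, here is a minimal sketch in Python that uses the standard library's xml.etree.ElementTree parser and the escape() function from xml.sax.saxutils. The sample text and the variable names are assumptions made up for illustration; they do not come from the book's listings.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text data containing characters that look like markup.
raw = 'if (a < b && b < c) print("done")'

# Unescaped, the < and & characters make the document ill-formed.
try:
    ET.fromstring(f"<text>{raw}</text>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

# escape() converts & to &amp;, < to &lt;, and > to &gt;.
escaped = escape(raw)

# The parser replaces the entity references with the characters they refer to.
round_trip = ET.fromstring(f"<text>{escaped}</text>").text

# Built-in entity references such as &quot; are replaced the same way.
quoted = ET.fromstring("<text>&quot;hello&quot;</text>").text

print(unescaped_ok)       # whether the unescaped text parsed
print(escaped)            # the escaped form of the text
print(round_trip == raw)  # whether parsing restored the original text
print(quoted)             # the text with &quot; replaced
```

Escaping works, but every < and & has to be converted, which is the chore a CDATA section lets you skip.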

You use the CDATA section to tell the XML processor to leave the enclosed text alone, and pass it on unchanged. You start a CDATA section with the markup <![CDATA[ and end it with ]]>.
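As a rough sketch of that behavior, the Python snippet below wraps some text in a CDATA section and parses it with the standard library's xml.etree.ElementTree. The wrap_cdata helper is hypothetical, not part of any library. It also assumes one detail the text above does not cover: because a CDATA section ends at the first ]]>, the helper splits any ]]> in the text across two adjacent sections.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in a CDATA section (hypothetical helper).

    A CDATA section ends at the first ]]>, so any ]]> inside the text
    is split across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Text that would otherwise be read as markup.
snippet = '<employee status="retired"> & more'

doc = "<text>" + wrap_cdata(snippet) + "</text>"
element = ET.fromstring(doc)

# The parser hands back the enclosed text unchanged, without the
# <![CDATA[ and ]]> markup, and creates no child elements from it.
print(element.text)
print(len(element))
```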

For example, suppose you are documenting how your XML application works, and want to say this:

Here's how the element starts:

    <employee status="retired">
        <name>
            <lastname>Kelly</lastname>
            <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
            <project>
                <product>Printer</product>
                <id>111</id>
                <price>$111.00</price>
            </project>
                .
                .
                .

This partial <employee> element, which has no closing </employee> tag, would cause an XML processor to report an error, so you should enclose this text in a CDATA section to tell the XML processor not to parse it, as you see in Example 2.3. When an XML processor parses this document, it is supposed to remove the <![CDATA[ and ]]> markup and place the text in the CDATA section directly into the output it produces, without trying to interpret that text.

Example 2.3. Using a CDATA Section in an XML Document (ch02_03.xml)

<?xml version = "1.0" standalone="yes"?>
<document>
    <text>
    Here's how the element starts:

       <![CDATA[
           <employee status="retired">
               <name>
                   <lastname>Kelly</lastname>
                   <firstname>Grace</firstname>
               </name>
               <hiredate>October 15, 2005</hiredate>
               <projects>
                   <project>
                       <product>Printer</product>
                       <id>111</id>
                       <price>$111.00</price>
                   </project>
                       .
                       .
                       .
       ]]>
    </text>
</document>
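If you don't have a browser handy, you can check the same behavior with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on an abbreviated copy of the document above. The abbreviation and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example document (most of the employee data omitted).
document = """<?xml version = "1.0" standalone="yes"?>
<document>
    <text>
    Here's how the element starts:

       <![CDATA[
           <employee status="retired">
               <name>
                   <lastname>Kelly</lastname>
               </name>
       ]]>
    </text>
</document>"""

root = ET.fromstring(document)
text_element = root.find("text")

# The CDATA content arrives as plain text, so <text> has no child elements.
print(len(text_element))
print(text_element.text)

# Without the CDATA markup, the unclosed <employee> element is parsed as
# markup and the document is no longer well-formed.
unwrapped = document.replace("<![CDATA[", "").replace("]]>", "")
try:
    ET.fromstring(unwrapped)
    unwrapped_ok = True
except ET.ParseError:
    unwrapped_ok = False
print(unwrapped_ok)
```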

Figure 2.11 shows that Internet Explorer treats this CDATA section as unparsed text. (If it had parsed the text, you would see an error instead of the display you see in the figure.)

Figure 2.11 Viewing a CDATA section in Internet Explorer.

Here's another example using XHTML, the version of HTML that is written in XML. XHTML pages can be parsed like other XML documents, but that can cause problems if you've included characters that a scripting language like JavaScript uses, such as JavaScript's less-than operator (<). To avoid confusing an XML processor reading an XHTML page with this embedded JavaScript operator, you can enclose that JavaScript in a CDATA section:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <title>
            Checking the temperature
        </title>
    </head>

    <body>
        <script type="text/javascript" language="javascript">
            <![CDATA[
                var temperature
                temperature = 234.77
                if (temperature < 32) {
                    document.writeln("Below freezing!")
                }
            ]]>
        </script>

        <center>
            <h1>
                Checking the temperature
            </h1>
        </center>
    </body>
</html>

Unfortunately, there's a problem here: the markup <![CDATA[ and ]]> confuses HTML browsers, which means you can't use syntax like this until those browsers are fully equipped to handle XHTML. You can, however, include JavaScript in XHTML pages like this one if they're intended only for HTML browsers, not XML processors, by omitting the markup <![CDATA[ and ]]>.
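To see why a page without the CDATA markup is suitable only for HTML browsers, consider the sketch below, which feeds a cut-down version of the page to Python's standard xml.etree.ElementTree parser, once with the CDATA section and once without it. The build_page and is_well_formed helpers and the shortened page are assumptions made for illustration, and the DOCTYPE is left out for brevity.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_page(script_body: str) -> str:
    """Return a cut-down XHTML page around the given script text (hypothetical helper)."""
    return (
        f'<html xmlns="{XHTML_NS}">'
        "<head><title>Checking the temperature</title></head>"
        f'<body><script type="text/javascript">{script_body}</script></body>'
        "</html>"
    )


def is_well_formed(page: str) -> bool:
    """Report whether an XML parser accepts the page (hypothetical helper)."""
    try:
        ET.fromstring(page)
        return True
    except ET.ParseError:
        return False


script = 'if (temperature < 32) { document.writeln("Below freezing!") }'

with_cdata = build_page("<![CDATA[" + script + "]]>")
without_cdata = build_page(script)

# With the CDATA section, the < operator is passed through as text.
print(is_well_formed(with_cdata))
# Without it, the XML parser reads < as the start of a tag and gives up.
print(is_well_formed(without_cdata))

# The script text reaches the application unchanged.
script_element = ET.fromstring(with_cdata).find(f".//{{{XHTML_NS}}}script")
print(script_element.text == script)
```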
