Creating Valid XML Documents: DTDs

A browser can check HTML because the rules for legal HTML are built into it. In XML, you define what's legal and what's not by specifying the syntax you're going to allow for an XML document, which makes validation critical. Steve Holzner shows you how to validate XML documents with document type definitions (DTDs).

The past couple of days have prepared you for what's coming up now: the creation of valid XML documents. A browser can check HTML because it already knows what legal HTML looks like. In XML, you create your own markup, which means that an XML processor can't check your markup unless you tell it how. You do that by specifying the syntax you're going to allow for an XML document. This book covers two ways to validate XML documents: document type definitions (DTDs) and XML schemas. Today and tomorrow cover DTDs, and Days 6, "Creating Valid XML Documents: XML Schemas," and 7, "Creating Your Own Types in XML Schemas," cover XML schemas.

Here's an overview of today's topics:

  • Creating DTDs

  • Using validators

  • Declaring elements

  • Using ANY to allow any content

  • Declaring child elements

  • Declaring parsed character data

  • Creating child sequences

  • Using DTD choices

  • Using internal and external DTDs

  • Using DTDs and namespaces

DTDs provided the original way to validate XML documents, and the syntax for DTDs is built right into the XML 1.0 specification. Tons of XML processors out there use DTDs in XML documents, and DTDs are the first step in any discussion on validation. But it's also true that DTDs are limited compared to XML schemas, and with the vast support Microsoft is pouring into XML schemas, schemas are really taking off these days. The details on schemas are provided on Days 6 and 7.

All About DTDs

Yesterday we discussed creating well-formed XML documents, and while an XML document needs to be well-formed to be considered a true XML document, that's only part of the story. In real life, we also need to give an XML processor some way of checking the syntax (also called the grammar) of an XML document to make sure the data remains intact. For example, take a look at the XML document you created yesterday that contains data about employees:

<?xml version = "1.0" standalone="yes"?>
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
</document>

Say we've expanded to 5,000 employees, and that we have a team of typists typing in all that employee data. The likelihood is high that there are going to be errors in all that data entry. But how will an XML processor know that a <project> element must contain at least one <product> element unless we tell it so? How do we tell an XML processor that each <employee> element must contain one <name> element? To do this and more, we can use a DTD. DTDs are all about specifying the structure of an XML document, not the data in that document. The formal rules for DTDs are available in the XML 1.0 recommendation, http://www.w3.org/TR/REC-xml. (Note that the XML 1.1 candidate recommendation has nothing to add about DTDs as of this writing.)
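
As a rough preview of how a DTD answers those two questions, the two declarations below (the same ones that appear in Listing 4.1 later in this section) are what tell an XML processor what belongs inside <employee> and <project>. The comments are added here for explanation only; the rest of today's discussion covers the syntax in detail.

```xml
<!-- Each employee must contain exactly one name, then one hiredate, then one projects -->
<!ELEMENT employee (name, hiredate, projects)>

<!-- Each project must contain exactly one product, then one id, then one price -->
<!ELEMENT project (product,id,price)>
```

Note that these declarations constrain only which child elements appear and in what order. They say nothing about the text inside <hiredate> or <price>, which is why a DTD describes structure rather than data.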

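We define the syntax of an XML document by using a DTD, and we attach that definition to a document by using a document type declaration, written as <!DOCTYPE>. (Strictly speaking, <!DOCTYPE> is a declaration rather than an element, although it is often loosely called an element.) The DTD either appears inside that declaration or is referenced by it. The declaration can take several forms, including the following, where rootname is the name of the root element, URI is the URI of a DTD outside the current XML document, identifier is a public identifier for that DTD, and [DTD] stands for declarations placed inside the document itself. In an actual document, URI and identifier are written as quoted strings: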

  • <!DOCTYPE rootname [DTD]>

  • <!DOCTYPE rootname SYSTEM URI>

  • <!DOCTYPE rootname SYSTEM URI [DTD]>

  • <!DOCTYPE rootname PUBLIC identifier URI>

  • <!DOCTYPE rootname PUBLIC identifier URI [DTD]>
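
To make those forms more concrete, here is a sketch of three of them filled in. These are alternatives, not one file: a document carries at most one <!DOCTYPE> declaration. The file name employees.dtd is a hypothetical placeholder, and the one-line internal DTD is only for illustration. The PUBLIC example is the XHTML 1.0 Strict declaration, shown because it is a widely used real instance of that form.

```xml
<!-- Form 1: internal DTD only; the declarations sit between [ and ] -->
<!DOCTYPE document [
  <!ELEMENT document (#PCDATA)>
]>

<!-- Form 2: external DTD named by a system identifier (hypothetical file name) -->
<!DOCTYPE document SYSTEM "employees.dtd">

<!-- Form 4: external DTD named by a public identifier plus a fallback URI -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

In each case the name right after <!DOCTYPE must match the document's root element.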

To use a DTD, we need a document type declaration, which means we need <!DOCTYPE>. The <!DOCTYPE> declaration is part of a document's prolog, so it comes after the XML declaration and before the root element. For example, here's how we would add <!DOCTYPE> to the employees example:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
    .
    .
  <!-- DTD goes here! -->
    .
    .
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
</document>

So what does a DTD look like? The actual XML syntax for DTDs is pretty terse, so today's discussion is dedicated to unraveling that terseness. To get started, Listing 4.1 shows a full <!DOCTYPE> declaration that contains a DTD for the employee document. We're going to dissect that DTD today.

Listing 4.1 A Sample XML Document with a DTD (ch04_01.xml)

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
]>
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Grant</lastname>
      <firstname>Cary</firstname>
    </name>
    <hiredate>October 20, 2005</hiredate>
    <projects>
      <project>
        <product>Desktop</product>
        <id>333</id>
        <price>$2995.00</price>
      </project>
      <project>
        <product>Scanner</product>
        <id>444</id>
        <price>$200.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Gable</lastname>
      <firstname>Clark</firstname>
    </name>
    <hiredate>October 25, 2005</hiredate>
    <projects>
      <project>
        <product>Keyboard</product>
        <id>555</id>
        <price>$129.00</price>
      </project>
      <project>
        <product>Mouse</product>
        <id>666</id>
        <price>$25.00</price>
      </project>
    </projects>
  </employee>
</document>
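
To see what the DTD in Listing 4.1 buys us, consider the hypothetical <employee> entry below. It is not part of the original listing, and its data is made up for illustration. The fragment is well-formed, but a validating processor checking it against the DTD in Listing 4.1 should report it as invalid, for the two reasons marked in the comments.

```xml
<employee>
  <name>
    <!-- Invalid: name is declared as (lastname, firstname), so lastname must come first -->
    <firstname>James</firstname>
    <lastname>Stewart</lastname>
  </name>
  <hiredate>October 30, 2005</hiredate>
  <projects>
    <project>
      <product>Monitor</product>
      <id>777</id>
      <!-- Invalid: project is declared as (product,id,price), but price is missing -->
    </project>
  </projects>
</employee>
```

These are exactly the kinds of data-entry slips a well-formedness check alone would let through.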