Validating XML with the Document Type Definition (DTD)
In This Chapter
- Document Type Definitions
- Some Simple DTD Examples
- Structure of a Document Type Definition
- DTD Drawbacks and Alternatives
XML is a fully extensible meta-markup language. As long as the result is well formed, XML authors can create any structure they need to describe their data. However, an author cannot be sure that a structure that took so much time and effort to create won't be changed by another XML author or, for that matter, by an application. There needs to be a way to ensure that the XML structure cannot be changed at random. This kind of assurance is vital for e-commerce applications and business-to-business processing, among other things. This is where the Document Type Definition (DTD) steps in. A DTD provides a roadmap that describes and documents the structure of an XML document, and it can be used to determine whether an XML document is valid.
In this chapter we will start with several examples and a brief overview of the DTD and what it does. Then we will break down the different items that make up the structure of the DTD. The coverage of the DTD structure will begin with a discussion of the Document Type Declaration. Then we will move on to the functional items that make up the DTD. The DTD includes element definitions, entity definitions, and parameters. Finally, before closing the chapter, we will explore some of the drawbacks of DTDs and emerging alternatives for validation. Now, let's start by defining the Document Type Definition.
Document Type Definitions
DTD stands for Document Type Definition. A Document Type Definition allows the XML author to define the set of rules that an XML document must follow in order to be valid. An XML document is considered "well formed" if it is syntactically correct according to the syntax rules of XML 1.0. However, that does not mean the document is necessarily valid. In order to be considered valid, an XML document must be validated, or verified, against a DTD. The DTD defines the elements required by an XML document, the elements that are optional, the number of times an element must or may occur, and the order in which elements should be nested. DTD markup also defines the type of data that will occur in an XML element and the attributes that may be associated with those elements. A document, even if well formed, is not considered valid if it does not follow the rules defined in the DTD.
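The difference between "well formed" and "valid" can be seen with a non-validating parser. The following is a minimal Python sketch, assuming the standard library's xml.etree.ElementTree module, which checks well-formedness only and does not enforce DTD rules. The note, to, and body element names are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD requires <to> followed by <body>,
# but the document itself omits <to>.
WELL_FORMED_BUT_INVALID = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><body>Lunch at noon?</body></note>"""

# Not even well formed: <body> is closed by a mismatched </note> tag.
NOT_WELL_FORMED = "<note><body>Lunch at noon?</note>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML; DTD rules are not enforced."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED_BUT_INVALID))
    print(is_well_formed(NOT_WELL_FORMED))
```

The first document passes this check even though it breaks the rules of its own DTD. Only a validating parser would reject it.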
DTDs are part of the W3C's XML 1.0 recommendation. This recommendation may be found at http://www.w3.org/TR/REC-xml.
When an XML document is validated against a DTD by a validating XML parser, the document is checked to ensure that all required elements are present and that no undeclared elements have been added. The hierarchical structure of elements defined in the DTD must be maintained. The values of all attributes are checked to ensure that they fall within the defined guidelines. No undeclared attributes are allowed, and no required attributes may be omitted. In short, the structure of the XML document, from top to bottom, is defined by the DTD and checked against it by the parser.
Although validation is optional, if an XML author is publishing an XML document for which maintaining the structure is vital, the author can reference a DTD from the XML document and use a validating XML parser during processing. Requiring that an XML document be validated against a DTD ensures the integrity of the data structure. XML documents may be parsed and validated before they are ever loaded by an application. That way, XML data that is not valid can be flagged as "invalid" before it ever gets processed by the application (thus saving a lot of the headaches that corrupt or incomplete data can cause).
Imagine a scenario where data is being exchanged in an XML format between multiple organizations. The integrity of business-to-business data is vital for the smooth functioning of commerce. There needs to be a way to ensure that the structure of the XML data does not change from organization to organization (thus rendering the data corrupt and useless). A DTD can ensure this.
An extra advantage of using DTDs in this situation is that a single DTD could be referenced by all of the organizations' applications. The defined structure of the data would live in a centralized resource, which means that any change to the data structure definition would only need to be made in one place. All the applications that reference the DTD would automatically use the new, updated structure.
A DTD can be internal, residing within the body of a single XML document. It can also be external, referenced by the XML document. A single XML document can even have both: a portion (or subset) of its DTD that is internal and a portion that is external. As mentioned in the previous paragraph, a single external DTD can be referenced by many XML documents. Because an external DTD may be referenced by many documents, it is a good repository for global definitions (definitions that apply to all documents). An internal DTD is good for rules that apply only to that specific document. If a document has both internal and external DTD subsets, declarations in the internal subset take precedence over those in the external subset when the same entity or attribute is declared in both. An element type, by contrast, may be declared only once in a valid document.
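The precedence rule rests on two points in the XML 1.0 recommendation: the internal subset is treated as occurring before the external subset, and when the same entity is declared more than once, the first declaration is the one that binds. The second point can be observed with a short Python sketch. It assumes the standard library's xml.etree.ElementTree parser and, because that parser does not load external DTD subsets, it stands in for the two subsets with two declarations inside a single internal subset. The memo element and the owner entity are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that declares the entity "owner" twice.
# The first declaration plays the role of the internal subset (read first);
# the second plays the role of a later declaration from an external subset.
DOCUMENT = """<!DOCTYPE memo [
  <!ENTITY owner "declared first">
  <!ENTITY owner "declared second">
]>
<memo>&owner;</memo>"""


def resolve_owner(xml_text: str) -> str:
    """Parse xml_text and return the expanded text of the root element."""
    root = ET.fromstring(xml_text)
    return root.text or ""


if __name__ == "__main__":
    print(resolve_owner(DOCUMENT))
```

The entity reference expands to the text of the first declaration, which is why a declaration in the internal subset wins over a later one for the same entity in the external subset.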
Given this brief overview, you can quickly see why a DTD would be important to applications that exchange data in an XML format. Before diving into the actual coverage of the structure of DTDs, take a look at a couple of quick examples. This will give you a better impression of what we are talking about as we go forward.