Defining Data with DTD Schemas

One problem XML aims to reduce is human error. Because XML is a structured and rigid language, it leaves little room for error on the part of XML developers. If you’ve ever encountered an error at the bank (in their favor!), you can no doubt appreciate the significance of errors in critical computer systems. XML is rapidly being integrated into all kinds of computer systems, including financial systems used by banks, and the rigidity of XML as a markup language should help make these systems more robust. The facet of XML that allows errors to be detected is the schema, a construct that allows XML developers to define the format and structure of XML data.

This hour introduces you to schemas, including the two major types that are used to define data for XML documents. The first of these schema types, the DTD, is examined in detail in this hour, while the second is saved for a later lesson. This hour explores the inner workings of DTDs and shows you how to create DTDs from scratch.

In this hour, you’ll learn

  • How XML allows you to create custom markup languages
  • The role of schemas in XML data modeling
  • The difference between the types of XML schemas
  • What constitutes valid and well-formed documents
  • How to declare elements and attributes in a DTD
  • How to create and use a DTD for a custom markup language

Creating Your Own Markup Languages

Before you get too far into this hour, I have to make a little confession. When you create an XML document, you aren’t really using XML to code the document. Instead, you are using a markup language that was created in XML. In other words, XML is used to create markup languages that are then used to create XML documents. The term "XML document" is even a little misleading because the type of the document is really determined by the specific markup language used. So, as an example, if I were to create my very own markup language called MML (Michael’s Markup Language), then the documents I create would be considered MML documents, and I would use MML to code those documents. Generally speaking, the documents are still XML documents because MML is an XML-based markup language, but you would refer to the documents as MML documents.

The point of this discussion is not to split hairs regarding the terminology used to describe XML documents. It is intended to help clarify the point that XML is a technology that enables the creation of custom markup languages. If you’re coming from the world of HTML, you probably think in terms of there being only one markup language—HTML. In the XML world, there are thousands of different markup languages, with each of them applicable to a different type of data. As an XML developer, you have the option of using an existing markup language that someone else created using XML, or you can create your own. An XML-based markup language can be as formal as XHTML, the version of HTML that adheres to the rules of XML, or as informal as my simple Tall Tales trivia language.

When you create your own markup language, you are basically establishing which elements (tags) and attributes are used to create documents in that language. Not only is it important to fully describe the different elements and attributes, but you must also describe how they relate to one another. For example, if you are creating a markup language to keep track of sports information so that you can chart your local softball league, you might use tags such as <schedule>, <game>, <team>, <player>, and so on. Examples of attributes for the player element might include name, hits, rbis, and so on.
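To make the softball example a little more concrete, here is a rough sketch of what a document in such a language might look like. The element names and the player attributes (name, hits, rbis) come from the discussion above; the way the elements are nested, the date attribute on game, the name attribute on team, and all of the sample values are assumptions made purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical softball document. The nesting, the date and team name
     attributes, and every value shown here are illustrative assumptions. -->
<schedule>
  <game date="2024-06-01">
    <team name="Sluggers">
      <player name="Casey" hits="2" rbis="1"/>
      <player name="Robin" hits="0" rbis="0"/>
    </team>
    <team name="Comets">
      <player name="Jordan" hits="1" rbis="0"/>
    </team>
  </game>
</schedule>
```

Notice that nothing in this document says a game must contain teams, or that a player must have a name. The document is well formed, but the rules of the language exist only in the author's head.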

The question you might now be asking yourself is exactly how you create a markup language. In other words, how do you specify the set of elements and attributes for a markup language, along with how they relate to each other? Although you could certainly create sports XML documents using your own elements and attributes, there needs to be a set of rules somewhere that establishes the format and structure of documents created in the language. This set of rules is known as the schema for a markup language. A schema describes the exact elements and attributes that are available within a given markup language, along with which attributes are associated with which elements and how the elements relate to one another. You can think of a schema as a legal contract between the person who created the markup language and the person who will create documents using that language.
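As a preview of what such a contract can look like, here is one possible sketch of a DTD for the softball language, written as an internal subset at the top of a small sample document. The specific rules are assumptions chosen for illustration (for example, that every game has exactly two teams and that a player's name is required while hits and rbis are optional), as are the date and team name attributes and the sample values; a real language designer might reasonably choose differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE schedule [
  <!-- A schedule contains one or more games (assumed rule). -->
  <!ELEMENT schedule (game+)>

  <!-- Each game contains exactly two teams (assumed rule). -->
  <!ELEMENT game (team, team)>
  <!ATTLIST game date CDATA #REQUIRED>

  <!-- A team contains zero or more players. -->
  <!ELEMENT team (player*)>
  <!ATTLIST team name CDATA #REQUIRED>

  <!-- A player has no content; its data lives in attributes. -->
  <!ELEMENT player EMPTY>
  <!ATTLIST player
    name CDATA #REQUIRED
    hits CDATA #IMPLIED
    rbis CDATA #IMPLIED>
]>
<schedule>
  <game date="2024-06-01">
    <team name="Sluggers">
      <player name="Casey" hits="2" rbis="1"/>
    </team>
    <team name="Comets">
      <player name="Jordan" hits="1" rbis="0"/>
    </team>
  </game>
</schedule>
```

With these declarations in place, a validating parser has something to check the document against: a game with only one team, or a player with no name attribute, would be reported as an error rather than silently accepted.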

