
Converting DTDs to XML Schemas

Sometimes DTDs don't offer the functionality you need for your XML documents, and sometimes a DTD that's already in place needs to be converted to an XML Schema. In this article, David Gulbransen walks through the conversion process from DTD to XML Schema.
This article is adapted from David Gulbransen's book Special Edition Using XML Schema (Que, 2001, ISBN 0-7897-2607-6).

XML Schemas represent the next step in the evolutionary chain of XML. However, there are still valid reasons for learning about and dealing with DTDs. First, DTDs offer compatibility with SGML. Second, DTDs can be more straightforward than schemas, if your document requirements are simple and you don't need to make use of datatypes.

Of course, sometimes DTDs don't offer the functionality you need for your XML documents. You might also have a DTD already in place because XML Schemas weren't ready when you did your initial development, and now you need to convert the DTD to add more functionality. Here are some reasons why you might want to convert your existing DTDs into XML Schemas:

  • Ensuring compatibility with new XML products

  • Making use of datatypes

  • Creating more complex constraints on the validity of documents

Differences Between DTDs and Schemas

The most visible difference between DTDs and XML Schemas is syntax: DTDs have their own syntax, while XML Schemas are themselves well-formed XML documents. This matters for two reasons. On a development level, it means that XML Schemas can be parsed like any other XML document. On an authoring level, it means that XML Schemas are verbose, because XML was not designed with brevity in mind. Syntax is not the most significant difference between DTDs and XML Schemas, however; that honor goes to datatypes.

It's easy to see how datatypes can be useful in schema design. Take, for example, a ZIP code element. Using a DTD, we can only specify that a <zip> element contains text. That means someone could enter W321GWG@(!#@ as a ZIP code, and it would still be considered valid. Using XML Schemas, we can create a datatype for ZIP codes; using the pattern facet, we can then limit the <zip> element to the standard five-digit ZIP code. We could also create a datatype to deal with ZIP+4 if we wanted. The ability to be that specific about the datatypes of element and attribute values is a very powerful aspect of XML Schemas.
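As a minimal sketch of the ZIP code idea, the fragment below defines two simple types with the pattern facet. The type names zipType and zipPlus4Type are illustrative choices, not names taken from this article, and the xs prefix is assumed to be bound to the XML Schema namespace. The leading comment shows the closest a DTD can get.

```xml
<!-- DTD equivalent: <!ELEMENT zip (#PCDATA)>  (any text at all is accepted) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Five-digit ZIP code: exactly five ASCII digits -->
  <xs:simpleType name="zipType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- ZIP+4: five digits, a hyphen, then four digits -->
  <xs:simpleType name="zipPlus4Type">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- The zip element uses the five-digit type -->
  <xs:element name="zip" type="zipType"/>
</xs:schema>
```

Because a schema pattern must match the entire value, a value such as W321GWG@(!#@ is rejected by this declaration without any extra anchoring.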

Another area in which DTDs and XML Schemas differ is expression of cardinality. With DTDs, an element listed in a content model must occur exactly once unless it is followed by one of three symbols:

*  allows you to specify that an element can occur any number of times, including zero.

+  signifies that an element can occur one or more times.

?  limits the element occurrence to zero or one.

This doesn't allow for a great deal of flexibility. Say you had a DTD for a bus, and you wanted to specify that the bus had to have at least 10 passengers in order to be cost-effective, but could have up to 25. In a DTD, this is quite ugly:

<!ELEMENT bus (passenger, passenger, passenger, passenger, passenger,
 passenger, passenger, passenger, passenger, passenger, passenger?,
 passenger?, passenger?, passenger?, passenger?, passenger?,
 passenger?, passenger?, passenger?, passenger?, passenger?,
 passenger?, passenger?, passenger?, passenger?)>

In a schema, this is much easier:

<xs:element name="bus"><xs:complexType><xs:sequence>
<xs:element name="passenger" minOccurs="10" maxOccurs="25"/>
</xs:sequence></xs:complexType></xs:element>

This is a much cleaner mechanism for defining elements that have specific occurrence constraints. If you need this type of control over the elements in your schema, DTDs fall short of the functionality of XML Schemas.
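When converting a DTD, the three occurrence symbols map directly onto minOccurs and maxOccurs. The sketch below assumes a hypothetical content model, shown in the leading comment; the driver, relief, and luggage elements are invented for illustration and do not appear in the DTD discussed above.

```xml
<!-- DTD: <!ELEMENT bus (driver, relief?, passenger+, luggage*)> -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="bus">
    <xs:complexType>
      <xs:sequence>
        <!-- no symbol: exactly once (minOccurs and maxOccurs both default to 1) -->
        <xs:element name="driver" type="xs:string"/>
        <!-- ? : zero or one -->
        <xs:element name="relief" type="xs:string" minOccurs="0"/>
        <!-- + : one or more -->
        <xs:element name="passenger" type="xs:string" maxOccurs="unbounded"/>
        <!-- * : zero or more -->
        <xs:element name="luggage" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Any other range, such as the 10 to 25 passengers above, is expressed by putting the numbers themselves in the two attributes.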

Another advantage of XML Schemas is that the values of both attributes and elements can be restricted to enumerations. With DTDs, a value can be restricted to a list of choices only if it is an attribute, which adds complexity to your content model.
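A minimal sketch of an enumeration shared by an element and an attribute follows. The names sizeType, size, and shirt, and the three allowed values, are hypothetical and chosen only for illustration. The leading comment shows the attribute-only form a DTD would require.

```xml
<!-- DTD: <!ATTLIST shirt size (small | medium | large) #IMPLIED> -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A reusable list of allowed values -->
  <xs:simpleType name="sizeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Enumeration as element content: <size>medium</size> -->
  <xs:element name="size" type="sizeType"/>

  <!-- The same enumeration as an attribute value: <shirt size="medium"/> -->
  <xs:element name="shirt">
    <xs:complexType>
      <xs:attribute name="size" type="sizeType"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Defining the list once as a named type lets the element and the attribute stay in sync if the set of allowed values changes.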
