Document Type Definitions

Eventually, we'll want to do an analysis of the different entities that we're relating together, but in beginning to build our DTD, we'll start off with an excerpt from our products.xml file.

Make a copy and save it in C:\xerces-1_2_3\data (or the appropriate directory on your system). Remove all but the first vendor, Conners Chair Company, as shown in Listing 3.1.

Listing 3.1  products.xml: The Data

0: <?xml version="1.0"?>
1: <products>
2: <vendor webvendor="full">
3:   <vendor_name>Conners Chair Company</vendor_name>
4:   <advertisement>
5:     <ad_sentence>
6:     Conners Chair Company presents their annual big three
7:     day only chair sale. We're making way for our new
8:     stock! <b>All current inventory must go!</b> Regular prices
9:     slashed by up to 60%!
10:    </ad_sentence>
11:  </advertisement>
12:
13:  <product>
14:    <product_id>QA3452</product_id>
15:    <short_desc>Queen Anne Chair</short_desc>
16:    <price pricetype="cost">$85</price>
17:    <price pricetype="sale">$125</price>
18:    <price pricetype="retail">$195</price>
19:    <inventory color="royal blue" location="warehouse">
20:      12</inventory>
21:    <inventory color="royal blue" location="showroom">
22:      5</inventory>
23:    <inventory color="flower print" location="warehouse">
24:      16</inventory>
25:    <inventory color="flower print" location="showroom">
26:      3</inventory>
27:    <inventory color="seafoam green" location="warehouse">
28:      20</inventory>
29:    <inventory color="teal" location="warehouse">
30:      14</inventory>
31:    <inventory color="burgundy" location="warehouse">
32:      34</inventory>
33:    <giveaway>
34:      <giveaway_item>
35:      Matching Ottoman included
36:      </giveaway_item>
37:      <giveaway_desc>
38:      while supplies last
39:      </giveaway_desc>
40:    </giveaway>
41:  </product>
42:
43:  <product>
44:    <product_id>RC2342</product_id>
45:    <short_desc>Early American Rocking Chair</short_desc>
46:    <product_desc>
47:      with brown and tan plaid upholstery
48:    </product_desc>
49:    <price pricetype="cost">$75</price>
50:    <price pricetype="sale">$62</price>
51:    <price pricetype="retail">$120</price>
52:    <inventory location="warehouse">40</inventory>
53:    <inventory location="showroom">2</inventory>
54:  </product>
55:
56:  <product>
57:    <product_id>BR3452</product_id>
58:    <short_desc>Bentwood Rocker</short_desc>
59:    <price pricetype="cost">$125</price>
60:    <price pricetype="sale">$160</price>
61:    <price pricetype="retail">$210</price>
62:    <inventory location="showroom">3</inventory>
63:  </product>
64:
65:</vendor>
66:
67:</products>

(Notice that we added a little bit of XHTML markup on line 8.)

In the spirit of testing as we go along, let's go ahead and parse the file to make sure that there's nothing wrong with it to start. To do this, open a command prompt window and type

cd c:\xerces-1_2_3
java sax.SAXCount data/products.xml

We should get a result similar to when we tested the Xerces installation, such as

data/products.xml: 330 ms (37 elems, 27 attrs, 0 spaces, 610 chars)

If you get an error saying that tags are missing or elements are not terminated properly, there is a problem with the products.xml file. You might have inadvertently removed or left an extra tag when you removed the extra two vendors.

If you get an error saying that the class or the file cannot be found, check for typing errors.


Tip - To save time and typing mistakes, commands can be placed in a batch file. For instance, we can take the command to parse this file

java sax.SAXCount data/products.xml

and place it in a text file called val.bat, which we place in the c:\xerces-1_2_3 directory. Then, to check the file, we just go to that directory and type the following:

val

The script will handle it from there.


Notice that we left the -v switch off the command. The reason is that we're not ready to validate this file yet—we have no DTD to check against! Every single element would currently be an error because it's not defined.

Internal DTD Subsets

We'll start by embedding the DTD in the XML file, the same way we started with style sheets.

As we mentioned earlier, DTDs use a different syntax than XML itself does. To begin building one, we need to make a space for it in the document, as in Listing 3.2.

Listing 3.2  products.xml: Creating an Internal DTD

0: <?xml version="1.0"?>
1: <!DOCTYPE products [
2:
3:   <!-- Definition goes here -->
4:
5: ]>
6:
7: <products>
8: <vendor webvendor="full">
...

The <!DOCTYPE> notation on line 1 is called a Document Type Declaration and lets the processor know that this is the start of a Document Type Definition, or DTD. Line 1 also names products, which is the root element of the XML below it, starting on line 7. These two must match: the name in the declaration has to be the name of the document's root element. Notice also that the brackets that open on line 1 close on line 5. They mark the start and the end of the definition itself. XML-style comments are allowed within the DTD, as you can see on line 3.

From here we can take a pretty literal, straightforward view. We want to define each element in terms of the other elements, attributes, or data it can contain. We'll start with the root element, products. In Listing 3.3, we add the only element that can be contained in products, the vendor.

Listing 3.3  products.xml: Defining the Root Element

0: <?xml version="1.0"?>
1: <!DOCTYPE products [
2:
3: <!ELEMENT products (vendor)+>
4:
5: ]>
6:
7: <products>
8: <vendor webvendor="full">
...

Line 3 tells the parser that we have an element named products and that all it can contain is vendor elements. The + sign tells the parser that we can have one or more vendors. Because this is the root element, we want to make sure that we have some data, so at least one is required. Now we need to define the vendor element, as in Listing 3.4.

Listing 3.4  products.xml: Defining the Vendor Element

0: <?xml version="1.0"?>
1: <!DOCTYPE products [
2:
3: <!ELEMENT products (vendor)+>
4:
5: <!ELEMENT vendor (vendor_name, advertisement?, product*)>
6:
7: ]>
8:
9: <products>
10:<vendor webvendor="full">
...

Line 5 defines a vendor as an element that may contain a vendor_name, an advertisement, and product elements. More precisely, it must contain exactly one vendor_name, it may contain one advertisement (using the ?), and it may contain any number of product elements, including 0 (using the *). The commas mean that these children must appear in the order listed.
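To see how the occurrence indicators play out, here is a small sketch that checks three vendor elements against the declaration on line 5. It is a stand-alone test document, not part of products.xml: the Acme Lamps vendor is made up, and advertisement and product are declared as plain text here only to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE products [
  <!ELEMENT products (vendor)+>
  <!ELEMENT vendor (vendor_name, advertisement?, product*)>
  <!ELEMENT vendor_name (#PCDATA)>
  <!-- Simplified placeholders (assumption): plain text only -->
  <!ELEMENT advertisement (#PCDATA)>
  <!ELEMENT product (#PCDATA)>
]>
<products>
  <!-- Valid: exactly one vendor_name, no advertisement, no products -->
  <vendor>
    <vendor_name>Acme Lamps</vendor_name>
  </vendor>

  <!-- Valid: all three parts present, in the declared order -->
  <vendor>
    <vendor_name>Conners Chair Company</vendor_name>
    <advertisement>Chair sale!</advertisement>
    <product>Queen Anne Chair</product>
    <product>Bentwood Rocker</product>
  </vendor>

  <!-- Invalid: a second advertisement; ? allows at most one -->
  <vendor>
    <vendor_name>Acme Lamps</vendor_name>
    <advertisement>Lamp sale!</advertisement>
    <advertisement>Another ad</advertisement>
  </vendor>
</products>
```

A validating parser should accept the first two vendors and complain that the content of the third does not match the declared model.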

We haven't completely defined the vendor element yet, however. Looking at the XML file, we see that vendor can have an attribute, webvendor. We need to put this into the DTD, as in Listing 3.5.

Listing 3.5  products.xml: Adding Attributes to the Vendor Element

0: <?xml version="1.0"?>
1: <!DOCTYPE products [
2:
3: <!ELEMENT products   (vendor)+>
4:
5: <!ELEMENT vendor   (vendor_name, advertisement?, product*)>
6: <!ATTLIST vendor   webvendor CDATA #REQUIRED>
7:
8: ]>
9:
10: <products>
11:<vendor webvendor="full">
...

Let's pick line 6 apart piece by piece. First, the <!ATTLIST> notation indicates that we're defining an attribute list for an element, as opposed to the element itself. Next, we note what element the attribute list is for—specifically, vendor. Then we list the name of the attribute, webvendor, and the type of data that can be contained in it, followed by the fact that the attribute is required.

So, the definition on line 6 means that the vendor element must have one attribute, which is called webvendor and can contain character data.

That's not really very helpful, though, because it doesn't specify anything about what that text should be. We need to make sure that it's one of our three choices, full, partial, or no. We can do that on line 6 of Listing 3.6.

Listing 3.6  products.xml: Specifying Content for the webvendor Attribute

0: <?xml version="1.0"?>
1: <!DOCTYPE products [
2:
3: <!ELEMENT products   (vendor)+>
4:
5: <!ELEMENT vendor   (vendor_name, advertisement?, product*)>
6: <!ATTLIST vendor   webvendor ( full | partial | no ) #REQUIRED>
7:
8: ]>
9:
10: <products>
11:<vendor webvendor="full">
...

We've seen the | connector before, when we were using XSLT. At that time it worked as a sort of "or" statement, and it still does. Only one of those three values is allowed. The value of webvendor must be full or partial or no.
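As a quick illustration of what the enumerated list does and does not accept, consider the following stand-alone sketch. The vendor element is simplified to plain text here (an assumption made only to keep the example small), and the comments describe what a validating parser would be expected to report.

```xml
<?xml version="1.0"?>
<!DOCTYPE products [
  <!ELEMENT products (vendor)+>
  <!-- Simplified (assumption): vendor holds text only -->
  <!ELEMENT vendor (#PCDATA)>
  <!ATTLIST vendor webvendor ( full | partial | no ) #REQUIRED>
]>
<products>
  <!-- Valid: "partial" is one of the listed values -->
  <vendor webvendor="partial">Acme Lamps</vendor>

  <!-- Invalid: "yes" is not in the list -->
  <vendor webvendor="yes">Acme Lamps</vendor>

  <!-- Invalid: values are case-sensitive, so "Full" is not "full" -->
  <vendor webvendor="Full">Acme Lamps</vendor>

  <!-- Invalid: the attribute is #REQUIRED but missing -->
  <vendor>Acme Lamps</vendor>
</products>
```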

Let's move on to our other elements. Listing 3.7 defines vendor_name, advertisement, and product.

Listing 3.7  products.xml: Specifying Content for the vendor Element

0: <?xml version="1.0"?>
1: <!DOCTYPE products [
2:
3: <!ELEMENT products   (vendor)+>
4:
5: <!ELEMENT vendor   (vendor_name, advertisement?, product*)>
6: <!ATTLIST vendor   webvendor ( full | partial | no ) #REQUIRED>
7:
8: <!ELEMENT vendor_name   (#PCDATA)>
9:
10:<!ELEMENT advertisement   (ad_sentence)+>
11:<!ELEMENT ad_sentence   (#PCDATA)>
12:
13:<!ELEMENT product (product_id, short_desc, product_desc?, price+, inventory+, giveaway?)>
14:
15:<!ELEMENT product_id  (#PCDATA)>
16:<!ELEMENT short_desc  (#PCDATA)>
17:<!ELEMENT product_desc  (#PCDATA)> 
18:
19:<!ELEMENT price    (#PCDATA)>
20:<!ATTLIST price     pricetype   (cost | sale | retail) 'retail'>
21:
22:<!ELEMENT inventory  (#PCDATA)>
23:<!ATTLIST inventory  color    CDATA #IMPLIED
24:           location   (showroom | warehouse) 'warehouse'>
25:
26:<!ELEMENT giveaway  (giveaway_item, giveaway_desc)>
27:<!ELEMENT giveaway_item (#PCDATA)>
28:<!ELEMENT giveaway_desc (#PCDATA)>
29:
30:]>
31:
32:<products>
33:<vendor webvendor="full">
...

Let's take this one line at a time. vendor_name, on line 8, is a simple text element, as are product_id, short_desc, and product_desc. #PCDATA stands for "Parsed Character Data." This is normal text that the parser still reads through, so entity references such as &amp; are resolved, but an element declared as (#PCDATA) cannot contain any child elements.

advertisement can contain one or more ad_sentences, and giveaway must contain one giveaway_item and one giveaway_desc.

The product element is a little more complicated but not much. It contains only elements: specifically, exactly one product_id and short_desc, one optional product_desc, one or more prices, one or more inventory elements, and then an optional giveaway.

It's important to note that the order matters. Subelements must appear in the order in which they're listed in the element's definition.
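The effect of ordering is easy to try out. The sketch below uses a trimmed-down version of the product content model (inventory and giveaway are left out, purely to keep it short) and deliberately puts short_desc ahead of product_id.

```xml
<?xml version="1.0"?>
<!DOCTYPE product [
  <!-- Trimmed content model (assumption): inventory and giveaway omitted -->
  <!ELEMENT product (product_id, short_desc, product_desc?, price+)>
  <!ELEMENT product_id (#PCDATA)>
  <!ELEMENT short_desc (#PCDATA)>
  <!ELEMENT product_desc (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>
<!-- Invalid as written: short_desc comes before product_id.
     Swapping the first two children makes the document valid. -->
<product>
  <short_desc>Bentwood Rocker</short_desc>
  <product_id>BR3452</product_id>
  <price>$210</price>
</product>
```

All the required children are present, yet a validating parser should still reject the document, because the sequence does not match the declaration.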

Now let's take some of the more interesting elements.

Attribute Definitions

On lines 19 and 20, we're defining the price element as having a single attribute, called pricetype, which may take the values of cost, sale, or retail. This is called an enumerated datatype, because we are choosing from a set of values. Although we do need to have this information for every price element, we have not made it required. Instead we've given it a default value. If a value isn't supplied, the default value of 'retail' will be used when the data is processed.
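Here is a minimal sketch of the default in action. The prices wrapper element is hypothetical and exists only so the example can stand on its own.

```xml
<?xml version="1.0"?>
<!DOCTYPE prices [
  <!-- prices is a made-up wrapper element for this example -->
  <!ELEMENT prices (price)+>
  <!ELEMENT price (#PCDATA)>
  <!ATTLIST price pricetype (cost | sale | retail) 'retail'>
]>
<prices>
  <!-- pricetype given explicitly -->
  <price pricetype="cost">$125</price>
  <!-- pricetype omitted: a parser that reads the DTD
       reports pricetype="retail" for this element -->
  <price>$210</price>
</prices>
```

Both price elements are valid. An application reading the second one through a parser that processes the DTD should see the attribute as if it had been typed in.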

Actually, all attributes need some way to handle default values. This can take one of four forms:

  • #REQUIRED—As we saw with webvendor, we can force the XML file to provide a value for the attribute.

  • #IMPLIED—If an attribute is #IMPLIED, it's not required. If a value isn't supplied, there is no default value, but it's not an error. For instance, we are not concerned if a color isn't specified.

  • A literal default, such as 'retail' or 'false'—In this case, we provide a value that will be used if no value is provided for the attribute.

  • #FIXED 'literal'—If an attribute is set as #FIXED, it must always have the literal value supplied. If it's not supplied, the parser will fill in the value. If it is supplied, it has to match.
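The four forms can sit side by side in one declaration. In the sketch below, the item element and its id and currency attributes are hypothetical, chosen only to show each form once.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!-- item, id, and currency are made-up names for illustration -->
  <!ATTLIST item
      id        CDATA                  #REQUIRED
      color     CDATA                  #IMPLIED
      location  (showroom | warehouse) 'warehouse'
      currency  CDATA                  #FIXED 'USD'>
]>
<!-- id must be given; color may be left out; location falls back
     to "warehouse"; currency may be omitted, or must equal "USD" -->
<item id="QA3452">12</item>
```

This document should validate as it stands. Removing the id attribute, or adding currency="EUR", should each produce a validation error.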

Finally, lines 22 through 24 define the inventory element and its two attributes, color and location. We've taken advantage of the capability to include more than one attribute in a single declaration to make things easier to read. The color attribute is a string datatype, as we've indicated by setting it as CDATA, or character data. It is #IMPLIED, so it can be left off, whereas location is an enumerated type that defaults to 'warehouse'.

At this point, we've defined all of our elements, so we're ready to go ahead and validate the document. To do that, we'll go to a command prompt and, after making sure that we're in the xerces-1_2_3 directory, type

java sax.SAXCount data/products.xml -v

Mixed Content

At this point, if we haven't mistyped any of the elements in the DTD, we should see the results of parsing the document. The parser should return a message that says something like the following:

[Error] products.xml:40:14: Element type "b" must be declared.
[Error] products.xml:42:21: The content of element type "ad_sentence" must match "(#PCDATA)".
data/products.xml: 3790 ms (38 elems, 27 attrs, 137 spaces, 473 chars)

Congratulations, you've validated your first document! But wait, what about those errors? Those errors mean that the parser is doing exactly what it's supposed to do. We specified ad_sentence as containing nothing but #PCDATA, or parsed character data. That means that no markup is allowed. Remember, however, that we went ahead and added a bit of markup to ad_sentence when we saved the file.

So, what can we do if we want to allow, say, some XHTML tags in the vendor's advertisement? We need to specifically tell the DTD that this element can contain both #PCDATA and elements. This is called Mixed Content. In Listing 3.8, we'll tell the DTD that ad_sentence can contain any number of the specified items.

Listing 3.8  products.xml: Mixed Content

...
8: <!ELEMENT vendor_name   (#PCDATA)>
9:
10:<!ELEMENT advertisement (ad_sentence)+>
11:<!ELEMENT ad_sentence   (#PCDATA | b | i | p )*>
12:<!ELEMENT b (#PCDATA)>
13:<!ELEMENT i (#PCDATA)>
14:<!ELEMENT p (#PCDATA)>
15:
16:<!ELEMENT product   (product_id, short_desc, product_desc?, price+, inventory+, giveaway?)>
...

Let's take a good hard look at what this means on line 11. First, because we're following the parentheses with the *, whatever is inside them can appear any number of times within the element. This means that it's acceptable for ad_sentence to be made up of #PCDATA, then a b element, then more #PCDATA, or any combination of #PCDATA and the elements listed.

Of course, we then have to go ahead and declare those elements, as we've done on lines 12 through 14. Even though you and I know they're just XHTML, the parser doesn't. To the parser, they are elements, just as vendor and product and price are elements.

Make the change to the DTD and revalidate the file. This time there should be no errors.
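To experiment with mixed content on its own, a cut-down document such as the following sketch can help. It uses ad_sentence as the root element, which is an assumption made only so that the example is self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE ad_sentence [
  <!-- In a mixed-content model, #PCDATA is listed first
       and the group ends with * -->
  <!ELEMENT ad_sentence (#PCDATA | b | i | p)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
  <!ELEMENT p (#PCDATA)>
]>
<ad_sentence>
  We're making way for our new stock!
  <b>All current inventory must go!</b>
  Regular prices <i>slashed</i> by up to 60%!
</ad_sentence>
```

Note that b, i, and p are themselves declared as (#PCDATA), so nesting one inside another, as in <b>All <i>current</i> inventory</b>, would still be reported as an error with these declarations.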

DTD Syntax Review

We've covered a lot of ground here, so let's take a moment and review the specific syntax for building DTDs. An element declaration consists of a name and a content model:

<!ELEMENT element-name (content)>

Content can be an element, a series of elements, a choice of elements, or #PCDATA. We can use the following special characters to add more information:

  • +—At least one is required, but the element may repeat.

  • ?—The element is not required, and may appear only once.

  • *—Not required, but may repeat.

  • |—Indicates a choice between elements.

An attribute definition consists of the element name, the attribute name, the type, and the default:

<!ATTLIST element-name attribute-name TYPE 'default'>

The type is typically CDATA, which is just text; a series of choices, such as (true | false); or ID, IDREF, and so on, which will be discussed later. Finally, we list the default, which can be #IMPLIED, #REQUIRED, #FIXED 'somevalue', or just 'somevalue'.
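The pieces reviewed above can be pulled together in one short sketch. Every name in it (order, item, sku, name, note, tag, status, ref) is hypothetical and unrelated to products.xml.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- A sequence using +, ?, and * -->
  <!ELEMENT order (item+, note?, tag*)>
  <!-- A choice: exactly one of the two -->
  <!ELEMENT item  (sku | name)>
  <!ELEMENT sku   (#PCDATA)>
  <!ELEMENT name  (#PCDATA)>
  <!ELEMENT note  (#PCDATA)>
  <!ELEMENT tag   (#PCDATA)>
  <!-- An enumerated attribute with a default, and optional text -->
  <!ATTLIST order
      status (open | closed) 'open'
      ref    CDATA           #IMPLIED>
]>
<order ref="A-17">
  <item><sku>QA3452</sku></item>
  <item><name>Bentwood Rocker</name></item>
  <tag>chairs</tag>
</order>
```

The document should validate: there are two items (+ needs at least one), the optional note is absent, one tag is present, and status falls back to its default of open.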

The First Limitation: Datatypes

One thing you might have noticed is that although we can (and must, in fact) get specific about the types of subelements an element can contain, we don't have a lot of control over the specific types of data after we get down to the text level. Elements can be #PCDATA, attributes are CDATA, and that's it. There's no way to indicate that, say, a price can be only a number, or inventory must be an integer.
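A short sketch makes the gap concrete. The product element below is trimmed to two children for brevity, which is an assumption and not the full model from Listing 3.7.

```xml
<?xml version="1.0"?>
<!DOCTYPE product [
  <!-- Trimmed content model (assumption) -->
  <!ELEMENT product (price, inventory)>
  <!ELEMENT price (#PCDATA)>
  <!ELEMENT inventory (#PCDATA)>
]>
<!-- Valid according to the DTD, even though neither value is a number -->
<product>
  <price>cheap</price>
  <inventory>lots</inventory>
</product>
```

A validating parser has no grounds to object here, because (#PCDATA) accepts any text at all.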

This is one of the serious limitations of DTDs and was one of the first indications that a better system was needed. Datatypes will be covered by XML Schema.
