What Is XML?
The chapter Understanding XML includes a more thorough and detailed explanation of XML and how to process it. The goal of this section is to give you a quick introduction to what XML is and how it makes data portable so that you have some background for reading the summaries of the Java APIs for XML that follow.
XML (Extensible Markup Language) is an industry-standard, system-independent way of representing data. Like HTML (HyperText Markup Language), XML encloses data in tags, but there are significant differences between the two markup languages. First, XML tags relate to the meaning of the enclosed text, whereas HTML tags specify how to display the enclosed text. The following XML example shows a price list with the name and price of two coffees.
<priceList>
  <coffee>
    <name>Mocha Java</name>
    <price>11.95</price>
  </coffee>
  <coffee>
    <name>Sumatra</name>
    <price>12.50</price>
  </coffee>
</priceList>
The <coffee> and </coffee> tags tell a parser that the information between them is about a coffee. The two other tags inside the <coffee> tags specify that the enclosed information is the coffee's name and its price per pound. Because XML tags indicate the content and structure of the data they enclose, programs can process the data by its meaning, for example to archive it or to search it for a particular coffee.
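To make the idea of tags that describe content concrete, here is a minimal sketch that uses the DOM parser in JAXP (javax.xml.parsers.DocumentBuilderFactory, part of the standard Java library) to read the price list and pull out each coffee's name and price by tag name. The class PriceListReader and its readCoffees method are hypothetical names chosen for this example, and prices are kept as strings for simplicity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PriceListReader {
    // The price list from the example above, held in a string for convenience
    static final String PRICE_LIST =
        "<priceList>"
      + "<coffee><name>Mocha Java</name><price>11.95</price></coffee>"
      + "<coffee><name>Sumatra</name><price>12.50</price></coffee>"
      + "</priceList>";

    /** Returns one "name=price" entry per coffee element, in document order. */
    static List<String> readCoffees(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList coffees = doc.getElementsByTagName("coffee");
        for (int i = 0; i < coffees.getLength(); i++) {
            Element coffee = (Element) coffees.item(i);
            // Look up the child elements by what they mean, not by position
            String name = coffee.getElementsByTagName("name").item(0).getTextContent();
            String price = coffee.getElementsByTagName("price").item(0).getTextContent();
            result.add(name + "=" + price);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String entry : readCoffees(PRICE_LIST)) {
            System.out.println(entry);
        }
    }
}
```

Running main prints Mocha Java=11.95 followed by Sumatra=12.50. Because the program finds elements by their names (name, price) rather than by layout, it keeps working if the document is reindented or more coffees are added.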
A second major difference between XML and HTML is that XML tags are extensible, allowing you to write your own XML tags to describe your content. With HTML, you are limited to using only those tags that have been predefined in the HTML specification.
With the extensibility that XML provides, you can create the tags you need for a particular type of document. You define the tags using an XML schema language. A schema describes the structure of a set of XML documents and can be used to constrain the contents of the XML documents. Probably the most widely used schema language is still the Document Type Definition schema language because it is an integral part of the XML 1.0 specification. A schema written in this language is called a DTD. The DTD that follows defines the tags used in the price list XML document. It specifies four tags (elements) and further specifies which tags may occur (or are required to occur) in other tags. The DTD also defines the hierarchical structure of an XML document, including the order in which the tags must occur.
<!ELEMENT priceList (coffee)+>
<!ELEMENT coffee (name, price) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT price (#PCDATA) >
The first line in the example gives the highest-level element, priceList, which means that all the other tags in the document will come between the <priceList> and </priceList> tags. The first line also says that the priceList element must contain one or more coffee elements (indicated by the plus sign). The second line specifies that each coffee element must contain both a name element and a price element, in that order. The third and fourth lines specify that the data between the tags <name> and </name> and between <price> and </price> is parsed character data (#PCDATA). The name and price of each coffee are the actual text that makes up the price list.
Another popular schema language is XML Schema, which was developed by the World Wide Web Consortium (W3C). XML Schema is a significantly more powerful language than DTD, and since it became a W3C Recommendation in May 2001, both its use and the number of implementations have increased. The community of developers using the Java platform has recognized this, and the expert group for the Java API for XML Processing (JAXP) has been working on adding support for XML Schema to the JAXP 1.2 specification. This release of the Java Web Services Developer Pack (Java WSDP) includes support for XML Schema.
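As a rough illustration of what the extra power buys, the following sketch expresses the priceList rules as an XML Schema and validates documents against it. Unlike the DTD, which can only say that price holds character data, the schema can require price to be a decimal number. The schema text and the class PriceListSchemaCheck are hypothetical, and the sketch uses the javax.xml.validation API, which was introduced in JAXP 1.3, later than the JAXP 1.2 release discussed in this section.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class PriceListSchemaCheck {
    // Hypothetical XML Schema equivalent of the priceList DTD, with price typed as a decimal
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='priceList'><xs:complexType><xs:sequence>"
      + "<xs:element name='coffee' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='price' type='xs:decimal'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static final String GOOD =
        "<priceList><coffee><name>Sumatra</name><price>12.50</price></coffee></priceList>";
    static final String BAD_PRICE =
        "<priceList><coffee><name>Sumatra</name><price>cheap</price></coffee></priceList>";

    /** Returns true if the document satisfies the schema, false on a validation error. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The schema is compiled on every call only to keep the sketch short
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the Validator throws on the first error
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(GOOD));      // true
        System.out.println(isValid(BAD_PRICE)); // false
    }
}
```

A price of cheap is rejected here, although the same document would satisfy the DTD, whose #PCDATA rule accepts any text.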
What Makes XML Portable?
A schema gives XML data its portability. The priceList DTD, discussed previously, is a simple example of a schema. If an application is sent a priceList document in XML format and has the priceList DTD, it can process the document according to the rules specified in the DTD. For example, given the priceList DTD, a parser will know the structure and type of content for any XML document based on that DTD. If the parser is a validating parser, it will know that the document is not valid if it contains an element not included in the DTD, such as the element <tea>, or if the elements are not in the prescribed order, such as having the price element precede the name element.
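The validating behavior just described can be tried out with JAXP's DOM parser. In this minimal sketch, the priceList DTD is embedded in each document as an internal DTD subset so that the example needs no external file, and validation is switched on with DocumentBuilderFactory.setValidating(true). Validity errors are recoverable, so the parser does not stop on them unless an ErrorHandler rethrows them; the sketch installs such a handler. The class PriceListValidator and the sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class PriceListValidator {
    // The priceList DTD, embedded as an internal DTD subset
    static final String DOCTYPE =
        "<!DOCTYPE priceList ["
      + "<!ELEMENT priceList (coffee)+>"
      + "<!ELEMENT coffee (name, price)>"
      + "<!ELEMENT name (#PCDATA)>"
      + "<!ELEMENT price (#PCDATA)>"
      + "]>";

    static final String VALID =
        "<priceList><coffee><name>Mocha Java</name><price>11.95</price></coffee></priceList>";
    // Contains an element that the DTD does not declare
    static final String WITH_TEA =
        "<priceList><coffee><name>Sumatra</name><price>12.50</price></coffee>"
      + "<tea><name>Earl Grey</name></tea></priceList>";
    // price precedes name, violating the (name, price) content model
    static final String WRONG_ORDER =
        "<priceList><coffee><price>12.50</price><name>Sumatra</name></coffee></priceList>";

    /** Parses the body against the DTD; returns false on any validity or well-formedness error. */
    static boolean isValid(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignore warnings */ }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(VALID));       // true
        System.out.println(isValid(WITH_TEA));    // false
        System.out.println(isValid(WRONG_ORDER)); // false
    }
}
```

Both the document containing <tea> and the one with price before name are rejected, matching the two cases described above.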
Other features also contribute to the popularity of XML as a method for data interchange. For one thing, it is written in a text format, which is readable by both human beings and text-editing software. Applications can parse and process XML documents, and human beings can also read them in case there is an error in processing. Another feature is that because an XML document does not include formatting instructions, it can be displayed in various ways. Keeping data separate from formatting instructions means that the same data can be published to different media.
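Because the price list carries no formatting, a program can choose a presentation when it produces output. One common way to do this, not covered in this section, is XSLT, which JAXP exposes through the javax.xml.transform package. The sketch below renders the same price list once as an HTML list and once as plain text; the two stylesheets and the class PriceListPublisher are hypothetical examples.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PriceListPublisher {
    static final String PRICE_LIST =
        "<priceList>"
      + "<coffee><name>Mocha Java</name><price>11.95</price></coffee>"
      + "<coffee><name>Sumatra</name><price>12.50</price></coffee>"
      + "</priceList>";

    // Renders each coffee as an HTML list item
    static final String HTML_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/priceList'><ul>"
      + "<xsl:for-each select='coffee'>"
      + "<li><xsl:value-of select='name'/>: $<xsl:value-of select='price'/></li>"
      + "</xsl:for-each></ul></xsl:template>"
      + "</xsl:stylesheet>";

    // Renders each coffee as one line of plain text
    static final String TEXT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/priceList'>"
      + "<xsl:for-each select='coffee'>"
      + "<xsl:value-of select='name'/><xsl:text>, </xsl:text>"
      + "<xsl:value-of select='price'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the given XSLT stylesheet to the XML data and returns the result as a string. */
    static String publish(String xml, String stylesheet) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(publish(PRICE_LIST, HTML_XSL));
        System.out.print(publish(PRICE_LIST, TEXT_XSL));
    }
}
```

The XML data stays the same in both runs; only the stylesheet, which holds the formatting instructions, changes.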
XML enables document portability, but it cannot do the job in a vacuum; that is, parties who use XML must agree to certain conditions. For example, in addition to agreeing to use XML for communicating, two applications must agree on which set of elements they will use and what those elements mean. For them to use Web services, they must also agree on which Web services methods they will use, what those methods do, and, when more than one method is needed, the order in which the methods are invoked.
Enterprises have several technologies available to help satisfy these requirements. They can use DTDs and XML schemas to describe the valid terms and XML documents they will use in communicating with each other. Registries provide a means for describing Web services and their methods. For higher-level concepts, enterprises can use partner agreements, workflow charts, and choreographies. There will be more about schemas and registries later in this document.