- Creating Applications with Java API for XML Parsing (JAXP)
- Understanding XML
- XML Related Tools
- Creating an XML Document
- Creating a Document Type Definition (DTD)
- Parsing with the Simple API for XML (SAX)
- Parsing with the Document Object Model (DOM)
- An XML Version of the CruiseList Application
Creating an XML Document
XML is programming language-neutral. But because we are not neutral, we will be programming using the Java API for XML Parsing (JAXP) in all the examples in this chapter.
The important JAXP classes and interfaces are summarized here:
org.xml.sax.helpers.DefaultHandler: This class is the default base class for SAX2 event handlers. It serves as a convenience class, which means that it provides a default handler for any events that you don't want to handle. When writing applications, you can extend this class and override the parts of the interface that you want to control.
javax.xml.parsers.DocumentBuilder: This abstract class allows an application programmer to obtain a DOM version of an XML document.
javax.xml.parsers.DocumentBuilderFactory: This class allows a programmer to get a handle to a concrete DocumentBuilder object.
org.w3c.dom.Document: This interface provides a reference to the entire XML document tree. Using this handle, the contents of the tree can be analyzed.
javax.xml.transform.Transformer: An instance of this abstract class provides us with a way to stream XML to a file.
javax.xml.transform.stream.StreamResult: This class receives a stream representing the XML that is to be put in a file.
org.w3c.dom.Node: This interface provides references to the nodes in the document tree. As a reference, it can serve as a handle to any type of node.
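To see how these classes fit together, here is a minimal sketch that builds a small tree in memory and streams it out as text. The class name JaxpClassTour and the method buildAndSerialize are made up for this illustration. The sketch also uses TransformerFactory and DOMSource, which are part of JAXP but are not in the summary above, and it writes to a StringWriter rather than a file so that it can run anywhere.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class name, used only for this illustration.
class JaxpClassTour {

    /** Builds a tiny ticketRequest tree in memory and returns it as XML text. */
    static String buildAndSerialize() {
        try {
            // DocumentBuilderFactory hands out a concrete DocumentBuilder.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // Document is the handle to the whole tree.
            Document doc = builder.newDocument();
            Element root = doc.createElement("ticketRequest");
            doc.appendChild(root);

            Element customer = doc.createElement("customer");
            customer.setAttribute("custID", "10003");

            // Node is the common type: appendChild returns the added child as a Node.
            Node added = root.appendChild(customer);
            added.appendChild(doc.createElement("lastName")).setTextContent("Carter");

            // Transformer streams the tree into a StreamResult.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build or serialize the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize());
    }
}
```

To write to a file instead, the StreamResult could be constructed from a java.io.File rather than a StringWriter.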
The simplest way to create an XML document is to type it in using a text editor such as Notepad or vi. This is a useful approach while you are learning XML because it gives you time to think about each reserved character and its purpose.
Listing 3.1 is a simple XML document that illustrates several of the points that we have discussed.
All the code in this chapter can be found on the Sams Publishing Web site at http://www.samspublishing.com.
Listing 3.1 A Simple XML File Representing a Request for a Cruise Ticket
<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<!--This XML document represents a request for a cruise ticket-->
<!DOCTYPE ticketRequest SYSTEM "ticketRequest.dtd">
<ticketRequest>
  <customer custID="10003" >
    <lastName>Carter</lastName>
    <firstName>Joseph</firstName>
  </customer>
  <cruise cruiseID="3004">
    <destination>Hawaii</destination>
    <port>Honolulu</port>
    <sailing>7/7/2001</sailing>
    <numberOfTickets>5</numberOfTickets>
    <isCommissionable/>
  </cruise>
</ticketRequest>
Let's take this example apart and figure out what each element in the file means. The first line is
<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
This line starts with the <? character sequence, which marks the XML declaration. The declaration allows the parser or any other software that is looking at the file to detect that it is an XML file that is governed by the rules of version 1.0. This will become important as new versions are introduced. The encoding attribute tells us that the characters in the document are encoded as UTF-8, a Unicode encoding in which plain ASCII text is stored unchanged. The keyword standalone is set to yes, which tells the parser that the content of this document does not depend on declarations made in any other file.
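A DOM parser keeps the three values from this first line, and they can be read back from the Document object. The following is a minimal sketch, assuming a one-element document; the class name XmlDeclarationDemo and its parse helper are invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class XmlDeclarationDemo {

    // The first line of Listing 3.1, followed by an empty root element.
    static final String XML =
            "<?xml version='1.0' encoding='utf-8' standalone='yes' ?>"
            + "<ticketRequest/>";

    /** Parses the text as UTF-8 bytes, the way a parser would read a file. */
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("version    = " + doc.getXmlVersion());
        System.out.println("encoding   = " + doc.getXmlEncoding());
        System.out.println("standalone = " + doc.getXmlStandalone());
    }
}
```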
The next line begins with <!--. This character sequence tells the processing program that this is a comment.
<!--This XML document represents a request for a cruise ticket-->
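Comments are meant for human readers, so they should not be used to carry information that is processing-oriented. A declaration, which also begins with <! but has no dashes, is the place for that kind of information. We see one in the very next line in the file: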
<!DOCTYPE ticketRequest SYSTEM "ticketRequest.dtd">
This line associates the document with a more detailed definition of correctness called the Document Type Definition (DTD). We will see the DTD for this example in the next section. The word SYSTEM tells the XML parser that the quoted string that follows is a system identifier, a URI that can be used to load the DTD. A relative identifier such as ticketRequest.dtd is resolved relative to the location of this document, so in this case the DTD file would be in the same directory as this file. The identifier can also be a full URL, which provides a hyperlink to a DTD stored elsewhere.
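The DOCTYPE line is also visible to a program through the DOM. The sketch below reads the root name and the system identifier back from the parsed document. The class name DoctypeDemo is invented for the illustration, and so that the example runs without a ticketRequest.dtd file on disk, it assumes an EntityResolver that hands the parser an empty DTD; a real application would normally let the parser load the file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class DoctypeDemo {

    static final String XML =
            "<!DOCTYPE ticketRequest SYSTEM \"ticketRequest.dtd\">"
            + "<ticketRequest/>";

    /** Parses the text and returns the node that represents the DOCTYPE line. */
    static DocumentType readDoctype(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // Assumption for this sketch: answer every request for an external
            // entity with an empty one, so that no DTD file is needed.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));

            return builder.parse(new InputSource(new StringReader(xml))).getDoctype();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        DocumentType doctype = readDoctype(XML);
        System.out.println("root name         = " + doctype.getName());
        System.out.println("system identifier = " + doctype.getSystemId());
    }
}
```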
The next line is the root element of the document. You are only allowed to define one root element per XML document. The root element contains all other elements in the XML file. The <ticketRequest> form tells us that this is a user-defined tag with the name ticketRequest. Because the tag has no prefix and the document declares no namespace, the name ticketRequest is not namespace-qualified. The tools that manipulate the XML document are oblivious to the fact that this is a request for tickets. Extracting meaningful information from the document is the job of the XML document consumer that we will create later in this chapter. All that the XML tools are able to do is notice whether the rules of XML are being followed and process the document according to those rules.
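From a program's point of view, the root element is simply what getDocumentElement returns. Here is a minimal sketch, assuming a shortened version of the listing; the class name RootElementDemo and the helper root are invented for the illustration. The factory is made namespace-aware so that the namespace of the root can be inspected as well.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class RootElementDemo {

    static final String XML =
            "<ticketRequest><customer custID=\"10003\"/></ticketRequest>";

    /** Parses the text and returns the single root element of the document. */
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        Element root = root(XML);
        System.out.println("root name = " + root.getNodeName());
        // No namespace is declared in the document, so this is null.
        System.out.println("namespace = " + root.getNamespaceURI());
    }
}
```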
Next, we see another tag named customer. The format is a little different, though, because we see extra characters before the > appears.
<customer custID="10003" >
The extra characters represent an attribute of the customer tag. Defining an attribute is one way of associating information with a tag name. The syntax is formed by simply placing the name of the attribute next to the name of the tag followed by = and a quoted string. Even if a number appears in an attribute, it still appears inside quotes. The > tells us that there are no more attributes for this tag.
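Because attribute values are always quoted strings, a consuming program has to convert them itself if it wants a number. The following sketch, with an invented class name AttributeDemo and helper custId, reads custID as text and then turns it into an int.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class AttributeDemo {

    static final String XML = "<customer custID=\"10003\" ></customer>";

    /** Returns the custID attribute of the root element as a number. */
    static int custId(String xml) {
        try {
            Element customer = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            // The parser always delivers attribute values as strings.
            String raw = customer.getAttribute("custID");
            return Integer.parseInt(raw);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(custId(XML)); // prints 10003
    }
}
```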
The following line is associated with the customer tag because it appears before the closing tag for customer.
<lastName>Carter</lastName>
Any element that appears before the closing tag is considered a nested element. Nested elements are associated logically with the enclosing element. This means that they will be considered part of the enclosing tag by the parser. In this case, lastName is considered the lastName of the customer.
There is another nested element in customer: the firstName element. We know that the firstName element is not nested inside the lastName element because the closing tag for lastName, </lastName> has already appeared in the file.
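The nesting described here shows up directly in the DOM tree: lastName and firstName are both children of customer, and neither is a child of the other. The sketch below lists the element children of a node. The class name NestedElementsDemo and its helpers are invented for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class NestedElementsDemo {

    static final String XML =
            "<customer custID=\"10003\">\n"
            + "  <lastName>Carter</lastName>\n"
            + "  <firstName>Joseph</firstName>\n"
            + "</customer>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    /** Returns the names of the elements nested directly inside parent. */
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip the whitespace text nodes that sit between the tags.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(childElementNames(parseRoot(XML)));
    }
}
```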
Finally, we come to the closing tag for customer. This tells us that there will not be any more attributes or nested elements defined for customer.
The next line creates a new element called cruise. It also has a cruiseID attribute defined:
<cruise cruiseID="3004">
There are several nested elements inside the cruise tag. Each of them provides some detail about the cruise: where it is going, when it departs, and so on.
<destination>Hawaii</destination>
<port>Honolulu</port>
<sailing>7/7/2001</sailing>
<numberOfTickets>5</numberOfTickets>
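A consumer could collect these details with a SAX handler built on DefaultHandler, overriding only the events it cares about. SAX parsing is covered later in the chapter, so treat this as a preview sketch rather than the chapter's own code. The class name CruiseDetailsHandler and the helper read are invented for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class CruiseDetailsHandler extends DefaultHandler {

    static final String XML =
            "<cruise cruiseID=\"3004\">"
            + " <destination>Hawaii</destination>"
            + " <port>Honolulu</port>"
            + " <sailing>7/7/2001</sailing>"
            + " <numberOfTickets>5</numberOfTickets>"
            + "</cruise>";

    // Element name mapped to its text, in document order.
    final Map<String, String> details = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean insideCruise;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("cruise")) {
            insideCruise = true;
        }
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("cruise")) {
            insideCruise = false;
        } else if (insideCruise) {
            details.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    /** Parses the text and returns the details nested inside cruise. */
    static Map<String, String> read(String xml) {
        try {
            CruiseDetailsHandler handler = new CruiseDetailsHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.details;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(XML));
    }
}
```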
The following syntax is somewhat different.
<isCommissionable/>
This character sequence represents an empty tag. An empty tag is one that will never have a subtree of nested elements. The fact that it is present communicates information to the document's consumer. When the consuming program sees this tag, it will know to pay a commission to a travel agent for booking this cruise. If the cruise has been booked online, this tag will be left off, indicating to the consuming program that no commission need be paid.
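Since the empty tag carries its meaning by being present, the consumer only has to check whether it is there. Here is a minimal sketch of such a check, with an invented class name CommissionCheck; it is an illustration, not the consumer built later in the chapter.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class CommissionCheck {

    /** Returns true if the document contains an isCommissionable element. */
    static boolean isCommissionable(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The tag has no content; only its presence matters.
            return doc.getElementsByTagName("isCommissionable").getLength() > 0;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        String agentBooking = "<cruise cruiseID=\"3004\"><isCommissionable/></cruise>";
        String onlineBooking = "<cruise cruiseID=\"3004\"></cruise>";
        System.out.println(isCommissionable(agentBooking));  // prints true
        System.out.println(isCommissionable(onlineBooking)); // prints false
    }
}
```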
Next, the closing tag for cruise appears.
</cruise>
After this line we could create another element if we chose to do so. This is because we have still not closed the root element, ticketRequest. However, we can only create elements that are allowed by the DTD that was referenced in the DOCTYPE declaration.
Finally, we close ticketRequest.
</ticketRequest>
This closing tag must be the last tag in the document; only comments and whitespace may follow it. If you attempt to create another tag after this, it will be a second root element and that is not allowed. Each XML document is a representation of exactly one root tag.
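A parser enforces the one-root rule for us. The sketch below asks a DocumentBuilder to parse a document with one root and another with two, and reports whether each was accepted. The class name SingleRootCheck and the helper isWellFormed are invented for the illustration. A DefaultHandler is installed as the error handler only to keep the parser from printing its own message before it throws.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class SingleRootCheck {

    /** Returns true if the parser accepts the text, false if it rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser found a violation of the XML rules.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
    }

    public static void main(String[] args) {
        String oneRoot = "<ticketRequest></ticketRequest>";
        String twoRoots = oneRoot + "<ticketRequest></ticketRequest>";
        System.out.println(isWellFormed(oneRoot));  // prints true
        System.out.println(isWellFormed(twoRoots)); // prints false
    }
}
```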
At this point, we have created a file that contains a ticketRequest in a special format. In essence, we have a little piece of persistent storage in character form that can be transmitted anywhere a text file can go. This is the central idea behind XML.