This chapter is from the book

Parsing with the Simple API for XML (SAX)

There are three ways of getting at the data contained in an XML document. The first is to write your own parser, using a StringTokenizer object to navigate through the document text. This approach is conceptually simple, so it has a short learning curve. You start at the beginning of the file and look for certain tag combinations. When you find the tag that you want, you extract the data, use it or store it in a variable, and move on to the next tag. This works well for very simple XML documents, but it becomes tedious to code when the number of tags gets large.
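To make the first approach concrete, here is a minimal sketch of a hand-rolled lookup built on StringTokenizer. The class name HandRolledTagReader, the method textOf(), and the sample XML are assumptions made for illustration; they are not part of the book's code. The sketch matches a tag only when its name has no attributes, which hints at why this approach gets tedious quickly.

```java
import java.util.StringTokenizer;

// Hypothetical helper, for illustration only.
final class HandRolledTagReader {

    // Returns the text that follows the first <tagName> tag, or null if the tag is absent.
    static String textOf(String xml, String tagName) {
        // Split on '<' and '>' and ask the tokenizer to return the delimiters too.
        StringTokenizer tokens = new StringTokenizer(xml, "<>", true);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (token.equals("<") && tokens.hasMoreTokens()
                    && tokens.nextToken().equals(tagName)) {
                if (!tokens.hasMoreTokens()) return null;
                tokens.nextToken();                 // skip the closing '>'
                if (!tokens.hasMoreTokens()) return null;
                String text = tokens.nextToken();
                // A '<' here means the element is empty or starts with a child element.
                return text.equals("<") ? "" : text.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<customer><lastName>Carter</lastName>"
                   + "<firstName>Joseph</firstName></customer>";
        System.out.println(textOf(xml, "lastName"));
        System.out.println(textOf(xml, "firstName"));
    }
}
```

A tag such as `<customer custID="10003">` would not match the name "customer" here, because the attribute text becomes part of the same token. Handling that case, along with comments, entities, and CDATA sections, is the work that a real parser does for you.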

The second approach is to use a parser to create a Document Object Model, or DOM. A DOM is an in-memory treelike model of the document. The parser provides a number of methods to traverse the tree. You use these methods to find the data that you want, and having found it, you use the data or store it in a local variable. There are advantages and disadvantages to this approach that we will discuss after we have covered some coding examples.
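As a rough preview of the DOM approach, the following sketch builds the in-memory tree with the standard JAXP DocumentBuilder and then looks up one element by name. The class DomLookup, the method firstText(), and the sample XML are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
final class DomLookup {

    // Parses the whole document into a tree, then returns the text of the
    // first element with the given tag name, or null if there is none.
    static String firstText(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = doc.getElementsByTagName(tagName);
            return matches.getLength() == 0
                ? null
                : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customer><lastName>Carter</lastName></customer>";
        System.out.println(firstText(xml, "lastName"));
    }
}
```

Note that the entire document is read into memory before any lookup happens, which is the main point of contrast with the event-driven approach used in this section.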

The third approach is the one that we will use in this section. It is called the Simple API for XML (SAX). SAX is simple to use once you become familiar with it. It is based on an event-driven model: the parser generates events as it reads the document, and you write handlers for the events that you care about. This works fine in practice, but it is not especially intuitive.

NOTE

The SAX examples in this chapter are based on the SAX 2.0 standard. The J2EE JAR file may contain a SAX 1.x parser. The WebLogic installation comes with a SAX 2.0 parser in the weblogic.jar file. See the readme in the code zip file for this chapter for instructions on how to place the correct parser in your classpath.

Rather than provide a long-winded explanation of how SAX works, we'll work through an example that uses SAX to parse the ticketRequest.xml file that we created earlier. We need a class to represent a travel agent's request to a cruise line to book a cruise on behalf of one of his customers. We will use the TicketRequest2 object that can be found in Appendix A, "Source Code Listings for Utility Programs Used in This Book." The code for this class is included with the code for this chapter on the Web site.

The program that processes the XML file is called TicketRequestParser. It converts an XML document into a TicketRequest2 object using a SAX2 parser. It is shown in Listing 3.3.

Listing 3.3 The TicketRequestParser

//object using the SAX parser. 
/*
 * TicketRequestParser.java
 *
 * Created on January 18, 2002, 3:01 PM
 */

package unleashed.ch3;

import unleashed.TicketRequest2;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;
import java.io.*;

/**
 *
 * @author Stephen Potts
 * @version
 */
public class TicketRequestParser extends DefaultHandler
{
  private CharArrayWriter buffer = new CharArrayWriter();
  static TicketRequest2 tr2;
  String custIDString;
  String cruiseIDString;
  
  public void startDocument() throws SAXException
  {
    System.out.println("startDocument() called");
  }
  
  public void endDocument() throws SAXException
  {
    System.out.println("endDocument() called");
  }
  
  public void startElement(String namespaceURI, String localName,
  String qName, Attributes attr) throws SAXException
  {
    
    for ( int i=0; i < attr.getLength(); i++)
    {
      String attrName = attr.getQName(i);
      if (attrName.equals("custID"))
      {
        custIDString = attr.getValue(i);
      }
      if (attrName.equals("cruiseID"))
      {
        cruiseIDString = attr.getValue(i);
      }
    }
    buffer.reset();
  }
  
  public void endElement( String nameSpaceURI, String localName,
  String qName) throws SAXException
  {
    storeElementValue(qName);
  }
  
  public void characters( char[] ch, int start,
               int length) throws SAXException
  {
    buffer.write(ch, start, length);
  }
  
  private void storeElementValue(String elementName)
  {
    if (elementName.equals("ticketRequest"))
    {
    }
    
    if (elementName.equals("customer"))
    {
      tr2.setCustID(Integer.parseInt(custIDString));
    }
    
    if (elementName.equals("lastName"))
    {
      tr2.setLastName(buffer.toString());
    }
    
    if (elementName.equals("firstName"))
    {
       tr2.setFirstName(buffer.toString());
    }
    
    if (elementName.equals("cruise"))
    {
      tr2.setCruiseID(Integer.parseInt(cruiseIDString));
    }
    
    if (elementName.equals("destination"))
    {
      tr2.setDestination(buffer.toString());
    }
    
    if (elementName.equals("port"))
    {
      tr2.setPort(buffer.toString());
    }
    
    if (elementName.equals("sailing"))
    {
      tr2.setSailing(buffer.toString());
    }
    
    if (elementName.equals("numberOfTickets"))
    {
      String numberOfTicketsString = buffer.toString(); 
      int numberOfTickets =
              Integer.parseInt(numberOfTicketsString);
      tr2.setNumberOfTickets(numberOfTickets);
    }
    
    if (elementName.equals("isCommissionable"))
    {
      tr2.setCommissionable(true);
    }
  }
  
  public String toString()
  {
    return tr2.toString();
  }
  
  public static void main( String[] args)
  {

    System.out.println("TicketRequestParser main()");
    
    DefaultHandler trp = new TicketRequestParser();
    
    tr2 = new TicketRequest2();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try
    {
      SAXParser saxParser = factory.newSAXParser();
      saxParser.parse( new File(
       "c:/unleashed/ch3/ticketRequest.xml"),
             trp);
    }catch (Exception e)
    {
      System.out.println("Exception in main " + e);
    }
    System.out.println(trp);
  }
}

There are quite a few interesting lines of code in this example. The declaration of the class states that it extends DefaultHandler:

public class TicketRequestParser extends DefaultHandler

The org.xml.sax.helpers.DefaultHandler class is the default base class for SAX2 event handlers. It serves as a convenience class, which means that it provides a default handler for any events that you don't want to handle. When writing applications, you can extend this class and override the parts of the interface that you want to control.
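The convenience that DefaultHandler provides is easiest to see in a handler that overrides a single callback. The sketch below counts elements and ignores every other event. ElementCounter and countElements() are hypothetical names introduced for this illustration; the JAXP and SAX calls are the same ones used in Listing 3.3.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
final class ElementCounter extends DefaultHandler {
    private int count;

    // The only event we care about; DefaultHandler supplies no-op
    // implementations for all the others.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        count++;
    }

    static int countElements(String xml) {
        ElementCounter handler = new ElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><c>x</c></a>"));
    }
}
```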

The processing of text in SAX is done in character buffers. The easiest way to convert these characters to strings is to define a buffer that grows when data is written to it:

  private CharArrayWriter buffer = new CharArrayWriter();

There are five methods that we will override to process our document:

  • startDocument()—Called when the document is opened.

  • endDocument()—Called at the end of the document.

  • startElement()—Called when each opening tag is encountered.

  • endElement()—Called when each end tag is encountered.

  • characters()—Called whenever the parser encounters characters that it cannot identify as a type of tag or instruction. This method assumes that the characters are data of some sort.
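The order in which these five callbacks fire can be sketched with a small tracing handler. EventTrace and trace() are hypothetical names, and the two-element sample document is an assumption, not the real ticketRequest.xml. One detail worth knowing is that a SAX parser is allowed to deliver the text of a single element in several characters() calls, which is why the text is accumulated in a buffer instead of being read from one call.

```java
import java.io.CharArrayWriter;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the order of the SAX callbacks.
final class EventTrace extends DefaultHandler {
    private final StringBuilder log = new StringBuilder();
    private final CharArrayWriter buffer = new CharArrayWriter();

    @Override public void startDocument() { log.append("startDocument\n"); }

    @Override public void endDocument() { log.append("endDocument\n"); }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        buffer.reset();                       // discard text left over from earlier
        log.append("start ").append(qName).append('\n');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.write(ch, start, length);      // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log.append("end ").append(qName)
           .append(" [").append(buffer.toString()).append("]\n");
        buffer.reset();                       // this sketch also clears the buffer here
    }

    static String trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return handler.log.toString();
    }

    public static void main(String[] args) {
        System.out.print(trace("<cruise><port>Honolulu</port></cruise>"));
    }
}
```

For the sample document, the trace lists startDocument, the start of cruise, the start of port, the end of port with the text Honolulu, the end of cruise with no text, and finally endDocument.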

The start and end of the document are not very interesting in this application, so we simply do a println() to show that each event occurred. The elements are processed by the startElement() method and the endElement() method. startElement() does two things: it processes the attributes, if any, of the element, and it resets the buffer to empty it. In the case of custID, the value is kept in an attribute, not in a subelement:

    for ( int i=0; i < attr.getLength(); i++)
    {
      String attrName = attr.getQName(i);
      if (attrName.equals("custID"))
      {
        custIDString = attr.getValue(i);
      }
      if (attrName.equals("cruiseID"))
      {
        cruiseIDString = attr.getValue(i);
      }
    }

If a tag has attributes, they will be kept in a list that implements the org.xml.sax.Attributes interface. This list has methods to get the name and the value. These are stored in class level variables so that we can keep all the TicketRequest2 class updates in one method, making the example easier to follow.
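As an alternative to looping over the list, the Attributes interface also lets you ask for a value by its qualified name; getValue(String) returns null when the attribute is not present. The sketch below is one way this might look. AttributeReader and readCustId() are hypothetical names, and the sample element is an assumption about the document's shape.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
final class AttributeReader extends DefaultHandler {
    private String custId;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        // Look the attribute up by qualified name instead of iterating.
        String value = attrs.getValue("custID");
        if (value != null) {
            custId = value;
        }
    }

    // Returns the custID attribute found in the document, or null if none.
    static String readCustId(String xml) {
        AttributeReader handler = new AttributeReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return handler.custId;
    }

    public static void main(String[] args) {
        System.out.println(readCustId(
            "<customer custID=\"10003\"><lastName>Carter</lastName></customer>"));
    }
}
```

The lookup uses the qualified name because a factory obtained from SAXParserFactory.newInstance() is not namespace aware unless you call setNamespaceAware(true), and in that mode getLocalName() may return an empty string.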

The storeElementValue() method accepts as a parameter the name of the element to be stored. It then copies what is in the buffer and passes this to the mutator method for that value in the TicketRequest2 object. For the int fields, the string is first converted with Integer.parseInt().
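The conversion step deserves a little care, because Integer.parseInt() throws a NumberFormatException if the text contains surrounding whitespace or is not a number at all. A defensive version of the conversion might look like the following sketch; IntFields and parseField() are hypothetical names and are not part of Listing 3.3.

```java
// Hypothetical helper, for illustration only.
final class IntFields {

    // Trims the buffered text and converts it, reporting which element
    // held the bad value if the conversion fails.
    static int parseField(String elementName, String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Element <" + elementName + "> does not hold an integer: '"
                + raw + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseField("numberOfTickets", " 5\n"));
    }
}
```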

Notice that the attributes are retrieved from the class level variables and placed in the object when a closing tag is encountered for their element.

Slightly different processing takes place for the isCommissionable element. The presence of this element indicates that a commission will be paid on this booking. It has no value to store, so the code infers that true is the correct value and sets it accordingly.
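The idea of treating an element's presence as a boolean can be isolated in a few lines. In this sketch, FlagElementReader and its static isCommissionable() method are hypothetical, and the sample documents are assumptions; the flag simply stays false unless the end tag of the element is seen.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
final class FlagElementReader extends DefaultHandler {
    private boolean commissionable;       // false unless the element appears

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("isCommissionable")) {
            commissionable = true;        // presence alone carries the meaning
        }
    }

    static boolean isCommissionable(String xml) {
        FlagElementReader handler = new FlagElementReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return handler.commissionable;
    }

    public static void main(String[] args) {
        System.out.println(isCommissionable(
            "<ticketRequest><isCommissionable/></ticketRequest>"));
    }
}
```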

The result is output using the toString() method of the TicketRequestParser class, which simply calls the toString() method of the TicketRequest2 class.

Running the TicketRequestParser's main() gives us the following results:

TicketRequestParser main()
startDocument() called
endDocument() called
------------------------------------------------
custID = 10003
lastName = Carter
firstName = Joseph
------------------------------------------------
cruiseID = 3004
destination = Hawaii
port = Honolulu
sailing = 7/7/2001
numberOfTickets = 5
isCommissionable = true
------------------------------------------------

There are not a lot of surprises here. The TicketRequestParser is doing a lot of work, but its job is to translate an XML document that represents a specific object into that object. We will see later how this code can be used to create asynchronous processing of distributed objects.
