
Seven Steps to XML Mastery, Step 4: Parsing and Processing XML (Part 1 of 2)

In this fourth step to XML mastery, Frank Coyle introduces parsing technology with a look at the major parsing models: DOM, SAX, and StAX (a newcomer on the block). With some parsing technology under your belt, you can programmatically extract, modify, and even create XML, and it's much less complicated than it sounds.
For more information about this series, start by reading Frank Coyle's introduction, Seven Steps to XML Mastery: About This Series.

Now it’s time to move to step 4 in our series and look at options for working with XML at a programming level. For a company like ZwiftBooks, building a corporate infrastructure around XML means being able to move XML data into and out of programs seamlessly. In practice, that means using an XML parser to extract, modify, and create XML. In this article, we’ll look at how ZwiftBooks can use XML parsing technology to integrate with an existing warehouse alert program.

Event Versus Tree Parsing

XML parsers fall into two categories: event-based and tree-based.

Figure 1 illustrates the two major families of parsers for programmatically working with XML. Both event-based parsers and tree-based parsers take an XML document as input, but the two types of parsers treat that XML very differently.

Figure 1: Event versus tree parsing for XML documents.

Event-Based APIs

An event-based API reports parsing events to your application as the XML streams through the parser: start of document, start of element, end of element, and end of document, to name a few. The two event-based APIs differ in who is in control. SAX is a push model: the parser calls your handler methods (callbacks) as it encounters each event. StAX is a pull model: your code asks the parser for the next event, typically in a loop. Either way, writing a SAX or StAX application means writing code that reacts when an element or attribute of interest is encountered in the XML.
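A minimal sketch of the SAX push model follows. It assumes Java and the JAXP classes that ship with the JDK (`SAXParserFactory`, `DefaultHandler`). The sample document, its `book` elements and `isbn` attribute, and the class name `SaxIsbnCollector` are hypothetical stand-ins for ZwiftBooks data and do not come from the original article. The parser calls `startElement` once for every start tag, and the handler reacts only to `book`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxIsbnCollector {

    // Handler whose callback the parser invokes for every start tag it reads.
    static class BookHandler extends DefaultHandler {
        final List<String> isbns = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // React only to the element of interest; all other events are ignored.
            if ("book".equals(qName)) {
                isbns.add(attrs.getValue("isbn"));
            }
        }
    }

    // Parses the XML and returns the isbn attribute of each <book>, in document order.
    public static List<String> collectIsbns(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BookHandler handler = new BookHandler();
        // The parser is in control: it reads the input and pushes events into the handler.
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.isbns;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><book isbn=\"111\"/><book isbn=\"222\"/></order>";
        System.out.println(collectIsbns(xml)); // prints [111, 222]
    }
}
```

Because SAX builds no tree, the handler has to keep whatever state it needs (here, a list) between callbacks.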

Tree-Based APIs

The principal tree-based API, the W3C’s Document Object Model (DOM), maps an XML document into an in-memory tree structure and provides programmatic interfaces for navigating that tree. Methods are available to find the child and parent elements of a node and to extract the content of elements or attributes. With DOM, it’s also possible to modify the tree and thus create new XML.
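The sketch below shows the three DOM activities just described (navigating, modifying, and creating) using the JDK's `DocumentBuilderFactory` and a `Transformer` to write the tree back out as text. It is a minimal illustration under assumptions: the `order`/`book` document, the `stock` and `status` attributes, the `flagged` element, and the class name `DomStockFlagger` are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomStockFlagger {

    // Loads the whole document into a tree, modifies it, and serializes it back to a string.
    public static String flagOutOfStock(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Navigate and extract: visit every <book> element anywhere in the tree.
        NodeList books = doc.getElementsByTagName("book");
        int flagged = 0;
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            // Modify: add an attribute to books whose stock attribute is "0".
            if ("0".equals(book.getAttribute("stock"))) {
                book.setAttribute("status", "backordered");
                flagged++;
            }
        }

        // Create: append a new element under the document's root element.
        Element summary = doc.createElement("flagged");
        summary.setTextContent(Integer.toString(flagged));
        doc.getDocumentElement().appendChild(summary);

        // Serialize the modified tree without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><book isbn=\"111\" stock=\"0\"/><book isbn=\"222\" stock=\"5\"/></order>";
        System.out.println(flagOutOfStock(xml));
    }
}
```

Because the whole document sits in memory, the code can revisit or rearrange any node at will, at the cost of holding the entire tree for as long as it is needed.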

Choosing a Parser

The choice of event versus tree parser depends on the application requirements:

  • Event-based parsers are good for extracting an element or attribute from some XML and acting on it, for example by raising an alert. Because event parsers look at only one small part of an XML document at a time, memory use does not grow with the size of the document, so you can parse very large documents. Even documents in the terabyte range can be handled by a SAX or StAX parser.
  • Tree-based APIs build a navigable in-memory representation of the entire document. This approach is useful for a wide range of applications, but it has a heavy impact on system resources, especially with large documents or when the application already has its own data model. For example, building a DOM tree, mapping it onto an application-specific data structure, and then discarding the original tree is typically not worth the effort. However, if context matters, such as moving between an element's parent, children, and siblings or revisiting earlier parts of the document, DOM is the way to go.
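The first case above, pulling a single value out of a possibly large document, can be sketched with StAX's `XMLStreamReader` from the JDK. This is a minimal example under assumptions: the `inventory`/`book` layout, the `isbn` and `stock` attributes, and the class name `StaxStockLookup` are hypothetical, and a real warehouse feed would more likely be read through the `createXMLStreamReader(InputStream)` overload than from a string.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxStockLookup {

    // Returns the stock attribute of the <book> with the given isbn, or null if none matches.
    public static String findStock(String xml, String isbn) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            // Pull model: this loop, not the parser, decides when to fetch the next event.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "book".equals(reader.getLocalName())
                        && isbn.equals(reader.getAttributeValue(null, "isbn"))) {
                    // Stop as soon as the match is found; no further events are pulled.
                    return reader.getAttributeValue(null, "stock");
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<inventory><book isbn=\"111\" stock=\"0\"/><book isbn=\"222\" stock=\"5\"/></inventory>";
        System.out.println(findStock(xml, "222")); // prints 5
        System.out.println(findStock(xml, "999")); // prints null
    }
}
```

The reader exposes only the current event rather than a tree, and the loop can return early. Both properties are what make the event-based approach practical for very large inputs.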