Parsing an XML Document

Discover how to extract data from an XML document using some popular parsers.
This chapter is from the book

The whole is more than the sum of its parts.

—Aristotle

Parsing is the act of reading an XML document to use the data in another application.

Parsing breaks an XML document into its elements and values so that an application can use them. Many mechanisms can be developed to parse an XML document. The simplest is a plain string parser, which treats the XML document as string content and returns element values by scanning that string for markup delimiters such as < and />.
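The plain string parser described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name element_value and the sample room document are hypothetical, and the scan deliberately ignores attributes, nested elements of the same name, comments, CDATA sections, and entities.

```python
def element_value(xml_text, tag):
    """Return the text between <tag> and </tag>, or None if not found.

    Naive string scanning only: no attributes, no same-named nesting,
    no comments, CDATA, or entity handling.
    """
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"
    start = xml_text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = xml_text.find(close_tag, start)
    if end == -1:
        return None
    return xml_text[start:end]


# Hypothetical sample document, used only for illustration.
document = "<room><name>Kitchen</name><width>4</width></room>"
name = element_value(document, "name")
width = element_value(document, "width")
missing = element_value(document, "height")
```

The limitations listed in the docstring are the reason such a scanner is rarely enough in practice and why generic parsers are preferred.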

An application might have varied usage for the data in the document. For example, some applications might use the XML document to present an explorer-like tree view so that the user could navigate through the system. Some other applications might use the XML document data values to plot a 2D graphic onscreen, representing a top view plan.

Each application has a specific need for interpreting the XML data stream and could therefore have its own parsing mechanism. Writing a new parser for every application is not practical, however: efficient parsing depends on specialized algorithms and techniques, and repeating that effort for each application is wasteful.

Hence, we have generic parsers for parsing the document and feeding it to our applications.

Popular Parsers

As discussed before, the two popular parsing mechanisms are DOM and SAX. SAX is not itself a representation of an XML document, but it is still a way to parse one. A parser is usually any utility that lets you reach a given element by following parent-child relationships.

DOM represents the elements of an XML document as a hierarchy and lets you access them like a tree. A program reaches each element with simple statements that follow the tree structure.
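As a minimal sketch of tree-style access, the following uses xml.dom.minidom from the Python standard library, which is one DOM implementation and not one of the parsers named in this chapter. The house document, the outline helper, and the variable names are assumptions made for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for illustration.
document = """<house>
  <room name="Kitchen"><width>4</width></room>
  <room name="Hall"><width>6</width></room>
</house>"""

dom = parseString(document)
root = dom.documentElement


def outline(node, depth=0):
    """Return indented element names by walking from parent to child."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(root)

# Reach a specific element by following the parent-child structure.
first_room = root.getElementsByTagName("room")[0]
first_room_name = first_room.getAttribute("name")
first_width = first_room.getElementsByTagName("width")[0].firstChild.data
```

Because the whole document is held in memory as a tree, any element can be revisited at any time, which suits explorer-like views.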

SAX opens many possibilities for using the XML document. Each element, or node occurrence, triggers an event in the application as it is parsed, so the application can interpret it in whatever way it requires. This increases the flexibility with which an XML document can be represented.
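The event-driven behavior can be seen with Python's standard xml.sax module, used here as a stand-in for the Java SAX parsers this chapter lists. The EventLog class and the sample document are hypothetical. The handler simply records each event in the order the parser reports it.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Records SAX events in the order the parser fires them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Skip whitespace-only text; a parser may deliver text in chunks.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


# Hypothetical sample document, used only for illustration.
document = b"<room><name>Kitchen</name><width>4</width></room>"
handler = EventLog()
xml.sax.parseString(document, handler)
```

Nothing is stored unless the handler stores it, so what the application keeps, and in what shape, is entirely the handler's decision.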

DOM ties the parsed document to a tree-like view, whereas SAX lets you create your own custom views. Both are useful.

Many real-world applications need to represent data in a tree-like, hierarchical fashion, and DOM is a useful and efficient way to do this.

As a guideline, use DOM wherever a tree-like in-memory image of the XML data is required. SAX, on the other hand, is better wherever the application builds its own special representation of the data.
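To illustrate the second half of this guideline, here is a sketch in which a SAX handler builds a flat list of coordinates, the kind of structure a 2D plotting routine might consume, instead of a tree. The plan, wall, and point element names, their x and y attributes, and the PointCollector class are all assumptions made for the example.

```python
import xml.sax


class PointCollector(xml.sax.ContentHandler):
    """Builds a flat list of (x, y) tuples rather than a tree."""

    def __init__(self):
        super().__init__()
        self.points = []

    def startElement(self, name, attrs):
        # Only <point> elements matter to this representation.
        if name == "point":
            self.points.append((float(attrs["x"]), float(attrs["y"])))


# Hypothetical floor-plan document, used only for illustration.
document = (
    b'<plan>'
    b'<wall><point x="0" y="0"/><point x="4" y="0"/></wall>'
    b'<wall><point x="4" y="3"/></wall>'
    b'</plan>'
)
collector = PointCollector()
xml.sax.parseString(document, collector)
```

The handler discards everything except the data it needs, so no tree is ever built in memory.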

For both DOM and SAX, many parsers are available from different sources. Some of the most popular parsers in use today are as follows:

  • Megginson: Java-based SAX parser

  • Aelfred: Java-based SAX parser

  • IBM's DOM and SAX parsers

  • Microsoft's DOM parser

An extensive list of XML tools and parsers is found at http://www.garshol.priv.no/download/xmltools/cat_ix.html.
