Event-Based vs. Tree-Based Parsing

We will cover event-based and tree-based parsing in some depth when we cover SAX and DOM in chapters 7 and 8, respectively. For now, an overview should be sufficient.

Event-Based Parsing

Event-based parsers (SAX) provide a data-centric view of XML. When an element is encountered, the idea is to process it and then forget about it. As it reads the document, the parser reports each element, its list of attributes, and its content to the application. This is more efficient for many types of applications, especially searches. It typically requires less memory, and often less code, because there is no need to build a large tree in memory while you scan for a particular element, attribute, and/or content sequence in an XML document.
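
To make the search case concrete, here is a minimal sketch using Python's standard xml.sax module. Python is used only for illustration, and the catalog document, the lang attribute, and the names TitleFinder and find_titles are hypothetical. The handler keeps only the text it is looking for and discards everything else as the parser moves on.

```python
import xml.sax

# Hypothetical sample document, used only for illustration.
SAX_SAMPLE = b"""<catalog>
  <book><title lang="en">XML Basics</title></book>
  <book><title lang="fr">Le XML</title></book>
  <book><title lang="en">SAX in Practice</title></book>
</catalog>"""


class TitleFinder(xml.sax.ContentHandler):
    """Collects the text of each <title> whose lang attribute matches."""

    def __init__(self, lang):
        super().__init__()
        self.lang = lang
        self.titles = []
        self._buffer = None  # a list while inside a matching <title>, else None

    def startElement(self, name, attrs):
        # The parser hands over the element name and its attributes.
        if name == "title" and attrs.get("lang") == self.lang:
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # Element finished: keep the result and forget the rest.
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def find_titles(xml_bytes, lang):
    """Scan the document once and return the matching title strings."""
    handler = TitleFinder(lang)
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles
```

Calling find_titles(SAX_SAMPLE, "en") scans the document in a single pass. At no point does the program hold more than the current title's text and the list of results.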

Tree-Based Parsing

Tree-based parsers (DOM), on the other hand, provide a document-centric view of XML. In tree-based parsing, an in-memory tree is created for the entire document, which can be extremely memory-intensive for large documents. All elements and attributes are available at once, but not until the entire document has been parsed. This technique is useful if you need to navigate around the document and perhaps change various parts of it, which is precisely why it suits the Document Object Model (DOM), whose aim is to let scripting languages or Java manipulate documents.
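
The navigate-and-change case can be sketched with Python's standard xml.dom.minidom module. Again, Python is only an illustrative choice, and the catalog document, the id and revised attributes, and the function name retitle are hypothetical. The whole document is parsed into a tree first. Only then can the code move around it and modify nodes in place.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
DOM_SAMPLE = (
    '<catalog>'
    '<book id="b1"><title>XML Basics</title></book>'
    '<book id="b2"><title>DOM in Practice</title></book>'
    '</catalog>'
)


def retitle(xml_text, book_id, new_title):
    """Parse the full document, change one book's title, return the new XML."""
    # The entire tree is built in memory before any node can be accessed.
    doc = minidom.parseString(xml_text)

    for book in doc.getElementsByTagName("book"):
        if book.getAttribute("id") == book_id:
            title = book.getElementsByTagName("title")[0]
            # Replace the text node's data and mark the element as changed.
            title.firstChild.data = new_title
            book.setAttribute("revised", "true")

    # Serialize the modified tree back to a string.
    return doc.documentElement.toxml()
```

Because the tree stays in memory, a call such as retitle(DOM_SAMPLE, "b2", "Tree Parsing") can visit any book in any order and write the changed document back out. An event-based parser does not offer this directly.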

David Megginson, the main force behind Simple API for XML (SAX), contrasts these two approaches in "Events vs. Trees" on the SAX site (http://www.saxproject.org/?selected=event). The W3C presents its viewpoint in an item from the DOM FAQ, "What is the relationship between the DOM and SAX?" (http://www.w3.org/DOM/faq#SAXandDOM).
