This chapter is from the book

The Object Model

The DOM is a projection of the XML Infoset. The object model of the DOM represents the Infoset as a tree-structured graph of nodes. The DOM specifies several aspects of this graph, including the interfaces that must be supported by each node, the syntax and semantics of each node interface, and the relationships between the different node types. The DOM does not, however, mandate how the underlying code is structured or what algorithms or data structures are used to maintain the internal form of the underlying information items.

Figure 2.4 shows the UML model of the DOM. The focal point of the DOM is the Node interface, which acts as the base interface for all node types. Table 2.3 shows the various node types and their corresponding Infoset information items where applicable. The fact that virtually everything is a node makes traversal code extremely uniform, as a standard set of methods is available no matter where one is in the object model. However, each node also supports an extended interface type that exposes information item-specific functionality in a type-safe manner.
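The uniformity of traversal can be illustrated with a short sketch. It uses Python's xml.dom.minidom, which is one DOM implementation among many and not the one assumed by this chapter; the walk function and the sample input string are hypothetical names and data chosen for illustration. The same three members (nodeType, nodeName, childNodes) are read on every node, whatever its type.

```python
from xml.dom import Node, minidom


def walk(node, depth=0, out=None):
    """Collect (depth, nodeType, nodeName) for node and all of its descendants.

    Only members of the base Node interface are used, so the same code
    handles Document, Element, Comment, and Text nodes alike.
    """
    if out is None:
        out = []
    out.append((depth, node.nodeType, node.nodeName))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


# Hypothetical sample input, chosen only to exercise several node types.
doc = minidom.parseString("<a><!-- note --><b>text</b></a>")
visited = walk(doc)

for depth, node_type, name in visited:
    print("  " * depth + f"{name} (nodeType={node_type})")
```

Type-specific members, such as getAttribute on an Element, would be reached only after checking nodeType against a constant such as Node.ELEMENT_NODE.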

Figure 2.4. DOM Interfaces

Table 2.3. DOM Nodes and the Infoset

DOM Node               Infoset Information Item
Document               Document Information Item
DocumentFragment       N/A
DocumentType           Document Type Declaration Information Item
EntityReference        Entity Start/End Marker Information Items
Element                Element Information Item
Attr                   Attribute Information Item
ProcessingInstruction  Processing Instruction Information Item
Comment                Comment Information Item
Text                   Sequence of Character Information Items
CDATASection           CDATA Start/End Marker Information Items
Entity                 Entity Information Item
Notation               Notation Information Item

To see how the DOM object model reflects the Infoset, consider the following serialized XML document:

<?xml version="1.0"?>
<?order alpha ascending?>
<art xmlns='http://www.art.org/schemas/art'>
  <period name="Renaissance">
    <artist>Leonardo da Vinci</artist>
  </period>
  <!-- insert period here -->
</art>

Figure 2.5 shows what happens when this XML document is projected onto the DOM. Notice that the topmost node in the DOM structure corresponds to the document information item and is of type Document. The Document node has two child nodes that correspond to the document information item's [children] property: a ProcessingInstruction node and an Element node. The Element node is the distinguished document element and has two child nodes corresponding to the element information item's [children] property: one Element node and one Comment node. That child Element node has three Element nodes as children, again corresponding to the [children] Infoset property.

Figure 2.5. The DOM and art
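The projection can be checked with a small sketch, again using Python's xml.dom.minidom as a stand-in DOM implementation. The document string is the sample's content with the attribute quoting and closing tags written out, which is an assumption about the intended document; significant_children is a hypothetical helper. Two details of this implementation are worth noting: the XML declaration does not appear as a node, and whitespace used for indentation shows up as Text nodes, which the helper filters out to match the description above.

```python
from xml.dom import Node, minidom

# Assumed well-formed version of the sample document (closing tags are an assumption).
XML = """<?xml version="1.0"?>
<?order alpha ascending?>
<art xmlns='http://www.art.org/schemas/art'>
  <period name="Renaissance">
    <artist>Leonardo da Vinci</artist>
  </period>
  <!-- insert period here -->
</art>"""


def significant_children(node):
    """Return child nodes, skipping whitespace-only Text nodes (indentation)."""
    return [
        child for child in node.childNodes
        if not (child.nodeType == Node.TEXT_NODE and not child.data.strip())
    ]


doc = minidom.parseString(XML)

# Children of the Document node: the processing instruction and the document element.
top = [(child.nodeType, child.nodeName) for child in doc.childNodes]

# Children of the document element, ignoring indentation whitespace.
art = doc.documentElement
art_children = [(c.nodeType, c.nodeName) for c in significant_children(art)]

print(top)
print(art_children)
```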

As just described, there is a striking similarity between the node-based model of the DOM and the Infoset. Where the DOM's node-based model departs from the Infoset is in its treatment of character information items. The Infoset treats each character in element content as a distinct information item. This is reasonable for an abstract model, but the performance impact of using an object per character would render the DOM completely unusable. For that reason, the DOM aggregates adjacent character information items into a single node of type Text. Note also that the nodeValue property of an Element node is always null. To access an element's character data [children], one must first access the Text node that is the child of the element. The nodeValue of that node contains a string of characters reflecting the element content.
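A minimal sketch of this behavior, using Python's xml.dom.minidom as an example implementation (where the DOM's null appears as None). The one-element input string is a hypothetical fragment based on the sample document.

```python
from xml.dom import Node, minidom

# Hypothetical fragment based on the sample document.
doc = minidom.parseString("<artist>Leonardo da Vinci</artist>")
artist = doc.documentElement

# normalize() merges any adjacent Text siblings, so the element content
# is guaranteed to be held by a single Text child.
artist.normalize()

text = artist.firstChild

print(artist.nodeValue)                  # the Element itself carries no value
print(text.nodeType == Node.TEXT_NODE)   # the character data lives in a Text child
print(text.nodeValue)                    # the aggregated characters as one string
```

The whole run of characters is held by one Text node rather than by one object per character.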
