Document, Document Type, and Entity Nodes
The Document node represents a document information item in addition to acting as the factory for new nodes. The DocumentType node has no direct Infoset equivalent, but it is where the document information item's [entities] and [notations] properties are accessed. Like the Infoset [entities] property, the DocumentType interface's entities attribute never contains parameter entities. Unlike the Infoset [entities] property, the DocumentType interface's entities attribute does not contain the document entity. However, the DocumentType interface does expose the external identifier (and optionally the contents) of the external DTD subset.
The Document node has two related nodes that are given special status: one is the distinguished [children] node that represents the root element of the document; the other is the node that represents the document type declaration of the document. The root element of the document is exposed via the normal child node accessors as well as via the Document.documentElement attribute. The document type declaration node is exposed via the Document.doctype attribute, which is null when the document has no document type declaration. Note that the DOM Core specification also lists DocumentType among the permitted children of a Document node, so an implementation may return it from the child node accessors as well.
interface Document : Node {
  readonly attribute Element documentElement;
  readonly attribute DocumentType doctype;
    :     :     :
}
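As a rough illustration of these two attributes, the following sketch uses the Java binding of the DOM, where documentElement and doctype surface as getDocumentElement() and getDoctype(). The class name DocumentAccessors and the sample memo document are hypothetical and exist only for this example; the parser is whatever JAXP implementation the runtime provides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name and the sample document are illustrative only.
class DocumentAccessors {
    // A small document with an internal DTD subset, so nothing external is fetched.
    static final String XML =
        "<!DOCTYPE memo [ <!ELEMENT memo (#PCDATA)> ]>" +
        "<memo>Lunch at noon</memo>";

    // Parses an XML string into a DOM Document using the default JAXP parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();   // Document.documentElement
        DocumentType doctype = doc.getDoctype();   // Document.doctype, null if no DOCTYPE
        System.out.println("root element: " + root.getNodeName());
        System.out.println("doctype name: " + doctype.getName());
    }
}
```

For a document without a document type declaration, getDoctype() returns null, so code that reads entities or notations should check for that case first.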
The Document interface also provides two convenience methods for locating descendant Element nodes.
interface Document : Node {
  Element getElementById(in DOMString elementID);
  NodeList getElementsByTagNameNS(in DOMString namespaceURI,
                                  in DOMString localName);
}
The Document.getElementById method finds the element that is uniquely identified by an attribute of type ID whose value matches the method parameter; it returns null if no such element exists. The Document.getElementsByTagNameNS method behaves like the same-named method exposed by the Element interface, with one distinction: because it appears on the parent of all elements in the document, it may return the root element of the document. Calling Element.getElementsByTagNameNS will never return the root element of the document, as it only returns descendant nodes.
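The difference between the two lookup methods can be sketched with the Java DOM binding. The class name DocumentLookup, the namespace URI, and the sample document below are assumptions made for illustration. The id attribute is declared as type ID in the internal DTD subset, since a parser generally needs some such declaration before getElementById can find anything.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample data are illustrative only.
class DocumentLookup {
    static final String NS = "urn:example:inventory";   // made-up namespace URI

    // The root <list> contains one nested <list>; both <item> elements carry IDs.
    static final String XML =
        "<!DOCTYPE list [ <!ATTLIST item id ID #IMPLIED> ]>" +
        "<list xmlns='" + NS + "'>" +
        "<item id='a1'/>" +
        "<list><item id='a2'/></list>" +
        "</list>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // required for the ...NS methods
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();

        // Lookup by ID value; yields null when no element has that ID.
        Element hit = doc.getElementById("a2");
        System.out.println("a2 found: " + (hit != null));

        // Called on the Document, the search can match the root element itself.
        NodeList fromDocument = doc.getElementsByTagNameNS(NS, "list");
        // Called on the root Element, only its descendants are candidates.
        NodeList fromRoot = root.getElementsByTagNameNS(NS, "list");

        System.out.println("matches from Document: " + fromDocument.getLength());
        System.out.println("matches from root:     " + fromRoot.getLength());
    }
}
```

With this sample document the Document-level call matches both list elements, while the Element-level call on the root matches only the nested one.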
The DocumentType node exposes two NamedNodeMaps, one for the document's [entities] and one for [notations]. These maps contain nodes that implement the Entity and Notation interfaces, respectively. Both interfaces expose the [system identifier] and [public identifier] Infoset properties. The Entity interface also exposes the [notation] property as a string. For both node/interface types, the [name] property is accessed via the generic Node.nodeName attribute. The actual content of the entity is exposed as child nodes.
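A minimal sketch of reading these two maps through the Java DOM binding follows. The class name DoctypeMaps, the helper names(), and every declaration in the sample DTD (including the example.com identifiers) are hypothetical. The sketch also assumes the parser builds Entity and Notation nodes from the internal subset, as the default JAXP parser is expected to do.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical demo class; the DTD declarations are made up for illustration.
class DoctypeMaps {
    static final String XML =
        "<!DOCTYPE doc [" +
        "  <!NOTATION gif SYSTEM 'http://example.com/notations/gif'>" +
        "  <!ENTITY greeting 'hello'>" +                                       // internal parsed entity
        "  <!ENTITY logo SYSTEM 'http://example.com/logo.gif' NDATA gif>" +    // unparsed entity
        "  <!ENTITY % param 'ignored'>" +                                      // parameter entity
        "]>" +
        "<doc/>";

    static DocumentType parseDoctype(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDoctype();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    // Collects the nodeName of every node in the map, sorted for stable output.
    static List<String> names(NamedNodeMap map) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < map.getLength(); i++) {
            result.add(map.item(i).getNodeName());   // [name] via Node.nodeName
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        DocumentType doctype = parseDoctype(XML);
        System.out.println("entities:  " + names(doctype.getEntities()));
        System.out.println("notations: " + names(doctype.getNotations()));

        Entity logo = (Entity) doctype.getEntities().getNamedItem("logo");
        System.out.println("logo notation:  " + logo.getNotationName());
        System.out.println("logo system id: " + logo.getSystemId());
    }
}
```

The parameter entity is declared only to show that it does not appear in the entities map; getNotationName() returns null for a parsed entity such as greeting.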
The DOM takes an unorthodox approach to handling references to entities in element content. Rather than inject distinct nodes for the entity start and end marker information items, the DOM inserts an intermediate node between the parent element and the replacement content. This node implements the EntityReference interface and has child nodes that correspond to the replacement content. EntityReference nodes make it somewhat more complicated to process a DOM hierarchy since there may be EntityReference nodes sprinkled throughout. Because of this, some DOM implementations provide a mechanism for automatically expanding entity references in the DOM hierarchy to remove all EntityReference nodes. While this simplifies the programming model, it defeats possible implementation-specific optimizations that result from otherwise lazy evaluation (see Figure 2.14).
Figure 2.14. Entity references and the DOM
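The effect of automatic expansion can be sketched with JAXP, whose DocumentBuilderFactory.setExpandEntityReferences method is one example of the implementation-provided mechanism described above. The class name EntityReferenceDemo and the sample document are hypothetical. Whether an EntityReference node carries child nodes for its replacement content varies across parser implementations and versions, so the sketch only counts the reference nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; the sample document is illustrative only.
class EntityReferenceDemo {
    static final String XML =
        "<!DOCTYPE doc [ <!ENTITY greeting 'hello'> ]>" +
        "<doc>&greeting; world</doc>";

    // Parses the sample and returns its root element, with or without
    // automatic expansion of entity references.
    static Element parseRoot(boolean expand) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(expand);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(XML)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    // Counts EntityReference nodes in the subtree rooted at the given node.
    static int countEntityReferences(Node node) {
        int count = node.getNodeType() == Node.ENTITY_REFERENCE_NODE ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countEntityReferences(c);
        }
        return count;
    }

    public static void main(String[] args) {
        Element kept = parseRoot(false);
        Element expanded = parseRoot(true);
        System.out.println("references kept:     " + countEntityReferences(kept));
        System.out.println("references expanded: " + countEntityReferences(expanded));
        System.out.println("expanded text:       " + expanded.getTextContent());
    }
}
```

Code that must work with either setting can treat an EntityReference node as a transparent container and recurse into its children, as countEntityReferences does.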