This chapter is from the book

Auxiliary SAX Interfaces

The ContentHandler interface models 100 percent of the Infoset needed by 95 percent of the world's XML applications. The needs of the remaining 5 percent are addressed by three auxiliary interfaces: LexicalHandler, DTDHandler, and DeclHandler. LexicalHandler models peripheral Infoset-isms such as <![CDATA[ boundaries; DTDHandler models notations and unparsed entities; and DeclHandler models markup and entity declarations from DTDs. If your application does not need to deal with these aspects of XML, feel free to ignore these interfaces (and this section of the chapter).

LexicalHandler models Infoset information items that are not required for proper interpretation of a document. None of these items are part of the Infoset core. Rather, they are of peripheral interest, mainly to parties that wish to retain or reconstruct the original serialized form of a document. The following is the Java version of LexicalHandler:

package org.xml.sax.ext;
public interface LexicalHandler {
// signal beginning/end of DTD
  void startDTD(String name, String publicId,
                String systemId)
        throws SAXException;
  void endDTD() throws SAXException;
// signal beginning/end of general entity reference
  void startEntity(String name) throws SAXException;
  void endEntity(String name) throws SAXException;
// signal beginning/end of <![CDATA[ section
  void startCDATA() throws SAXException;
  void endCDATA() throws SAXException;
// signal presence of <!-- comment -->
  void comment(char ch[], int start, int length)
        throws SAXException;
}

The comment method is the easiest to grasp: it corresponds to a comment information item and conveys the character data of the comment. Because SAX works at the Infoset level, the <!-- and --> delimiters are not delivered.
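As a rough illustration of the receiving side, the following sketch collects comment text from a parsed document. The CommentCollector class and its collect helper are hypothetical names introduced here; the sketch assumes the JAXP parser bundled with the JDK, and it registers the handler through the standard SAX2 lexical-handler property because LexicalHandler is an optional extension.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical receiver: remembers the text of every comment it is told about.
class CommentCollector extends DefaultHandler2 {
  final List<String> comments = new ArrayList<>();

  @Override
  public void comment(char[] ch, int start, int length) {
    // Only the character data arrives; the <!-- and --> delimiters do not.
    comments.add(new String(ch, start, length));
  }

  // Parses the given document and returns the comments seen, in order.
  static List<String> collect(String xml) {
    try {
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      CommentCollector handler = new CommentCollector();
      // LexicalHandler is optional, so it is registered through a property.
      parser.setProperty(
          "http://xml.org/sax/properties/lexical-handler", handler);
      parser.parse(new InputSource(new StringReader(xml)), handler);
      return handler.comments;
    } catch (Exception e) {
      throw new IllegalStateException("parse failed", e);
    }
  }
}
```

Calling CommentCollector.collect("<a><!-- hi --></a>") should yield a single entry containing " hi ", with the surrounding spaces preserved and the delimiters absent.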

The startCDATA/endCDATA methods indicate that the intervening ContentHandler.characters calls are contained in a <![CDATA[ section. For example, this Java code

void emit6(org.xml.sax.ContentHandler ch,
           org.xml.sax.ext.LexicalHandler lh)
        throws org.xml.sax.SAXException {
  // named "text" so it does not clash with the ch parameter
  char[] text = "John".toCharArray();
  ch.characters(text, 0, text.length);
  lh.startCDATA();
  ch.characters(text, 0, text.length);
  ch.characters(text, 0, text.length);
  lh.endCDATA();
  ch.characters(text, 0, text.length);
}

corresponds to the following serialized XML:

John<![CDATA[JohnJohn]]>John

Had the receiver not implemented LexicalHandler, this XML would have been indistinguishable from the following:

JohnJohnJohnJohn

In general, most applications prefer to ignore <![CDATA[ usage, but for applications looking to retain the original form of a serialized document, this particular detail is likely to be important.
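To make the receiver's side of this concrete, here is a minimal sketch of a handler that writes character data back out while preserving CDATA boundaries. CdataAwareSerializer and its demo method are hypothetical names, not part of SAX, and the sketch assumes the text inside a CDATA section never contains the ]]> sequence.

```java
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical receiver: re-serializes character data, keeping CDATA sections.
class CdataAwareSerializer extends DefaultHandler2 {
  private final StringBuilder out = new StringBuilder();
  private boolean inCdata = false;

  @Override
  public void startCDATA() {
    inCdata = true;
    out.append("<![CDATA[");
  }

  @Override
  public void endCDATA() {
    inCdata = false;
    out.append("]]>");
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    for (int i = start; i < start + length; i++) {
      char c = ch[i];
      if (inCdata) {
        out.append(c);            // CDATA content is written verbatim
      } else if (c == '<') {
        out.append("&lt;");       // outside CDATA, markup characters are escaped
      } else if (c == '&') {
        out.append("&amp;");
      } else {
        out.append(c);
      }
    }
  }

  String result() {
    return out.toString();
  }

  // Replays the same event sequence as the emit6 example.
  static String demo() {
    CdataAwareSerializer s = new CdataAwareSerializer();
    char[] text = "John".toCharArray();
    s.characters(text, 0, text.length);
    s.startCDATA();
    s.characters(text, 0, text.length);
    s.characters(text, 0, text.length);
    s.endCDATA();
    s.characters(text, 0, text.length);
    return s.result();
  }
}
```

A receiver that skipped the two LexicalHandler overrides would produce JohnJohnJohnJohn for the same events, which is exactly the loss of fidelity described above.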

The startEntity/endEntity methods are used to indicate that the intervening ContentHandler methods happened as the result of a parsed entity reference, not literal content. For example, the Java code

void emit7(org.xml.sax.ContentHandler ch,
           org.xml.sax.ext.LexicalHandler lh)
        throws org.xml.sax.SAXException {
  // named "text" so it does not clash with the ch parameter
  char[] text = "John".toCharArray();
  ch.characters(text, 0, text.length);
  lh.startEntity("jj");
  ch.characters(text, 0, text.length);
  ch.characters(text, 0, text.length);
  lh.endEntity("jj");
  ch.characters(text, 0, text.length);
}

corresponds to the following serialized XML:

John&jj;John

assuming the following entity declaration had appeared elsewhere:

<!ENTITY jj "JohnJohn">

Had the receiver not implemented LexicalHandler, this XML would have been indistinguishable from the following:

JohnJohnJohnJohn

which again is what almost all applications care about. Like startCDATA/endCDATA, these methods exist primarily to retain fidelity with serialized XML documents, not for mainstream XML applications.
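A receiver that wants to keep the reference rather than its expansion might look something like the sketch below. EntityPreservingSerializer and demo are hypothetical names. The sketch assumes only general entity references in content are being reported; it does no escaping and makes no attempt to handle the "%"-prefixed parameter entity names or the "[dtd]" name that SAX also passes to startEntity.

```java
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical receiver: writes &name; in place of an entity's expanded content.
class EntityPreservingSerializer extends DefaultHandler2 {
  private final StringBuilder out = new StringBuilder();
  private int entityDepth = 0;   // entities may nest, so count rather than flag

  @Override
  public void startEntity(String name) {
    if (entityDepth == 0) {
      out.append('&').append(name).append(';');
    }
    entityDepth++;
  }

  @Override
  public void endEntity(String name) {
    entityDepth--;
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // Content delivered inside an entity is replacement text; skip it.
    if (entityDepth == 0) {
      out.append(ch, start, length);
    }
  }

  String result() {
    return out.toString();
  }

  // Replays the same event sequence as the emit7 example.
  static String demo() {
    EntityPreservingSerializer s = new EntityPreservingSerializer();
    char[] text = "John".toCharArray();
    s.characters(text, 0, text.length);
    s.startEntity("jj");
    s.characters(text, 0, text.length);
    s.characters(text, 0, text.length);
    s.endEntity("jj");
    s.characters(text, 0, text.length);
    return s.result();
  }
}
```

The depth counter is a design choice for the case where one entity's replacement text refers to another: only the outermost reference is written back out.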

Finally, the document type (DOCTYPE) declaration is modeled by the startDTD/endDTD methods. These methods convey the QName of the expected document element and the public and system identifiers of the external DTD subset. Consider the following Java code:

void emit8(org.xml.sax.ext.LexicalHandler handler)
        throws org.xml.sax.SAXException {
  handler.startDTD("foo:bar", "-//DevelopMentor//fb//EN",
                   "http://foo.bar.com");
  handler.endDTD();
}

This code corresponds to the following serialized DOCTYPE declaration:

<!DOCTYPE foo:bar PUBLIC '-//DevelopMentor//fb//EN'
                          'http://foo.bar.com' >

Note that there are no LexicalHandler methods that model the contents of the document type definition. Rather, that is the role played by DTDHandler and DeclHandler.
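Going the other direction, a receiver could rebuild the DOCTYPE line from the startDTD arguments along the lines of the following sketch. DoctypeRecorder and demo are hypothetical names. The sketch assumes either identifier may be null when it was not supplied, quotes naively with single quotes, and does not attempt to reproduce an internal subset.

```java
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical receiver: reconstructs the DOCTYPE declaration from startDTD.
class DoctypeRecorder extends DefaultHandler2 {
  private String doctype;

  @Override
  public void startDTD(String name, String publicId, String systemId) {
    StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(name);
    if (publicId != null) {
      sb.append(" PUBLIC '").append(publicId).append('\'');
      if (systemId != null) {
        sb.append(" '").append(systemId).append('\'');
      }
    } else if (systemId != null) {
      sb.append(" SYSTEM '").append(systemId).append('\'');
    }
    doctype = sb.append('>').toString();
  }

  String doctype() {
    return doctype;
  }

  // Replays the same event sequence as the emit8 example.
  static String demo() {
    DoctypeRecorder r = new DoctypeRecorder();
    r.startDTD("foo:bar", "-//DevelopMentor//fb//EN", "http://foo.bar.com");
    r.endDTD();
    return r.doctype();
  }
}
```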

SAX uses two interfaces to model the contents of a DTD, largely due to the history of the Infoset. At the time of SAX2's development, the Infoset did not address parsed entity declarations, attribute list declarations, or element declarations. However, the Infoset's document information item has always had two core properties ([notations] and [entities]) that are not addressed by ContentHandler. These two properties are modeled by the DTDHandler interface. The following is the Java version of DTDHandler:

package org.xml.sax;
public interface DTDHandler {
  void notationDecl(String name, String publicId,
                                 String systemId)
           throws SAXException;
  void unparsedEntityDecl(String name, String publicId,
                     String systemId, String notationName)
           throws SAXException;
}

Consider the following Java code fragment:

void emit5(org.xml.sax.DTDHandler handler)
        throws org.xml.sax.SAXException {
  handler.notationDecl("wav", "-//DevelopMentor//fb//EN",
                       null);
  handler.notationDecl("au", null, "http://mp9.com/au");
  // no public identifier was declared, so pass null
  handler.unparsedEntityDecl("woosh", null,
                             "http://foo.com", "wav");
  handler.unparsedEntityDecl("wooosh", "-//DM//foooo//EN",
                             "http://fooo.com", "au");
}

This corresponds to the following DTD declarations:

<!NOTATION wav PUBLIC '-//DevelopMentor//fb//EN' >
<!NOTATION au  SYSTEM 'http://mp9.com/au' >
<!ENTITY woosh SYSTEM 'http://foo.com' NDATA wav>
<!ENTITY wooosh PUBLIC '-//DM//foooo//EN'
                       'http://fooo.com' NDATA au>

It is important to note that the caller is responsible for fully resolving the URI in the system identifier prior to invoking notationDecl or unparsedEntityDecl.
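One plausible use of these callbacks is to build a lookup table, so that an unparsed entity name found in an ENTITY-typed attribute can later be mapped to its location and format. The sketch below is one way to do that; UnparsedEntityTable, describe, and demo are hypothetical names, and the choice to prefer a notation's system identifier over its public identifier is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.DTDHandler;

// Hypothetical receiver: indexes notations and unparsed entities by name.
class UnparsedEntityTable implements DTDHandler {
  // notation name -> identifier (system identifier if present, else public)
  private final Map<String, String> notations = new LinkedHashMap<>();
  // entity name -> { systemId, notationName }
  private final Map<String, String[]> entities = new LinkedHashMap<>();

  @Override
  public void notationDecl(String name, String publicId, String systemId) {
    notations.put(name, systemId != null ? systemId : publicId);
  }

  @Override
  public void unparsedEntityDecl(String name, String publicId,
                                 String systemId, String notationName) {
    // The system identifier arrives already resolved by the caller.
    entities.put(name, new String[] { systemId, notationName });
  }

  // Returns "location (notation: identifier)" or null for an unknown entity.
  String describe(String entityName) {
    String[] e = entities.get(entityName);
    if (e == null) {
      return null;
    }
    return e[0] + " (" + e[1] + ": " + notations.get(e[1]) + ")";
  }

  // Replays the same declarations as the emit5 example.
  static String demo() {
    UnparsedEntityTable t = new UnparsedEntityTable();
    t.notationDecl("wav", "-//DevelopMentor//fb//EN", null);
    t.notationDecl("au", null, "http://mp9.com/au");
    t.unparsedEntityDecl("woosh", null, "http://foo.com", "wav");
    t.unparsedEntityDecl("wooosh", "-//DM//foooo//EN", "http://fooo.com", "au");
    return t.describe("wooosh");
  }
}
```

Unlike LexicalHandler and DeclHandler, DTDHandler is a core SAX2 interface, so a real parser accepts it through XMLReader.setDTDHandler rather than through a property.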

In addition to notation and unparsed entity declarations, SAX supports the remaining DTD-isms via the DeclHandler interface. The following is the Java version of DeclHandler:

package org.xml.sax.ext;
public interface DeclHandler {
// signal an ELEMENT declaration
  void elementDecl(String name, String model)
          throws SAXException;
// signal one attribute from an ATTLIST declaration
  void attributeDecl(String eName, String aName,
             String type, String valueDefault, String value)
          throws SAXException;
// signal an internal parsed general ENTITY declaration
  void internalEntityDecl(String name, String value)
          throws SAXException;
// signal an external parsed general ENTITY declaration
  void externalEntityDecl(String name, String publicId,
                          String systemId)
          throws SAXException;
}

The four methods of DeclHandler correspond to a DTD-style element declaration, attribute list declaration, internal parsed entity declaration, and external parsed entity declaration. Of these four methods, only attributeDecl warrants any real discussion, as its mapping to a DTD-style ATTLIST declaration is not a one-to-one mapping. Rather, an attributeDecl invocation corresponds to only one attribute from an attribute list declaration. For example, consider the following serialized attribute list declaration:

<!ATTLIST foo bar   CDATA #REQUIRED
              baz   NMTOKEN "foobar"
              quux  IDREF #IMPLIED
              quuux IDREFS #FIXED "hey joe"
>

This would correspond to the following Java code:

void emit9(org.xml.sax.ext.DeclHandler handler)
        throws org.xml.sax.SAXException {
  handler.attributeDecl("foo", "bar",
                        "CDATA", "#REQUIRED", null);
  handler.attributeDecl("foo", "baz",
                        "NMTOKEN", null, "foobar");
  handler.attributeDecl("foo", "quux",
                        "IDREF", "#IMPLIED", null);
  handler.attributeDecl("foo", "quuux",
                        "IDREFS", "#FIXED", "hey joe");
}

Assuming that the receiver of the document implements both DeclHandler and LexicalHandler, all calls to DeclHandler methods must occur after a call to LexicalHandler.startDTD and prior to a call to LexicalHandler.endDTD.
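The one-call-per-attribute mapping and the ordering constraint can both be seen from the receiving side in the sketch below. AttlistRecorder and demo are hypothetical names. The receiver writes each attributeDecl back out as its own single-attribute ATTLIST declaration and, as an assumption about how strict a receiver might choose to be, rejects any declaration that arrives outside startDTD/endDTD. With a real parser, a DeclHandler is registered through the http://xml.org/sax/properties/declaration-handler property.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical receiver: turns attributeDecl calls back into ATTLIST text.
class AttlistRecorder extends DefaultHandler2 {
  private final List<String> decls = new ArrayList<>();
  private boolean inDtd = false;

  @Override
  public void startDTD(String name, String publicId, String systemId) {
    inDtd = true;
  }

  @Override
  public void endDTD() {
    inDtd = false;
  }

  @Override
  public void attributeDecl(String eName, String aName, String type,
                            String mode, String value) {
    if (!inDtd) {
      throw new IllegalStateException("attributeDecl outside startDTD/endDTD");
    }
    StringBuilder sb = new StringBuilder("<!ATTLIST ")
        .append(eName).append(' ').append(aName).append(' ').append(type);
    if (mode != null) {
      sb.append(' ').append(mode);                     // #REQUIRED, #IMPLIED, #FIXED
    }
    if (value != null) {
      sb.append(" \"").append(value).append('"');      // default or fixed value
    }
    decls.add(sb.append('>').toString());
  }

  // Replays the emit9 declarations, bracketed by startDTD/endDTD.
  static List<String> demo() {
    AttlistRecorder r = new AttlistRecorder();
    r.startDTD("foo", null, null);
    r.attributeDecl("foo", "bar", "CDATA", "#REQUIRED", null);
    r.attributeDecl("foo", "baz", "NMTOKEN", null, "foobar");
    r.attributeDecl("foo", "quux", "IDREF", "#IMPLIED", null);
    r.attributeDecl("foo", "quuux", "IDREFS", "#FIXED", "hey joe");
    r.endDTD();
    return r.decls;
  }
}
```

Four separate single-attribute declarations are equivalent to the original four-attribute ATTLIST, which is why the original grouping cannot be recovered from the events alone.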
