18.4 MSXML Fundamentals

DOM and SAX were mentioned in Chapter 2. DOM Level 2 and SAX2 are both standard APIs for parsing XML, and MSXML supports both standards. In fact, it is possible to use both the DOM and SAX with MSXML to validate XML instances.

18.4.1 Using MSXML from Visual Basic

The first step in using MSXML in Visual Basic is referencing the MSXML components. As with any other COM component you use from Visual Basic, you add a reference to MSXML in your project. Inside the Visual Basic IDE, select the Project menu and then the References item to launch the References dialog box. This dialog box provides a checklist of all COM type libraries registered on your machine. Checking one of these libraries, such as MSXML 4.0, includes the type information for its components in your project, which enables you to create instances of those components and refer to them through early binding. Figure 18.2 shows the References dialog box with MSXML 4.0 selected.

FIGURE 18.2 Referencing MSXML 4.0 from Visual Basic.


Throughout the chapter, it is assumed that the Visual Basic examples are included in a project that has referenced MSXML 4.0. Any of the stand-alone code samples listed will function in any Visual Basic project that has referenced the MSXML 4.0 components.

18.4.2 Using the DOM

When you are using the MSXML implementation of the DOM Level 2 feature set, the component that represents the DOM Document node is a DOMDocument40 component. Through DOMDocument40, we have complete access to the contents of an XML document.

18.4.2.1 DOMDocument40

DOMDocument40 is the starting point for using the MSXML DOM implementation. The '40' suffix of the component indicates its version. Earlier versions of MSXML included DOMDocument components, and it is possible to declare and use DOMDocument components if you are not concerned with the version of the component you will be using. Because the behavior and feature set of the DOMDocument component have grown with each version, it is often necessary to reference a specific version to guarantee that the functionality you need is available.

To work with a DOMDocument40 component, you need to create an instance of it. Listing 18.1 shows the code required in Visual Basic to create an instance of a DOMDocument40 component once the reference to the MSXML 4.0 library has been added.

LISTING 18.1 Creating a DOMDocument40 (early binding)

Dim doc As DOMDocument40
Set doc = New DOMDocument40

If you are working from a scripting language or other language that only provides for late binding to COM objects, you can create the DOMDocument40 instance by using its PROGID, its user-friendly object identifier. The PROGID for the DOMDocument40 component is 'MSXML2.DOMDocument.4.0'. An example of late binding in VBScript is shown in Listing 18.2.

LISTING 18.2 Creating a DOMDocument40 (late binding for scripting)

Dim doc
Set doc = CreateObject("MSXML2.DOMDocument.4.0")

For the purposes of this chapter, assume that we are working inside the Visual Basic or similar IDE and have access to early binding to the MSXML components. This is just to simplify the sample code. The only difference between the two sets of code would be the instantiation code shown in Listings 18.1 and 18.2. As with any COM object, you gain performance at runtime through the use of early binding to the MSXML library.

PROGIDs can be used to create instances of any of the MSXML components. If you need to create an object instance from a scripting language such as VBScript or ECMAScript, use the PROGID of the object. The PROGIDs will follow the same form:

MSXML2.<object name>.4.0

For example, the XMLSchemaCache40 object has a PROGID of 'MSXML2.XMLSchemaCache.4.0'.
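As a brief illustration of this pattern, the following VBScript sketch creates two other MSXML 4.0 objects through late binding. The variable names are arbitrary, and the SAXXMLReader PROGID is derived from the form shown above.

```vbscript
' Late-bound creation of MSXML 4.0 objects by PROGID
Dim cache, reader
Set cache = CreateObject("MSXML2.XMLSchemaCache.4.0")
Set reader = CreateObject("MSXML2.SAXXMLReader.4.0")
```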

18.4.2.2 Reading XML with DOMDocument40

Using DOMDocument40, we can load an XML document and examine its contents after they are loaded into the DOM tree. Two methods of the DOMDocument40 can be used to load an XML document: load and loadXML. The load method, shown in Listing 18.3, accepts a URL or file path that points to an XML document; it can also accept an object such as an IStream. Alternatively, the loadXML method accepts the XML content directly as a string.

LISTING 18.3 Loading an XML Document Using DOMDocument40

Dim doc As DOMDocument40
Set doc = New DOMDocument40

doc.async = False
doc.validateOnParse = False

If (doc.Load("c:\test\address.xml")) Then  
    MsgBox "Document is well-formed"
Else
    Dim docError As IXMLDOMParseError  
    Set docError = doc.parseError
    MsgBox docError.reason, vbCritical
End If

Listing 18.3 sets two properties of the DOMDocument40 component that greatly affect parsing behavior. The first is the async property, a Boolean flag indicating whether the document is loaded synchronously or asynchronously. In the preceding code, it is set to False, so the code blocks on the load method until the document is fully parsed or an error occurs. The default value for this property is True, in which case load returns immediately and the application must wait until the DOMDocument40 fires an onreadystatechange event, or until the readyState property indicates that the load has completed.

The second property is validateOnParse, which is also a Boolean flag. When True, the parser attempts to validate the document against any associated DTD, XML Schema, or XDR schema. When False, the parser only verifies that the document is well-formed XML. The default value for this property is also True.

The async and validateOnParse properties are not part of the DOM Level 2 feature set; they are specific to the MSXML implementation.
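For comparison with the synchronous load in Listing 18.3, here is a minimal sketch of an asynchronous load. It assumes the code lives in a form module (the Form_Load event and the variable name doc are illustrative choices) so that the document variable can be declared WithEvents and receive onreadystatechange notifications.

```vb
' Form module sketch (assumes a reference to MSXML 4.0)
Option Explicit

' WithEvents lets this module receive DOMDocument40 events
Private WithEvents doc As DOMDocument40

Private Sub Form_Load()
    Set doc = New DOMDocument40
    doc.async = True                 ' the default, set here for clarity
    doc.Load "c:\test\address.xml"   ' returns before parsing finishes
End Sub

Private Sub doc_onreadystatechange()
    ' A readyState of 4 means loading has finished, successfully or not
    If doc.readyState = 4 Then
        If doc.parseError.errorCode = 0 Then
            MsgBox "Document loaded"
        Else
            MsgBox doc.parseError.reason, vbCritical
        End If
    End If
End Sub
```

Because the return value of load is not meaningful for an asynchronous load, the sketch checks parseError once the final ready state is reached.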

Once we have a DOM tree that can be traversed, which occurs after a successful load, we can use other MSXML components to perform a traversal.
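As one possible illustration, the following sketch loads a small XML string with loadXML and walks the children of the root element through the IXMLDOMNode interface. The XML content is a made-up sample, not the address.xml document used elsewhere in the chapter.

```vb
Dim doc As DOMDocument40
Dim node As IXMLDOMNode

Set doc = New DOMDocument40
doc.async = False

' Hypothetical sample content; loadXML parses a string instead of a file
If doc.loadXML("<customers><businessCustomer/><privateCustomer/></customers>") Then
    ' Visit each child node of the root element
    For Each node In doc.documentElement.childNodes
        Debug.Print node.nodeName
    Next node
Else
    Debug.Print doc.parseError.reason
End If
```

For this sample string, the loop prints the names of the two child elements in document order.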

18.4.2.3 DOM Parsing Errors

Whenever the DOMDocument40 is used to parse an XML document, there is always a chance of error. Errors range from the obvious (the XML is not well-formed) to the complicated (the document does not conform to one of the associated XML schemas). To understand what went wrong during the parsing process and try to rectify it, we need to check the error generated by the parser.

The DOMDocument interface provides access to error information after parsing through its parseError property, which returns a separate interface called IXMLDOMParseError. This interface reports the error code, the reason, and the location in the document where the error occurred. If validation fails, the details are available through the same property. Listing 18.3, which reads an XML document using DOMDocument40, accesses the parseError property to report why a load failed.
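A short sketch of reading the individual IXMLDOMParseError properties after a failed load follows. The file path is the one used in Listing 18.3, and the output labels are arbitrary.

```vb
Dim doc As DOMDocument40
Dim docError As IXMLDOMParseError

Set doc = New DOMDocument40
doc.async = False

If Not doc.Load("c:\test\address.xml") Then
    Set docError = doc.parseError
    ' Each property describes a different aspect of the failure
    Debug.Print "Code:   " & Hex$(docError.errorCode)
    Debug.Print "Reason: " & docError.reason
    Debug.Print "Line:   " & docError.Line & ", column " & docError.linepos
    Debug.Print "Source: " & docError.srcText
    Debug.Print "URL:    " & docError.url
End If
```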

18.4.3 Using SAX2

The SAX and DOM parsers take a very different approach to working with an XML document. When working with the DOM, we load the XML document using a single object and then examine the tree that is built from the document contents. When working with SAX, we are notified through events whenever the parser encounters a particular element or a particular action occurs. Applications working with the SAX parser must implement handlers for each of these events and connect them to the parser.

Because MSXML is a COM-based API, the notification that comes from the MSXML SAX implementation comes through COM. This means that to create a handler, we must implement the COM interfaces that MSXML expects a SAX handler to implement.

Table 18.1 lists the six handler interfaces currently supported by the MSXML 4.0 SAX implementation, with the types of notifications they receive.

TABLE 18.1 SAX Handler Interfaces

Interface               Type of Notification
IVBSAXContentHandler    Document and elements
IVBSAXDeclHandler       DTD declarations
IVBSAXDTDHandler        DTD-related events
IVBSAXErrorHandler      Errors and warnings
IVBSAXLexicalHandler    Comments and CDATA
IMXSchemaDeclHandler    XML schema declarations


Any handler component can implement as many of these interfaces as needed, and that handler will receive notification about all events related to that category.

18.4.3.1 SAXXMLReader40

The SAXXMLReader40 component is responsible for parsing an XML document and triggering the notifications to the SAX handlers that have registered with it. Triggering the parsing of an XML document takes about as much code as using the DOM and DOMDocument40. The major difference is that in addition to this code, the application must also create any handlers it needs.

18.4.3.2 Reading XML with SAXXMLReader40

The most basic use of the SAX handler would be to sink the events related to content, so a handler interested in those events would need to implement the IVBSAXContentHandler interface. After that handler instance exists, it can be attached to the SAXXMLReader40 component and the parsing can occur.

Listing 18.4 defines a Visual Basic class whose instances function as content handlers. This class is defined as SAXContent, and it implements the IVBSAXContentHandler interface. Whenever an element is parsed, the class checks the local name and conditionally outputs the qualified name of the element by using the Debug.Print statement, the equivalent of a trace statement in other languages.

The code for the handler has a number of empty methods; in fact, it is almost completely empty. The methods are necessary, however, because to implement an interface, you must implement all its methods, even if they are just stubbed out.

LISTING 18.4 Content Handler Class

'SAXContent.cls
Implements IVBSAXContentHandler

Private Sub IVBSAXContentHandler_characters( _
 strChars As String)

End Sub

Private Property Set IVBSAXContentHandler_documentLocator( _
 ByVal RHS As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()

End Sub

Private Sub IVBSAXContentHandler_endElement( _
 strNamespaceURI As String, strLocalName As String, _
 strQName As String)

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping( _
 strPrefix As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace( _
 strChars As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction( _
 strTarget As String, strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity( _
 strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

End Sub

Private Sub IVBSAXContentHandler_startElement( _
 strNamespaceURI As String, strLocalName As String, _
 strQName As String, _
 ByVal oAttributes As MSXML2.IVBSAXAttributes)

    ' Report only the elements of interest
    If strLocalName = "businessCustomer" Then
        Debug.Print "element found: " & strQName
    End If

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping( _
 strPrefix As String, strURI As String)

End Sub

To use the handler, you need to create an instance of the reader, SAXXMLReader40, which parses the document. Handlers are attached to the reader through the appropriate property. The properties of the SAXXMLReader40 used to attach handlers are listed in Table 18.2.

TABLE 18.2 SAXXMLReader40 Handler Properties

Property          Related Handler Interface
contentHandler    IVBSAXContentHandler
dtdHandler        IVBSAXDTDHandler
errorHandler      IVBSAXErrorHandler


The code in Listing 18.5 shows how to use your handler class, SAXContent, and SAXXMLReader40 to read the document. The parseURL method of the reader accepts a URL or a path to a local XML document file, just as the load method of the DOMDocument40 did in Listing 18.3.

LISTING 18.5 Using SAXXMLReader40 with the Content Handler

Dim sax As SAXXMLReader40
Set sax = New SAXXMLReader40  
Set sax.contentHandler = New SAXContent
sax.parseURL "c:\temp\address.xml"

Unlike with the DOM, you do not need to traverse a tree to find what you are looking for. The call to parseURL begins the parsing; after that, it is up to the event handlers to get the information they need. The SAXXMLReader40 component informs you via a callback to your handler interface whenever a particular event occurs. When loading the thematic address.xml document, the output generated by the code appears as shown in Listing 18.6.

LISTING 18.6 Debug Output of the Content Handler from address.xml

element found: businessCustomer
element found: businessCustomer
element found: businessCustomer
element found: businessCustomer

18.4.3.3 SAXXMLReader40 Configuration

SAX defines standard methods for configuring the XMLReader component. These methods let the application set two kinds of named values on the reader: properties, which can hold arbitrary values such as handler objects, and features, which are Boolean flags. MSXML follows the SAX standard and implements these methods for the SAXXMLReader40.

The XMLReader properties and features are accessed by four methods: getFeature, putFeature, getProperty, and putProperty. SAX defines a set of standard features and properties to be implemented, but these are not the only ones that can be used. A particular SAX implementation can define its own, possibly proprietary, properties and features, and the standard can grow over time without a change to the interface for every new configuration setting. For example, handlers for content or errors have fixed COM properties that are part of the reader, whereas declaration handlers must be attached through getProperty and putProperty. Listing 18.7 illustrates how to set the declaration handler of the SAXXMLReader40 using the putProperty method.

LISTING 18.7 Using putProperty for XMLReader Configuration

Dim sax As SAXXMLReader40
Set sax = New SAXXMLReader40

' Assume we have a declHandler that implements
' IVBSAXDeclHandler 
sax.putProperty _
 "http://xml.org/sax/properties/declaration-handler", _
 declHandler

Whenever you use the SAXXMLReader40 to validate against a specific XML schema, you must use the configuration methods discussed here.
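A rough sketch of such a configuration is shown next. It assumes the MSXML-specific "schema-validation" feature and "schemas" property names, a hypothetical namespace URI and schema file path, and the SAXContent handler class from Listing 18.4.

```vb
Dim sax As SAXXMLReader40
Dim schemas As XMLSchemaCache40

Set sax = New SAXXMLReader40
Set schemas = New XMLSchemaCache40

' Hypothetical namespace URI and schema location
schemas.Add "urn:example:address", "c:\test\address.xsd"

' Turn on schema validation and hand the reader the schema cache
sax.putFeature "schema-validation", True
sax.putProperty "schemas", schemas

Set sax.contentHandler = New SAXContent
sax.parseURL "c:\test\address.xml"
```

The feature is a Boolean switch, while the property carries an object, which matches the distinction between features and properties described above.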

18.4.3.4 SAX2 Parsing Errors

To handle parsing errors in a SAX application, the application must implement a handler for the errors raised by the SAX parser. As stated earlier, MSXML SAX error handlers implement the IVBSAXErrorHandler interface. If your application needs to know when the validation of an XML document fails, a component must implement IVBSAXErrorHandler and include code in its fatalError method to respond to the failed validation. The XML Schema Tree example at the end of this chapter uses SAX2 and the error handler interface to validate an XML document against an XML schema.
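A minimal sketch of an error handler class might look like the following. The class name SAXErrors and the parameter names are illustrative; the three methods are those of the IVBSAXErrorHandler interface.

```vb
'SAXErrors.cls (hypothetical class name)
Implements IVBSAXErrorHandler

Private Sub IVBSAXErrorHandler_error( _
 ByVal oLocator As MSXML2.IVBSAXLocator, _
 strErrorMessage As String, ByVal nErrorCode As Long)

    Debug.Print "error: " & strErrorMessage

End Sub

Private Sub IVBSAXErrorHandler_fatalError( _
 ByVal oLocator As MSXML2.IVBSAXLocator, _
 strErrorMessage As String, ByVal nErrorCode As Long)

    ' The locator reports where in the document parsing stopped
    Debug.Print "fatal error at line " & oLocator.lineNumber & _
        ", column " & oLocator.columnNumber & ": " & strErrorMessage

End Sub

Private Sub IVBSAXErrorHandler_ignorableWarning( _
 ByVal oLocator As MSXML2.IVBSAXLocator, _
 strErrorMessage As String, ByVal nErrorCode As Long)

End Sub
```

An instance of this class would be attached through the errorHandler property listed in Table 18.2 before parseURL is called.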
