This chapter is from the book

18.4 MSXML Fundamentals

DOM and SAX were mentioned in Chapter 2. DOM Level 2 and SAX2 are both standard APIs for parsing XML, and MSXML supports both standards. In fact, it is possible to use both the DOM and SAX with MSXML to validate XML instances.

18.4.1 Using MSXML from Visual Basic

The first step in using MSXML from Visual Basic is to reference the MSXML components. As with any other COM component you use from Visual Basic, you add a reference to MSXML in your project. Inside the Visual Basic IDE, select the Project menu and then the References item to open the References dialog box. This dialog box provides a checklist of all COM type libraries registered on your machine. Checking one of these libraries, such as MSXML 4.0, includes the type information for its components in your project, so you can create instances of those components and refer to them through early binding. Figure 18.2 shows the References dialog box with MSXML 4.0 loaded.

FIGURE 18.2 Referencing MSXML 4.0 from Visual Basic.


Throughout the chapter, it is assumed that the Visual Basic examples are included in a project that has referenced MSXML 4.0. Any of the stand-alone code samples listed will function in any Visual Basic project that has referenced the MSXML 4.0 components.

18.4.2 Using the DOM

When you are using the MSXML implementation of the DOM Level 2 feature set, the component that represents the DOM Document node is a DOMDocument40 component. Through DOMDocument40, we have complete access to the contents of an XML document.

18.4.2.1 DOMDocument40

DOMDocument40 is the starting point for using the MSXML DOM implementation. The '40' suffix of the component indicates its version. Earlier versions of MSXML included DOMDocument components, and it is possible to declare and use a DOMDocument component if you are not concerned with which version you will get. Because the behavior and feature set of the DOMDocument component have grown with each version, it is often necessary to reference a specific version to guarantee that the functionality you need is available.

To work with a DOMDocument40 component, you need to create an instance of it. Listing 18.1 shows the Visual Basic code that creates a DOMDocument40 instance, assuming the reference to the MSXML 4.0 library has been added.

LISTING 18.1 Creating a DOMDocument40 (early binding)

Dim doc As DOMDocument40
Set doc = New DOMDocument40

If you are working from a scripting language or other language that only provides for late binding to COM objects, you can create the DOMDocument40 instance by using its PROGID, its user-friendly object identifier. The PROGID for the DOMDocument40 component is 'MSXML2.DOMDocument.4.0'. An example of late binding in VBScript is shown in Listing 18.2.

LISTING 18.2 Creating a DOMDocument40 (late binding for scripting)

Dim doc
Set doc = CreateObject("MSXML2.DOMDocument.4.0")

For the purposes of this chapter, assume that we are working inside the Visual Basic or similar IDE and have access to early binding to the MSXML components. This is just to simplify the sample code. The only difference between the two sets of code would be the instantiation code shown in Listings 18.1 and 18.2. As with any COM object, you gain performance at runtime through the use of early binding to the MSXML library.

PROGIDs can be used to create instances of any of the MSXML components. If you need to create an object instance from a scripting language such as VBScript or ECMAScript, use the PROGID of the object. The PROGIDs will follow the same form:

MSXML2.<object name>.4.0

For example, the XMLSchemaCache40 object has a PROGID of 'MSXML2.XMLSchemaCache.4.0'.
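As a minimal sketch of that naming pattern, the following late-bound Visual Basic code creates two other MSXML 4.0 components by PROGID. It assumes MSXML 4.0 is installed and registered on the machine; the variable names are ours, chosen only for illustration.

```vb
' Late binding: no reference to the MSXML 4.0 type library is required.
Dim cache As Object
Dim reader As Object

' Each PROGID follows the form MSXML2.<object name>.4.0
Set cache = CreateObject("MSXML2.XMLSchemaCache.4.0")
Set reader = CreateObject("MSXML2.SAXXMLReader.4.0")
```

In VBScript, the same calls apply, but the variables are declared without the As Object clause.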

18.4.2.2 Reading XML with DOMDocument40

Using DOMDocument40, we can load an XML document and examine its contents after they are loaded into the DOM tree. Two methods of DOMDocument40 can be used to load an XML document: load and loadXML. The load method, shown in Listing 18.3, accepts a URL or file path that points to an XML document; it can also be passed an object such as an IStream. Alternatively, the loadXML method accepts the XML content directly as a string.

LISTING 18.3 Loading an XML Document Using DOMDocument40

Dim doc As DOMDocument40
Set doc = New DOMDocument40

doc.async = False
doc.validateOnParse = False

If (doc.Load("c:\test\address.xml")) Then  
    MsgBox "Document is well-formed"
Else
    Dim docError As IXMLDOMParseError  
    Set docError = doc.parseError
    MsgBox docError.reason, vbCritical
End If

Listing 18.3 shows two properties of the DOMDocument40 component that greatly affect the parsing behavior. The first is the async property, a Boolean flag indicating whether the document should be loaded synchronously or asynchronously. In Listing 18.3, we set it to FALSE so the code blocks on the Load method until the document is fully parsed or an error occurs. The default value for this property is TRUE, in which case Load returns immediately and the application must wait until the DOMDocument40 fires an onreadystatechange event, or poll the readyState property, until readyState reaches 4 (completed) before it checks parseError and uses the document.
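The asynchronous case can be sketched as follows. This is an illustration under stated assumptions rather than a listing from the chapter: the procedure name BeginLoad and the parameter strUrl are hypothetical, and the code must live in a form or class module because Visual Basic does not allow WithEvents in a standard module.

```vb
' Form or class module with a reference to MSXML 4.0.
Private WithEvents doc As DOMDocument40

' Hypothetical helper: starts an asynchronous load and returns at once.
Public Sub BeginLoad(ByVal strUrl As String)
    Set doc = New DOMDocument40
    doc.async = True            ' the default; set here for clarity
    doc.validateOnParse = False
    doc.Load strUrl             ' parsing continues after this call returns
End Sub

' Fired by the DOMDocument40 each time its readyState changes.
Private Sub doc_onreadystatechange()
    ' A readyState of 4 means loading has finished, successfully or not.
    If doc.readyState = 4 Then
        If doc.parseError.errorCode = 0 Then
            Debug.Print "Loaded: " & doc.documentElement.nodeName
        Else
            Debug.Print "Load failed: " & doc.parseError.reason
        End If
    End If
End Sub
```

Because a readyState of 4 signals completion rather than success, the handler still has to inspect parseError before it touches the tree.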

The second property is validateOnParse, which is also a Boolean flag. When TRUE, the parser attempts to validate the document against any associated DTD, XML schema, or XDR schema. When FALSE, the parser only verifies that the document is well-formed XML. The default value for this property is also TRUE.
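To make the validating case concrete, here is a hedged sketch that associates an XML schema with the document through an XMLSchemaCache40 before loading. The namespace URI urn:example:address and the path c:\test\address.xsd are placeholders we invented; substitute the target namespace and location of your own schema.

```vb
Dim cache As XMLSchemaCache40
Dim doc As DOMDocument40

Set cache = New XMLSchemaCache40
' Hypothetical namespace URI and schema file; replace with your own.
cache.Add "urn:example:address", "c:\test\address.xsd"

Set doc = New DOMDocument40
Set doc.schemas = cache         ' schemas the parser may validate against
doc.async = False
doc.validateOnParse = True      ' the default; set here for clarity

If doc.Load("c:\test\address.xml") Then
    MsgBox "Document loaded without validation errors"
Else
    MsgBox doc.parseError.reason, vbCritical
End If
```

The success message is worded cautiously on purpose: a document whose namespace matches no schema in the cache may load without being checked against that schema at all.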

The async and validateOnParse properties are not part of the DOM Level 2 feature set; they are specific to the MSXML implementation.

Once a load succeeds, we have a DOM tree that can be traversed through the other MSXML DOM interfaces, such as IXMLDOMNode and IXMLDOMNodeList.
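A simple traversal might look like the following sketch. The inline XML is a made-up stand-in for a real document (the element name privateCustomer is hypothetical), loaded with loadXML so the example does not depend on a file.

```vb
Dim doc As DOMDocument40
Dim node As IXMLDOMNode

Set doc = New DOMDocument40
doc.async = False

' Hypothetical inline content standing in for a real document.
If doc.loadXML("<customers><businessCustomer/><privateCustomer/></customers>") Then
    ' documentElement is the root element of the tree.
    Debug.Print "root: " & doc.documentElement.nodeName

    ' childNodes returns an IXMLDOMNodeList that For Each can enumerate.
    For Each node In doc.documentElement.childNodes
        Debug.Print "child: " & node.nodeName
    Next node
End If
```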

18.4.2.3 DOM Parsing Errors

Whenever the DOMDocument40 is used to parse an XML document, there is always a chance of error. The errors that can occur during parsing range from the obvious (the XML is not well-formed) to the complicated (the document does not conform to one of the associated XML schemas). To understand what went wrong during parsing and try to rectify it, we need to check the error generated by the parser.

The DOMDocument interface provides access to error information after parsing. Error information is provided through another, separate interface called IXMLDOMParseError. This interface returns information about the error type, the reason, and the location in the document where the error occurred. If validation were to fail, this information would be provided through the parseError property of the DOMDocument interface. The sample code that reads an XML document using DOMDocument40 accesses the parseError property to determine whether or not the document was loaded successfully.
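As an illustration, the sketch below feeds the parser a deliberately malformed string and prints the main IXMLDOMParseError properties. The XML content is invented for the example; the exact error code and message text depend on the parser, so none are shown here.

```vb
Dim doc As DOMDocument40
Dim docError As IXMLDOMParseError

Set doc = New DOMDocument40
doc.async = False

' Deliberately not well-formed: the <city> element is never closed.
If Not doc.loadXML("<address><city>Boston</address>") Then
    Set docError = doc.parseError
    Debug.Print "code:   " & docError.errorCode
    Debug.Print "reason: " & docError.reason
    Debug.Print "line:   " & docError.line & ", column: " & docError.linepos
    Debug.Print "source: " & docError.srcText
End If
```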

18.4.3 Using SAX2

The SAX and DOM parsers take a very different approach to working with an XML document. When working with the DOM, we load the XML document using a single object and then examine the tree that is built from the document contents. When working with SAX, we are notified through events whenever the parser encounters a particular element or a particular action occurs. Applications working with the SAX parser must implement handlers for each of these events and connect them to the parser.

Because MSXML is a COM-based API, the notification that comes from the MSXML SAX implementation comes through COM. This means that to create a handler, we must implement the COM interfaces that MSXML expects a SAX handler to implement.

Table 18.1 lists the handler interfaces currently supported by the MSXML 4.0 SAX implementation, with the types of notifications they receive.

TABLE 18.1 SAX Handler Interfaces

Interface               Type of Notification
IVBSAXContentHandler    Document and elements
IVBSAXDeclHandler       DTD declarations
IVBSAXDTDHandler        DTD-related events
IVBSAXErrorHandler      Errors and warnings
IVBSAXLexicalHandler    Comments and CDATA
IMXSchemaDeclHandler    XML schema declarations


Any handler component can implement as many of these interfaces as needed, and that handler will receive notification about all events related to that category.

18.4.3.1 SAXXMLReader40

The SAXXMLReader40 component is responsible for parsing an XML document and triggering the notifications to the SAX handlers that have registered with it. Triggering the parsing of an XML document takes about as much code as using the DOM and DOMDocument40. The major difference is that in addition to this code, the application must also create any handlers it needs.

18.4.3.2 Reading XML with SAXXMLReader40

The most basic use of the SAX handler would be to sink the events related to content, so a handler interested in those events would need to implement the IVBSAXContentHandler interface. After that handler instance exists, it can be attached to the SAXXMLReader40 component and the parsing can occur.

Listing 18.4 defines a Visual Basic class whose instances function as content handlers. This class is defined as SAXContent, and it implements the IVBSAXContentHandler interface. Whenever an element is parsed, the class checks the local name and conditionally outputs the qualified name of the element by using the Debug.Print statement, the equivalent of a trace statement in other languages.

The code for the handler has a number of empty methods; in fact, it is almost completely empty. The methods are necessary, however, because to implement an interface, you must implement all its methods, even if they are just stubbed out.

LISTING 18.4 Content Handler Class

'SAXContent.cls
Implements IVBSAXContentHandler

' Every member of the interface must be present, even if it is only a stub.

Private Sub IVBSAXContentHandler_characters( _
 strChars As String)

End Sub

Private Property Set IVBSAXContentHandler_documentLocator( _
 ByVal RHS As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()

End Sub

Private Sub IVBSAXContentHandler_endElement( _
 strNamespaceURI As String, strLocalName As String, _
 strQName As String)

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping( _
 strPrefix As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace( _
 strChars As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction( _
 strTarget As String, strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity( _
 strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

End Sub

Private Sub IVBSAXContentHandler_startElement( _
 strNamespaceURI As String, strLocalName As String, _
 strQName As String, _
 ByVal oAttributes As MSXML2.IVBSAXAttributes)

    ' Report only the elements we are interested in.
    If strLocalName = "businessCustomer" Then
        Debug.Print "element found: " & strQName
    End If

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping( _
 strPrefix As String, strURI As String)

End Sub

To use the handler, you need to create an instance of the reader. The SAXXMLReader40 instance parses the document, and handlers are attached to it through the appropriate property. Table 18.2 lists the SAXXMLReader40 properties that are used to attach handlers.

TABLE 18.2 SAXXMLReader40 Handler Properties

Property          Related Handler Interface
contentHandler    IVBSAXContentHandler
dtdHandler        IVBSAXDTDHandler
errorHandler      IVBSAXErrorHandler


The code in Listing 18.5 shows how to use your handler class, SAXContent, and SAXXMLReader40 to read the document. The parseURL method of the reader accepts a path to a local XML document file just as the load method of the DOMDocument40 did in Listing 18.3.

LISTING 18.5 Using SAXXMLReader40 with the Content Handler

Dim sax As SAXXMLReader40
Set sax = New SAXXMLReader40  
Set sax.contentHandler = New SAXContent
sax.parseURL "c:\temp\address.xml"

Unlike with the DOM, you do not need to traverse a tree to find what you are looking for. The call to parseURL begins the parsing; after that, it is up to the event handlers to get the information they need. The SAXXMLReader40 component informs you via a callback to your handler interface whenever a particular event occurs. When loading the thematic address.xml document, the output generated by the code appears as shown in Listing 18.6.

LISTING 18.6 Debug Output of the Content Handler from address.xml

element found: businessCustomer
element found: businessCustomer
element found: businessCustomer
element found: businessCustomer

18.4.3.3 SAXXMLReader40 Configuration

In SAX, there are defined procedures for modifying the XMLReader component. These methods allow the application to set two types of values of the XMLReader: properties and features. Properties are named values that can be set on the reader, and features are Boolean properties. MSXML follows the SAX standard and implements these methods for the SAXXMLReader40.

The XMLReader properties and features are accessed by four methods: getFeature, putFeature, getProperty, and putProperty. SAX defines a set of standard features and properties to be implemented, but these are not the only ones that can be used. This approach means that a particular SAX implementation can define its own, possibly proprietary, properties and features, and that the standard can grow over time without a change to the interface for every new configuration setting. For example, handlers for content or errors have fixed COM properties on the reader, whereas declaration handlers must be attached with putProperty and retrieved with getProperty. Listing 18.7 illustrates how to set the declaration handler of the SAXXMLReader40 using the putProperty method.

LISTING 18.7 Using putProperty for XMLReader Configuration

Dim sax As SAXXMLReader40
Set sax = New SAXXMLReader40

' Assume we have a declHandler that implements
' IVBSAXDeclHandler 
sax.putProperty _
 "http://xml.org/sax/properties/declaration-handler", _
 declHandler

Whenever you use the SAXXMLReader40 to validate against a specific XML schema, you must use the configuration methods discussed here.
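A minimal sketch of that configuration follows. It assumes the MSXML-specific names "schema-validation" (a feature) and "schemas" (a property) as we recall them from the MSXML 4.0 SDK, so check them against your SDK documentation; the namespace URI and schema path are placeholders, and SAXContent is the handler class from Listing 18.4.

```vb
Dim cache As XMLSchemaCache40
Dim sax As SAXXMLReader40

Set cache = New XMLSchemaCache40
' Hypothetical namespace URI and schema file; replace with your own.
cache.Add "urn:example:address", "c:\test\address.xsd"

Set sax = New SAXXMLReader40
' Feature: a named Boolean switch on the reader.
sax.putFeature "schema-validation", True
' Property: a named value, here the schema cache to validate against.
sax.putProperty "schemas", cache

Set sax.contentHandler = New SAXContent
sax.parseURL "c:\temp\address.xml"
```

If parsing or validation fails, the reader raises a run-time error in the caller, so production code would wrap the parseURL call in an On Error handler or attach an error handler to the reader.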

18.4.3.4 SAX2 Parsing Errors

To handle parsing errors in a SAX application, the application must implement a handler for the errors that are fired by the SAX parser. As stated earlier, components that are MSXML SAX error handlers implement the IVBSAXErrorHandler interface. If your application needs to know when the validation of an XML document fails, a component must implement IVBSAXErrorHandler and include code in its fatalError method to respond to the failed validation. The XML Schema Tree example at the end of this chapter uses SAX2 and the error handler interface to validate an XML document against an XML schema.
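The shape of such a handler can be sketched as a small class. The class name SAXErrors is hypothetical, and the handler only prints what it receives; a real application would decide how to recover or report.

```vb
'SAXErrors.cls (hypothetical class name)
Implements IVBSAXErrorHandler

Private Sub IVBSAXErrorHandler_error( _
 ByVal oLocator As MSXML2.IVBSAXLocator, _
 strErrorMessage As String, ByVal nErrorCode As Long)

    Debug.Print "error: " & strErrorMessage

End Sub

Private Sub IVBSAXErrorHandler_fatalError( _
 ByVal oLocator As MSXML2.IVBSAXLocator, _
 strErrorMessage As String, ByVal nErrorCode As Long)

    ' The locator reports where in the document the parser stopped.
    Debug.Print "fatal error at line " & oLocator.lineNumber & _
        ", column " & oLocator.columnNumber & ": " & strErrorMessage

End Sub

Private Sub IVBSAXErrorHandler_ignorableWarning( _
 ByVal oLocator As MSXML2.IVBSAXLocator, _
 strErrorMessage As String, ByVal nErrorCode As Long)

    Debug.Print "warning: " & strErrorMessage

End Sub
```

An instance is attached through the errorHandler property listed in Table 18.2, for example with Set sax.errorHandler = New SAXErrors before the call to parseURL.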
