- Exchanging Documents with VB and XML
- Outsourcing Data Entry
- Parsing XML
- Dig a Little Deeper
Outsourcing Data Entry
Data collection companies commonly struggle to find and retain competent data entry operators, which can lead them to outsource their data entry to specialized vendors. XML is well suited to exchanging data with these vendors because a schema can specify precisely how the vendor is to return the data. In this example, the data collection company creates scanned images of paper documents and sends them to the data entry vendor for processing. The vendor then returns an XML document containing the data keyed from each scanned image. Figure 1 shows the architecture of this application.
When the World Wide Web Consortium (W3C) released the XML 1.0 standard, the schema (the definition of the document) had to be specified in a Document Type Definition (DTD). Although a DTD could be included within the document itself, it had several limitations: it did not support data types, and it was not itself written in XML, which meant that applications required a separate parser for the DTD. Microsoft introduced an XML-based schema language called XML-Data and includes support for it, along with DTDs, in all its product offerings. In the future, expect XML-Data to be replaced by a W3C-sanctioned schema language called XML Schema Definition (XSD).
Listing 1 shows the XML-Data schema, specified by the data collection company, to which the data entry vendor's documents must adhere. As you can see, the data collection company must collect demographic information from its potential customers, as well as their responses to questions. Although a discussion of the XML-Data syntax is beyond the scope of this article, it suffices to say that the schema specifies each data element, its attributes, which ones are required, their data types, and how many instances can appear within the document. In this way, the schema can fully represent the agreement between the data collection company and the data entry vendor. After the schema is produced (by a tool such as XML Authority, developed by Extensibility; see http://www.extensibility.com) and the business arrangements have been taken care of, we can begin receiving and processing the documents.
In this application, it was decided that the data entry vendor would create the XML documents based on the schema and then use FTP to transfer them to a secured directory at the data collection company. (Other delivery methods, such as HTTP uploading and SOAP, could have been used as well, and BizTalk Server could be used in the future to automate the delivery and receipt of documents.) As the XML documents arrive, they are processed by a Windows NT 4.0 service written in VB. The service parses each document and calls MTS components to persist the data in a SQL database, as shown in Figure 1.
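The receive, parse, and persist flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not the article's VB service. The element names (`survey`, `respondent`, `name`, `response`), the `question` attribute, and the `persist` callback that stands in for the MTS components are all assumptions made for the example; the real document layout is the one defined by the schema in Listing 1.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical sample document; the real layout is defined by the schema.
SAMPLE = """<survey>
  <respondent>
    <name>Pat Example</name>
    <response question="q1">Yes</response>
    <response question="q2">No</response>
  </respondent>
</survey>"""


def parse_survey(xml_text: str) -> List[Dict[str, str]]:
    """Turn one vendor document into a list of flat rows, one per respondent."""
    root = ET.fromstring(xml_text)
    rows = []
    for respondent in root.findall("respondent"):
        # Demographic data first, then one entry per keyed response.
        row = {"name": respondent.findtext("name", default="")}
        for response in respondent.findall("response"):
            row[response.get("question", "")] = (response.text or "").strip()
        rows.append(row)
    return rows


def process_document(xml_text: str, persist: Callable[[Dict[str, str]], None]) -> int:
    """Parse a document and hand each row to a persistence callback.

    `persist` is a stand-in for the component that writes to the database.
    Returns the number of rows handed off.
    """
    rows = parse_survey(xml_text)
    for row in rows:
        persist(row)
    return len(rows)


if __name__ == "__main__":
    saved: List[Dict[str, str]] = []
    count = process_document(SAMPLE, saved.append)
    print(count, saved)
```

Keeping parsing separate from persistence mirrors the split between the service and the components it calls, and lets either half be tested on its own.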
Writing an NT service in VB is not a particularly difficult task. An ActiveX control, written by Microsoft, encapsulates the Win32 APIs that are necessary to communicate with the NT Service Control Manager (SCM) in order to perform functions such as registering the service and responding to the start, pause, and stop events. You can then add code using the FileSystemObject and a simple timer in order to check a specific FTP drop point that is configured in the registry. For a complete discussion of creating an NT service with VB, complete with an example of polling for files using the FileSystemObject, see Chapter 19 of my book Pure Visual Basic.
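The timer-driven polling of the drop point can be sketched as below. This is a Python illustration of the idea only, not the VB code from the book chapter or the ActiveX control's API. The function names, the `.xml` filter, the 30-second default interval, and the `max_cycles` parameter (added so the loop can be stopped in a test) are assumptions, and the directory is passed in as an argument instead of being read from the registry.

```python
import os
import time
from typing import Callable, List, Optional, Set


def find_new_documents(drop_dir: str, seen: Set[str]) -> List[str]:
    """Return paths of .xml files in drop_dir that have not been seen before."""
    found = []
    for name in sorted(os.listdir(drop_dir)):
        path = os.path.join(drop_dir, name)
        if name.lower().endswith(".xml") and os.path.isfile(path) and path not in seen:
            seen.add(path)  # remember the file so it is handled only once
            found.append(path)
    return found


def poll(
    drop_dir: str,
    handle: Callable[[str], None],
    interval_seconds: float = 30.0,
    max_cycles: Optional[int] = None,
) -> None:
    """Check drop_dir on a fixed interval and pass each new file to handle.

    Runs forever when max_cycles is None, as a service's timer would.
    """
    seen: Set[str] = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for path in find_new_documents(drop_dir, seen):
            handle(path)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
```

A call such as `poll("/path/to/drop", print, max_cycles=1)` lists whatever is currently waiting. One caveat for any real implementation: a file can appear in the directory before the FTP transfer has finished, so a production service would need some way to confirm that an upload is complete before processing it.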
After the file has been picked up by the service, the real work begins.