XML in Office: Introductory Discussion
Meet the family! Word, Excel, InfoPath, Access, and FrontPage
Information capture and reuse
End-user data connection
Data-driven application enhancement
This chapter is an overview of the XML features of Office. We discuss the XML-enabled Office products (Word, Excel, Access, FrontPage, and the newly introduced InfoPath) in the context of several information-sharing scenarios.
But the products are really just the supporting cast. The true stars are the advances that XML in Office brings to:
information capture and reuse;
end-user data connection; and
data-driven application enhancement.
3.1 Information capture and reuse
For all the valuable abstract data that is managed in database systems, there is even more that is hidden in rendered word processing documents. That fact represents an enormous intellectual property loss for enterprises, of course, but it also represents a nuisance and a time-waster for the information workers who work with those documents.
Consider the articles written for a company's websites and newsletters. Every one is likely to contain a title, author, and date within it, but more often than not that information has to be retyped, or individually copied and pasted, to get it into a catalog entry. That's because there is no reliable way for a computer to recognize those data items in order to extract them.
3.1.1 Word processing
In contrast, look at Figure 3-1, which shows an article being edited in Microsoft Word.
Figure 3-1 Word document showing optional tag icons and task pane with XML structure
The article is actually an XML document that conforms to a schema of the user's choosing, in this case article. The user has opted to display icons that represent the start- and end-tags. Note that there are distinct elements for the title, author, and date.
Solution developers can use the XML elements to check and normalize information as it is entered, whether or not the tag icons are displayed. An application, for example, could notify the user if the text entered for a date element isn't really a valid date. Or it could automatically supply the current year if none was entered.
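As a rough illustration of that kind of check, here is a minimal Python sketch. It is not how a Word solution would actually be written; it only shows the logic of validating a date element and supplying the current year when none was entered. The sample markup, the accepted date format, and the function names are assumptions made for this example.

```python
from datetime import date, datetime
import xml.etree.ElementTree as ET

# Hypothetical article markup; the element names follow the example in the text.
ARTICLE = """<article>
  <title>XML in Office</title>
  <author>A. Writer</author>
  <date>March 14</date>
</article>"""


def normalize_date(text, today=None):
    """Return an ISO date string, or None if the text is not a valid date.

    Accepts 'Month D, YYYY' or 'Month D' (English month names, assuming the
    default locale); the second form is given the current year.
    """
    today = today or date.today()
    text = text.strip()
    try:
        return datetime.strptime(text, "%B %d, %Y").date().isoformat()
    except ValueError:
        pass
    try:
        # No year was entered: supply the current one before parsing
        with_year = f"{text}, {today.year}"
        return datetime.strptime(with_year, "%B %d, %Y").date().isoformat()
    except ValueError:
        return None


def check_article(xml_text, today=None):
    """Normalize every date element in place and collect problems to report."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter("date"):
        normalized = normalize_date(elem.text or "", today)
        if normalized is None:
            problems.append(f"not a valid date: {elem.text!r}")
        else:
            elem.text = normalized
    return root, problems
```

Calling `check_article(ARTICLE)` returns the parsed document with its date rewritten in ISO form, plus a list of messages that an application could show to the user.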
The right-hand pane is called the task pane; it can be used for various purposes. In the figure, the top of the task pane shows the XML structure of the document. At the bottom is a list of the types of element that are valid at the current point in the document, according to the article schema.
The document is also a normal Word document, so Word's formatting features can be used in the usual way.
There are three ways to save this document as XML:
WordML is Word's native XML file format. It preserves the Word document just as the DOC format would, including formatting and hyperlinks. However, it doesn't include any of the article markup, so we won't discuss this option further here. (We cover it in Chapter 5, "Rendering and presenting XML documents", on page 86.)
The document can be saved as an XML document conforming to a custom schema; in this case, article. A custom schema would normally be defined by an enterprise, or by a committee set up by an industry to which the enterprise belongs. For that reason, it would be designed to preserve the abstract data needed for the user's applications. For example, the title, author, and date can easily be identified by software and extracted for use in a catalog of articles.
The saved document could contain both WordML and the article markup, since the two are in different namespaces. This option preserves the formatting applied by the user, while still preserving the abstract data and distinguishing it from the rendition information.
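To make the namespace point concrete, the following Python sketch pulls the title, author, and date out of a document that mixes rendition markup with article markup. Both namespace URIs and all of the formatting element names are placeholders invented for this example; they are not WordML, and a real saved document would look different in detail.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces (assumptions for this sketch, not real Office URIs).
FMT_NS = "urn:example:formatting"   # stands in for the rendition markup
ART_NS = "urn:example:article"      # stands in for the custom article schema

MIXED = f"""<f:document xmlns:f="{FMT_NS}" xmlns:a="{ART_NS}">
  <a:article>
    <a:title><f:para style="Heading1"><f:run bold="true">XML in Office</f:run></f:para></a:title>
    <a:author><f:para><f:run>A. Writer</f:run></f:para></a:author>
    <a:date><f:para><f:run>2003-03-14</f:run></f:para></a:date>
    <a:body><f:para><f:run>Article text goes here.</f:run></f:para></a:body>
  </a:article>
</f:document>"""


def catalog_entry(xml_text):
    """Build a catalog entry from the article elements, ignoring formatting."""
    root = ET.fromstring(xml_text)
    ns = {"a": ART_NS}
    article = root.find(".//a:article", ns)
    if article is None:
        raise ValueError("no article element found")
    entry = {}
    for field in ("title", "author", "date"):
        elem = article.find(f"a:{field}", ns)
        # itertext() gathers the text inside any nested formatting elements
        entry[field] = "".join(elem.itertext()).strip() if elem is not None else None
    return entry
```

Because the lookup is by namespace-qualified name, the formatting elements can change freely without affecting the extraction.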
In our example, the article is the entire Word document, but that isn't a requirement. It is possible to intersperse short XML documents within a larger Word document. For example, a travel guide might include multiple XML structures that describe hotels, with subelements for the name, address, number of rooms, rates, etc.
Using XML with Word documents enables companies to capture more of the intellectual property that is created informally by individuals and work groups, and that typically remains inaccessible to enterprise information systems. As XML, that property becomes a portable asset that can be reused as needed.
3.1.2 Forms
For many purposes, a data entry form is more suitable for information capture than a typically larger and less constrained word processing document. InfoPath lets you design and use forms that are really XML documents that conform to your own custom schemas.
Figure 3-2 shows the layout of an order form in InfoPath's design mode. The structure of the order schema is shown in the task pane on the right, from which element types can be dragged onto the form.
Figure 3-2 InfoPath design interface with data source in task pane
Note that there is only one item line in the form design. Because the order schema allows item elements to be repeated, a user entering data will be able to add item lines as needed. Had customer elements been repeatable, the form would expand to allow insertion of the group of customer information fields.
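The effect of such occurrence rules can be sketched in a few lines of Python. The rules table below is a hypothetical stand-in for what the order schema would declare (one customer, any number of items); InfoPath reads the real constraints from the schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical occurrence rules for children of <order>; None means unbounded.
MAX_OCCURS = {"customer": 1, "item": None}


def add_child(order, name):
    """Append a child element to the order if the occurrence rule allows it."""
    if name not in MAX_OCCURS:
        raise ValueError(f"{name!r} is not allowed in an order")
    limit = MAX_OCCURS[name]
    if limit is not None and len(order.findall(name)) >= limit:
        raise ValueError(f"at most {limit} {name!r} element(s) allowed")
    return ET.SubElement(order, name)


order = ET.Element("order")
add_child(order, "customer")
# Item lines can be added as often as needed
for description in ("widget", "gadget", "gizmo"):
    add_child(order, "item").text = description
```

A second call to `add_child(order, "customer")` raises an error, which corresponds to the form offering no way to insert another group of customer fields.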
Unlike Word, InfoPath generates an XSLT stylesheet to control the rendering of the form. The formatting can even be based on the data entered in the form. For example, the dialog box in Figure 3-3 specifies that negative prices should be shown in a different color.
Figure 3-3 InfoPath conditional formatting dialog
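The idea of data-driven formatting can be illustrated outside InfoPath as well. The Python sketch below is not the XSLT that InfoPath generates; it is a simplified stand-in that renders item lines as HTML table rows and colors negative prices. The description element and the choice of red are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical order data; only the price element is mentioned in the text.
ORDER = """<order>
  <item><description>Widget</description><price>19.95</price></item>
  <item><description>Returned gadget</description><price>-5.00</price></item>
</order>"""


def render_rows(xml_text, negative_color="red"):
    """Render each item as an HTML table row, highlighting negative prices."""
    rows = []
    for item in ET.fromstring(xml_text).iter("item"):
        description = item.findtext("description", default="")
        price = item.findtext("price", default="0")
        # The condition is evaluated against the data, not fixed in the layout
        style = f' style="color: {negative_color}"' if float(price) < 0 else ""
        rows.append(
            f"<tr><td>{escape(description)}</td><td{style}>{escape(price)}</td></tr>"
        )
    return rows
```

The same layout therefore produces differently formatted output depending on the values the user enters.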
InfoPath is described in detail in Chapter 9, "Designing and using forms", on page 180.
3.1.3 Relational data
XML elements, whether captured in Word or Excel or InfoPath (or any other way, for that matter), are as well-defined and predictable as the columns and tables of a database. XML documents of all kinds are therefore a source of information as rich as any other operational data store. Companies can aggregate, parse, search, manage, and reuse the data in documents in the same way they do the transactional data that is typically captured for relational databases.
They can also import the document data into a database and use it in conjunction with data from other sources. In addition, they can export DBMS data as XML documents.
Figure 3-4, for example, shows the options Access offers when exporting data as XML. You can specify which tables and records to export and how to sort and/or transform them.
Figure 3-4 Access dialog for exporting data as XML
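For readers who want to see the general shape of such an export, here is a minimal Python sketch that uses SQLite as a stand-in for Access. It selects records from one table, sorts them, and writes each row as an element whose children are named after the columns. The root element name, the table, and the sample data are all assumptions for this example, and the output is not the format that Access itself produces.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn, table, where="1=1", order_by="rowid"):
    """Export selected rows of one table as an XML string.

    table, where, and order_by are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {where} ORDER BY {order_by}"
    )
    columns = [col[0] for col in cursor.description]
    root = ET.Element("export")  # hypothetical root element name
    for row in cursor:
        record = ET.SubElement(root, table)
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 250.0), (2, "Zenith", 75.5), (3, "Acme", 20.0)],
)
# Export only one customer's orders, sorted by total
xml_text = export_table(conn, "orders", where="customer = 'Acme'", order_by="total")
```

The `where` and `order_by` arguments play the role of the record selection and sorting options in the dialog.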
Figure 3-5 shows the options for exporting a schema as XML. You can choose whether or not to export the schema, and whether it should be exported within the data document or as an independent schema document.
Figure 3-5 Access dialog for exporting schema as XML