Composition Techniques: Common Concepts
Every composed representation requires some ability to express how an XML document is composed from the rows and columns of an SQL database. Over the last several years, many different approaches have been tried, some as research projects, some commercially. These techniques may look very different, but they have a number of conceptual similarities, which we will examine. In the next section we look at some specific techniques that use these concepts. As usual, we do not attempt to list the different composition techniques currently in use by vendors, concentrating instead on the underlying principles and variations.
Certain concepts are common to most composition techniques. First of all, a composition technique is either implicit or explicit. With an implicit technique, the user or application has no control over how the relational data is translated into XML. An explicit technique allows control (either on the part of the application accessing the XML or, in the case of shredding, during the design of the database representation).
If explicit control is allowed, it may include the following capabilities:
Select which tables and columns to use in the XML document. Some techniques allow you to specify a (SQL) query to use as a starting point, which also allows for joins, selections, data transformations, and aggregations to occur over the source data before using it to generate the XML. As we will see, however, this has serious implications for update.
Map rows from tables (or queries) onto XML elements. That is, we can create an XML element for every row in a table (or query result).
Map columns from tables (or queries) onto XML elements or attributes. The column value is generally the element or attribute value.
Choose names for the XML elements or attributes.
Allow control over datatype coercions and formatting.
Add fixed XML elements, attributes and/or values. This allows the creation of certain kinds of additional structure in the XML. As a simple example, most composition techniques would allow the addition of the fixed address element in the following XML (where the database might have contained columns for name, street, etc.):
<name>Francis Smith</name> <address> <street>123 Main St.</street> <city>Anytown</city> <state>OH</state> <zip>88888</zip> </address>
Create additional XML structure by allowing the specification of additional XML-producing operations on the relational data. This item is the most significant. Since the output of SQL queries is relational, we need some non-SQL way to generate more complex hierarchical data. The main way to do that is to model some relationships in the XML by special XML-generating operations on the relational data. We examine two such operations in detail in the following sections: hierarchical joins and grouping. These operations are very closely related to their normal SQL counterparts, but we must keep in mind that they are different: relational join with and group by generate relational results, while the operations we introduce generate hierarchical XML results.
An additional level of complexity and power is introduced if explicit control is allowed to compute whether or not an element or attribute will be present in the XML document via an arbitrary expression (in contrast with the usual case: An element is present if its content value is non-null) or to compute an element or attribute's name via an expression. Most composition techniques do not support this level of control. Sometimes, however, we need these capabilities in order to generate an XML document according to a certain standard. Usually in this case some form of composition is combined with XQuery or another transformation language to achieve the final XML (we demonstrate this kind of combination when we describe default mappings in the next section).
Here's a scorecard of how these different capabilities match up with different application types: As a general rule, all of them support the application types of selecting from, querying, and transforming XML documents. Update applications are compatible with items 1–5, so long as the values for the XML come from a single SQL table; as soon as joins or many other SQL capabilities are involved, full update capability becomes impossible (though limited update capability may still exist). Items 6 and 7 are also not compatible with full update capability. These are specific examples of a general problem known in the relational database world as “updating through views.” The problem stems from the fact that the operations of a powerful query language are not, in general, uniquely reversible. Therefore, if one attempts to update the result of a (nonreversible) query, there is usually no unique way to determine how to update the original data in a corresponding way. A simple example is attempting to update a generated field that has been created by concatenating two source fields: If you change the value of the concatenation, how should you “break” that value across the sources? To preserve update capability, therefore, we must restrict the complexity of the query or transformation operations to those that are reversible.
Generation of XML Structure through Hierarchical Joins
To illustrate generation of XML with a join schematically, suppose that we have two tables, Table A and Table B. If we generate XML structure through a join from A to B, the XML is going to look something like that shown in Listing 6.4. The general idea is that within an element corresponding to a row of Table A, there are subelements that correspond to “matching” rows in Table B—that is, the rows of Table B that join with a particular row of Table A. The relationship is actually most similar to a relational left outer join, because the mapping usually contains all rows of Table A, even those which have no corresponding rows in Table B.
Listing 6.4 Generation of XML Structure from a Join between Two Tables
<ElementforRowofA> <SubElementFromColumnofA> ... <SubElementforMatchingRowofB> <SubElementFromColumnofB> ... </SubElementforMatchingRowofB> <SubElementforMatchingRowofB> <SubElementFromColumnofB> ... </SubElementforMatchingRowofB> ... </ElementforRowofA>
The structure shown in Listing 6.4 may vary, of course. In general, the elements corresponding to data from Table A may be intermixed with those corresponding to data from Table B. Some element may contain all the data from Table B collectively, or the elements shown that contain all the data from a given row of B may be missing.
Most important, there is no limitation on the number of tables that may be joined in this fashion to produce a single XML document. Listing 6.5 shows an example that uses four tables. Note in particular that Table A joins to B, which in turn joins to C, while separately Table A joins to D. We call this arrangement a nonlinear hierarchy, because A has more than a single child that has been added through different hierarchical joins. Note that this relationship could not be modeled with standard relational joins (though it could be modeled in a nested-relational representation). Creating XML structure through hierarchical joins is the most widely supported and most commonly used idiom in composition. It is particularly useful with normalized relational data or star schemas.
Listing 6.5 Complex Hierarchy Generated through Multiple Joins
<ElementforA> <AContents>... <ElementsforB> <BContents>... <ElementsforC> <CContents>... </ElementsforC> </ElementsforB> <ElementsforD> <DContents>... </ElementsforD> </ElementforA>
Generation of XML Structure through Hierarchical Grouping
To illustrate generation of XML with hierarchical grouping, suppose that we generate XML structure through grouping on a single Table A. The resulting XML will look similar to that in Listing 6.6. In this scenario, we create an XML element for every unique value of a set of grouping columns of A, and then within that element occur subelements corresponding to data for nongrouped columns of the individual rows within the group. Generating structure through hierarchical grouping is particularly useful if we want to recreate a “normalized” XML structure from denormalized relational data.
Listing 6.6 Generation of XML Structure from Grouping on a Single Table
<ElementforGroupKeyofA> <SubElementFromGroupingColumnofA> ... <SubElementforSingleRowofA> <SubElementFromNonGroupedColumnofA> ... </SubElementforSingleRowofA> <SubElementforSingleRowofA> <SubElementFromNonGroupedColumnofA> ... </SubElementforSingleRowofA> ... </ElementforGroupKeyofA>
Most composition techniques do not provide direct support for hierarchical grouping. When such support is lacking, the same effect can be generated by joining Table A to itself, where the “outer table” is a query that generates the unique group keys from A, and the inner table retrieves the matching rows. But the self-join technique can be much more expensive than a direct implementation of grouping.