Home > Articles > Web Services > XML

  • Print
  • + Share This
This chapter is from the book

LOB or Compose?

The most basic dichotomy in mapping between XML and relational data is the choice between representing an XML document in a single column or in multiple columns: LOB representation or composed representation. Additionally, relational databases that have been extended to add XML functionality may implement a native XML type.

In an LOB representation, a column in a database table contains a literal XML string, usually in a CLOB (character large object) or similar datatype—hence the name. A single application might store multiple types of XML documents in different columns or even in the same column. Listing 6.1 shows an example of LOB XML. In this example, we have a database table that stores purchase orders as XML documents in one column and includes two other columns: id and receivedate, that are part of the management of the purchase order document. (Note: in this and all other examples of XML used in this chapter, whitespace has been added for clarity; it is not necessarily part of the actual XML data.)

Listing 6.1 Example of LOB-Encoded XML

id        receivedate     purchaseorder
_____________________________________________________________________

4023      2001-12-01       <purchaseOrder 
                               xmlns="http://po.example.com">
                             <originator billId="0013579">
                               <contactName>
                                 Fred Allen
                               </contactName>
                               <contactAddress>
                                 <street>123 2nd Ave. NW</street>
                                 <city>Anytown</city>
                                 <state>OH</state>
                                 <zip-four>99999-1234</zip-four>
                               </contactAddress>
                               <phone>(330) 555-1212</phone>
                               <originatorReferenceNumber>
                                 AS 1132
                               </originatorReferenceNumber>
                             </originator>
                             <order>
                               <item code="34xdl 1278 12ct"
                                     quantity="1"/>
                               <item code="57xdl 7789"
                                     quantity="1"
                                     colorcode="012"/>
                             </order>
                             <shipAddress sameAsContact="true"/>
                             <shipCode carrier="02"/> 
                           </purchaseOrder>
5327      2002-04-23       <purchaseOrder
                               xmlns="http://po.example.com">
                             <originator 
                                   ...

The other major approach to mapping between XML and relational data employs a composed representation of XML data. In a composed representation, the XML document does not exist in its XML form in the relational database. Instead, an XML document is composed from individual data items that are stored in many columns, and often in multiple tables, in the relational database. Typically each XML element that has simple data corresponds to the value of some column in some row of the relational data.

For example, the same data that we saw in Listing 6.1 might look as shown in Listing 6.2 in a composed representation. A key observation for the composed representation is that the same relational data could be used to represent many different XML documents by choosing different columns, arranged in different ways, and different corresponding element names. There must be some explicit or implicit mechanism to determine what the XML for a set of relational tables is supposed to look like. We call such a mechanism a composition technique. A variety of composition techniques exist, and we explore a few of the variations in the next section.

Listing 6.2 Example of Composed XML

purchaseorder table
id        receivedate
_____________________________________________________________________

4023      2001-12-01
5327      2002-04-23


originator table
poid      billId     contactName     street             ...
_____________________________________________________________________

4023      0013579    Fred Allen      123 2nd Ave. NW


order table
poid      code             quant   color   size
_____________________________________________________________________

4023      34xdl 1278 12ct      1    null   null
4023      57xdl 7789           1     012   null


ship table
poid      carrier    ... 
_____________________________________________________________________

 4023      02    

An important aspect of the composed representation concerns the use case where the original data is in XML form. In this case, the XML document must be decomposed into values that can be placed into the relational tables—a process often called by the evocative name “shredding.” This shredded, or decomposed, representation is then recomposed on the fly to generate the final XML. The shredding, or decomposition, process can also be configurable; ideally it uses the same information that is used in the composition technique that will be used to recompose the data. Shredding creates a variety of additional issues and choices that we discuss in a separate section below.

LOB and composed approaches can coexist, of course: part of an XML document can be (de)composed, while certain subelements are maintained in LOB form. For example, a table that mixes composed and LOB data might look like the one shown in Listing 6.3. In this case, the document data is divided between decomposed, or shredded, elements and XML LOB contents. In other scenarios, the division might be redundant: Certain elements of the LOB information might be duplicated as individual database columns.

Listing 6.3 A Table with Both Composed and LOB Data

originator table
poid      billId     contactName     address
_____________________________________________________________________

4023      0013579    Fred Allen      <contactAddress>
                                       <street>1234 2nd Ave. NW</street>
                                       <city>Anytown</city>
                                       <state>OH</state>
                                       <zip-four>99999-1234</zip-four>
                                     </contactAddress>

Finally, when the database itself is extended to include native XML functionality, then another representation option is a native XML type. That is, the SQL language is extended to allow “XML” to be a primitive datatype. If we look behind the implementation of such a type, we will generally find a dichotomy similar to the one we have outlined here: either an LOB data object or some structured representation is used. But of course the SQL database may also have additional options, especially with regard to details like indexing. Options for an XML native datatype are discussed in more detail in Chapter 7.

Different vendors use different approaches to support mapping between XML and relational data: Some use an LOB representation, some use a composed representation, and some allow either, or a mixture of both. (The same holds true when the database supports a native XML datatype: The database designer may have a choice of underlying representations.)

With respect to the kinds of applications supported by each representation, we can make the following observations:

  • Clearly if the source data is relational, a composed representation is used.

  • LOB representations are the most flexible with respect to wide varieties of XML inputs, including marked-up text and variable- or no-schema documents. Composed representations with shredding can manage structured or semi-structured XML documents with a given schema much more easily than they can the other types of XML sources.

  • LOB representations obviously make it easy to implement the “emit documents” application type. All the other application types, however, require some ability to look inside the document, which must be implemented as an additional layer of processing that cannot take advantage of the database's capabilities, unless implemented as part of a native implementation. This is a significant restriction. By comparison, the composed technique exposes the XML data in such a way that it is possible to use the SQL engine's own capabilities to implement important operations such as selection, at the cost of requiring additional complexity to generate XML output.

  • Hybrid and redundant representations can help obtain the benefits of both approaches by enabling indexing and easy access to some elements, while preserving complex XML representations. The cost is a more complex representation of the data, essentially requiring that any access or update methods be able to handle both cases. Redundant representations also add the overhead of keeping multiple representations of the same data in sync.

  • + Share This
  • 🔖 Save To Your Account