Home > Articles > Web Services > XML

Converting XML to Relational Data for Use in DB2

  • Print
  • + Share This
This chapter describes methods to convert XML documents to rows in relational tables, commonly known as shredding or decomposing of XML documents.
This chapter is from the book

This chapter describes methods to convert XML documents to rows in relational tables. This conversion is commonly known as shredding or decomposing of XML documents. Given the rich support for XML columns in DB2 you might wonder in which cases it can still be useful or necessary to convert XML data to relational format. One common reason for shredding is that existing SQL applications might still require access to the data in relational format. For example, legacy applications, packaged business applications, or reporting software do not always understand XML and have fixed relational interfaces. Therefore you might sometimes find it useful to shred all or some of the data values of an incoming XML document into rows and columns of relational tables.

In this chapter you learn:

  • The advantages and disadvantages of shredding and of different shredding methods (section 11.1)
  • How to shred XML data to relational tables using INSERT statements that contain the XMLTABLE function (section 11.2)
  • How to use XML Schema annotations that map and shred XML documents to relational tables (section 11.3)

11.1 Advantages and Disadvantages of Shredding

The concept of XML shredding is illustrated in Figure 11.1. In this example, XML documents with customer name, address, and phone information are mapped to two relational tables. The documents can contain multiple phone elements because there is a one-to-many relationship between customers and phones. Hence, phone numbers are shredded into a separate table. Each repeating element, such as phone, leads to an additional table in the relational target schema. Suppose the customer information can also contain multiple email addresses, multiple accounts, a list of most recent orders, multiple products per order, and other repeating items. The number of tables required in the relational target schema can increase very quickly. Shredding XML into a large number of tables can lead to a complex and unnatural fragmentation of your logical business objects that makes application development difficult and error-prone. Querying the shredded data or reassembling the original documents may require complex multiway joins.

Figure 11.1

Figure 11.1 Shredding of an XML document

Depending on the complexity, variability, and purpose of your XML documents, shredding may or may not be a good option. Table 11.1 summarizes the pros and cons of shredding XML data to relational tables.

Table 11.1. When Shredding Is and Isn't a Good Option

Shredding Can Be Useful When...

Shredding Is Not A Good Option When...

  • Incoming XML data is just feeding an existing relational database.
  • Your XML data is complex and nested, and difficult to map to a relational schema.
  • The XML documents do not represent logical business objects that should be preserved.
  • Mapping your XML format to a relational schema leads to a large number of tables.
  • Your primary goal is to enable existing relational applications to access XML data.
  • Your XML Schema is highly variable or tends to change over time.
  • You are happy with your relational schema and would like to use it as much as possible.
  • Your primary goal is to manage XML documents as intact business objects.
  • The structure of your XML data is such that it can easily be mapped to relational tables.
  • You frequently need to reconstruct the shredded documents or parts of them.
  • Your XML format is relatively stable and changes to it are rare.
  • Ingesting XML data into the database at a high rate is important for your application.
  • You rarely need to reconstruct the shredded documents.
  • Querying or updating the data with SQL is more important than insert performance.

In many XML application scenarios the structure and usage of the XML data does not lend itself to easy and efficient shredding. This is the reason why DB2 supports XML columns that allow you to index and query XML data without conversion. Sometimes you will find that your application requirements can be best met with partial shredding or hybrid XML storage.

  • Partial shredding means that only a subset of the elements or attributes from each incoming XML document are shredded into relational tables. This is useful if a relational application does not require all data values from each XML document. In cases where shredding each document entirely is difficult and requires a complex relational target schema, partial shredding can simplify the mapping to the relational schema significantly.
  • Hybrid XML storage means that upon insert of an XML document into an XML column, selected element or attribute values are extracted and redundantly stored in relational columns.

If you choose to shred XML documents, entirely or partially, DB2 provides you with a rich set of capabilities to do some or all of the following:

  • Perform custom transformations of the data values before insertion into relational columns.
  • Shred the same element or attribute value into multiple columns of the same table or different tables.
  • Shred multiple different elements or attributes into the same column of a table.
  • Specify conditions that govern when certain elements are or are not shredded. For example, shred the address of a customer document only if the country is Canada.
  • Validate XML documents with an XML Schema during shredding.
  • Store the full XML document along with the shredded data.

DB2 9 for z/OS and DB2 9.x for Linux, UNIX, and Windows support two shredding methods:

  • SQL INSERT statements that use the XMLTABLE function. This function navigates into an input document and produces one or multiple relational rows for insert into a relational table.
  • Decomposition with an annotated XML Schema. Since an XML Schema defines the structure of XML documents, annotations can be added to the schema to define how elements and attributes are mapped to relational tables.

Table 11.2 and Table 11.3 discuss the advantages and disadvantages of the XMLTABLE method and the annotated schema method.

Table 11.2. Considerations for the XMLTABLE Method

Advantages of the XMLTABLE Method

Disadvantages of the XMLTABLE Method

  • It allows you to shred data even if you do not have an XML Schema.
  • For each target table that you want to shred into you need one INSERT statement.
  • It does not require you to understand the XML Schema language or to understand schema annotations for decomposition.
  • You might have to combine multiple INSERT statements in a stored procedure.
  • It is generally easier to use than annotated schemas because it is based on SQL and XPath.
  • You can use familiar XPath, XQuery, or SQL functions and expressions to extract and optionally modify the data values.
  • It often requires no or little work during XML Schema evolution.
  • The shredding process can consume data from multiple XML and relational sources, if needed, such as values from DB2 sequences or look-up data from other relational tables.
  • It can often provide better performance than annotated schema decompositions.
  • There is no GUI support for implementing the INSERT statements and the required XMLTABLE functions. You need to be familiar with XPath and SQL/XML.

Table 11.3. Considerations for Annotated Schema Decomposition

Advantages of the Annotated Schema Method

Disadvantages of the Annotated Schema Method

  • The mapping from XML to relational tables can be defined using a GUI in IBM Data Studio Developer.
  • It does not allow shredding without an XML Schema.
  • If you shred complex XML data into a large number of tables, the coding effort can be lower than with the XMLTABLE approach.
  • You might have to manually copy annotations when you start using a new version of your XML Schema.
  • It offers a bulk mode with detailed diagnostics if some documents fail to shred.
  • Despite the GUI support, you need to be familiar with the XML Schema language for all but simple shredding scenarios.
  • Annotating an XML Schema can be complex, if the schema itself is complex.
  • + Share This
  • 🔖 Save To Your Account