Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

SQL Server I/O: Creating Hierarchies

Last updated Mar 28, 2003.

The concepts of relational database theory and hierarchical document formats seem miles apart. But in reality, it's all just about looking at data in multiple ways.

Last week, I showed you how to use the FOR XML EXPLICIT predicate to create more powerful extracts from a database than using FOR XML AUTO. This week, I explain how to break relational data into hierarchies. Hierarchies are natural outputs of XML.

I've chosen a simple example. I want a list of authors and the books each author has written. I'll publish this list out to an advertiser. The advertiser wants to see the authors as elements, and the books they've written as child element for each author.

I have all this information in the pubs database. First, I use a relational T-SQL query that shows me the information I'm after:

SELECT 
  a.au_fname
  , a.au_lname
  , c.type
  , c.title
FROM authors a
INNER JOIN titleauthor b
  ON a.au_id = b.au_id
INNER JOIN titles c
  ON b.title_id = c.title_id
ORDER BY a.au_lname, a.au_fname, c.type

Which produces this partial output:

au_fname    au_lname         type   title                   
-------------------- ---------------------------------------- ------------ -------------------------------------------------------------------------------- 
Abraham    Bennet         business  The Busy Executive's Database Guide
Reginald    Blotchet-Halls       trad_cook Fifty Years in Buckingham Palace Kitchens
Cheryl    Carson         popular_comp But Is It User Friendly?
Michel    DeFrance         mod_cook  The Gourmet Microwave
Innes    del Castillo        mod_cook  Silicon Valley Gastronomic Treats
Ann     Dull          popular_comp Secrets of Silicon Valley

The relational aspects of this output are shown by the duplicate names that show up when an author writes more than one book. I can use various GROUP BY and GROUPING clauses to segment the results if I include aggregated numeric values, but a better approach is to use XML, which can lay out the data as a hierarchy.

To create the hierarchies, I use several tools. First, I use the FOR XML EXCPLICIT predicate to the SELECT statement. That predicate allows me to specify the elements and attributes by hand, rather than allowing SQL Server to do it for me.

I also use the T-SQL UNION operator. This operator takes two separate queries and joins the result. To do that, there has to be the same number of columns in both queries, so I need to pad some columns in each query with NULLs. This creates the layout.

The XML query also needs to know how to create the hierarchy, so an ORDER BY is also important. Without it, the entire hierarchy shows up at the end of the last element from the first query.

The query is a bit complex, with a lot of steps, but not really complicated. Like T-SQL itself, to understand it, all you have to do is break down the component parts of the statement.

Here's the query in its entirety. I'll explain the code in a moment:

SELECT 1 AS Tag 
  , NULL AS Parent
  , authors.au_id AS [Author!1!AuthorID]
  , au_fname + ' ' + au_lname AS [Author!1!AuthorName]
  , NULL AS [Title!2!Type]
  , NULL AS [Title!2!TitleName]
FROM authors

UNION ALL

SELECT 2 AS Tag 
  , 1 AS Parent
  , authors.au_id
  , NULL
  , titles.type
  , titles.title
FROM authors 
INNER JOIN titleauthor 
  ON authors.au_id = titleauthor.au_id
INNER JOIN titles 
  ON titles.title_id = titleauthor.title_id
 
ORDER BY au_id

FOR XML EXPLICIT

Here's a partial result:

<Author AuthorID="172-32-1176" AuthorName="Johnson White">
  <Title Type="psychology " TitleName="Prolonged Data Deprivation: Four Case Studies"/>
  <Title Type="business " TitleName="The Busy Executive's Database Guide"/>
  <Title Type="business " TitleName="You Can Combat Computer Stress!"/>
</Author>

<Author AuthorID="213-46-8915" AuthorName="Marjorie Green"/>
<Author AuthorID="238-95-7766" AuthorName="Cheryl Carson">
  <Title Type="popular_comp" TitleName="But Is It User Friendly?"/>
  <Title Type="trad_cook " TitleName="Sushi, Anyone?"/>
</Author>

<Author AuthorID="267-41-2394" AuthorName="Michael O&apos;Leary">
  <Title Type="business " TitleName="Cooking with Computers: Surreptitious Balance Sheets"/>
</Author>

<Author AuthorID="274-80-9391" AuthorName="Dean Straight">
  <Title Type="business " TitleName="Straight Talk About Computers"/>
</Author>

<Author AuthorID="341-22-1782" AuthorName="Meander Smith">
  <Title Type="business " TitleName="The Busy Executive&apos;s Database Guide"/>
</Author>

Let's dissect this query a bit to see how it works. The first line of the query provides a "tag" to refer to its elements:

SELECT 1 AS Tag 

Since it is the root of the data, I select a NULL as the parent in the second line:

  , NULL AS Parent

Then I provide the author's id; later, I use it to connect the two queries:

  , authors.au_id AS [Author!1!AuthorID]

I put the first and last names together, and call this element Author, with an attribute of AuthorName. I provide the 1 tag to show the level of hierarchy where the data resides:

  , au_fname + ' ' + au_lname AS [Author!1!AuthorName]

Because the UNION operator requires the same number of columns in the SELECT statements that it combines, I need to pad out the next two columns (or elements). I put both pieces of data under a single element (Title) and give them an attribute name of Type and TitleName. I also reference them to come from the second level of the hierarchy:

  , NULL AS [Title!2!Type]
  , NULL AS [Title!2!TitleName]

And then close out the first SELECT:

FROM authors

Now, I UNION the second query:

UNION ALL

The second SELECT is given a tag of 2, and then refers to its parent (1):

SELECT 2 AS Tag 
  , 1 AS Parent

I'll get that ID again, this time to use as a JOIN condition for the tables I need to get the other elements:

  , authors.au_id

And I place the NULL to pad out the UNION:

  , NULL

The next data points don't follow the same format as the first SELECT. That's because I've already specified the element and attribute tags in the first part of the query. All that's left is to join the relational tables and to specify that all-important ORDER BY:

  , titles.type
  , titles.title
FROM authors 
INNER JOIN titleauthor 
  ON authors.au_id = titleauthor.au_id
INNER JOIN titles 
  ON titles.title_id = titleauthor.title_id
 
ORDER BY au_id

FOR XML EXPLICIT

Try running this query on your system in the pubs database. The output will be in a single string, but for readability, you can add returns and tabs as I have here.

Try leaving off the ORDER BY and see how the results lay out. You'll see that all the authors are listed, and then the titles pile on to the last one.

This document ends up in attribute normal form. But by adding the "element" directive in the first part of the query (as I explained in my last article), you can get it into element normal form.

Next week, I'll explain how to work with Internet Information Server and SQL Server XML output.

Online Resources

The Data Direct Web site has a good article on XQuery, which coincidentally explains hierarchies in XML. You can find that info here.

InformIT Tutorials and Sample Chapters

Nicholas Chase also covers XQuery here.