Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

SQL Server I/O: Reading XML Documents

Last updated Mar 28, 2003.

In my last installment, I showed you how to create an XML document from SQL Server. Creating these documents isn't a difficult process; you just need to understand the syntax of the FOR XML predicate on a SELECT statement and how you want the document structured.

Using SQL Server to read a document, and potentially save data from it, is another story. You need to take a few steps to get the data in, process it, and navigate to find what you want.

This does make some sense. After all, SQL Server is a relational database management system, using set-theory concepts to join normalized groups of data together. XML is a markup standard designed primarily to represent hierarchical, de-normalized data. The ability to work with these two types of data at one time is no small task.

Here's one possible layout for a data structure in XML, in what's called an "element normal form" view:

<database>
 <table>
  <row>
  <column1>...</column1>
  <column2>...</column2>
  </row>
 </table>
</database>

The elements represent each distinct part of database objects. Another layout for representing data in XML is the "attribute normal form" view:

<database>
 <table>
  <row column1="Value" column2="Value"/>
 </table>
</database>

I make this distinction here because you need to keep the structure of the data in mind as you "shred" (or decompose) the data from a hierarchical form into a relational one. I mentioned in the last article that many people prefer the attribute-centric model because it requires one less navigation step – called "traversing a node."

Once you've got the structure defined, you can begin the process of importing data. There are two ways to import XML data. The first is to use a command called OPENXML, and the second is to use a program stream called XML Bulk Import. The second way is a bit involved for our needs here, so I'll cover it in another article.

There are four steps to use the OPENXML statement:

  • Open the XML Document
  • Prepare the data in memory
  • Query the data and map it
  • Close the document in memory

First, I need to get an XML document into memory. I do that by creating a variable to hold the document:

DECLARE @strXML varchar(4096)

I also set a variable to act as a "handle," or pointer. This allows SQL Server to get at the document's memory location.

DECLARE @idoc int

Now I need some XML data to work with. For this example, I'll just create some by hand, but normally I would read a document from a file location. I'll show you how to do that in a moment. Here's a simple XML snippet of a single book with three authors:

SET @strXML = 
'<Book>
<Author au_id="1234" au_fname="Buck" au_lname="Woody"/>
<Author au_id="1235" au_fname="Jon" au_lname="Glandon"/>
<Author au_id="1236" au_fname="Yvonne" au_lname="Madson"/>
</Book>'

This example is in attribute normal form. Next, I prepare the XML document for use with a special stored procedure:

EXEC sp_xml_preparedocument @idoc OUTPUT, @strXML

This requires a bit of explanation. The sp_xml_preparedocument "shreds" the XML hierarchy into a rowset-style data representation.

There are three arguments for this stored procedure. The first (@idoc) sets the memory handle location I mentioned a moment ago, the second (OUTPUT) directs the stored procedure to feed the shredded data to that memory location, and the third (@strXML) is the XML data.

Now I can use the XML document with the OPENXML command. It takes the following format:

SELECT au_fname, au_lname
FROM OPENXML(@idoc, '/Book/Author')
WITH (au_fname varchar(30)
  ,au_lname varchar(30)
  )

This is a standard SELECT statement, but the FROM predicate is parsed through the OPENXML function.

The OPENXML function shows two parameters in this example: the handle of the document in memory (@idoc) and the path to the node in which I'm interested. In this case, I navigated to the Book "root" node, then on to the node for Author. (I'll explain the path options in more detail in a future article.)

So far, I shredded the XML, asked for two attributes as fields, and told SQL Server where in the tree the data can be located. What's left is to map that data into data types that SQL Server understands. In the WITH predicate I define the two fields as varchar(30) types. I have to do that each time I get the data; But, if there is a table with the same field names that I've asked for with proper types in place, I can use that instead. Modifying the previous query to use the pubs database, I can use the authors table:

USE pubs
GO
DECLARE @strXML varchar(4096)
DECLARE @idoc int
SET @strXML = 
'<Book>
<Author au_id="1234" au_fname="Buck" au_lname="Woody"/>
<Author au_id="1235" au_fname="Jon" au_lname="Glandon"/>
<Author au_id="1236" au_fname="Yvonne" au_lname="Madson"/>
</Book>'
EXEC sp_xml_preparedocument @idoc OUTPUT, @strXML
SELECT au_fname, au_lname
FROM OPENXML(@idoc, '/Book/Author')
WITH 
  authors

Once I work with the data, I need to close the document.

EXEC sp_xml_removedocument @idoc

If I don't, the memory is left in use – and that's a problem. SQL Server restricts the amount of memory for the XML operations to one-eighth the total amount of memory for SQL Server. That means that you should use this command for passing data in small chunks.

Of course, you won't often create an XML document using Query Analyzer. You have two options to bring the document into the variable. You can send the data along with a Visual C# or other programming language, or you can use the xp_cmdshell stored procedure to type a document and then store the result in the variable. Here's an example of that:

CREATE TABLE #xmltable (string VARCHAR(4000))
INSERT #xmltable EXEC master..xp_cmdshell 'type c:\temp\test.xml'
SELECT string 
FROM #xmltable
DROP TABLE #xmltable

Here's the complete code, not including the file read:

DECLARE @strXML varchar(4096)
DECLARE @idoc int
SET @strXML = 
'<Book>
<Author au_id="1234" au_fname="Buck" au_lname="Woody"/>
<Author au_id="1235" au_fname="Jon" au_lname="Glandon"/>
<Author au_id="1236" au_fname="Yvonne" au_lname="Madson"/>
</Book>'
EXEC sp_xml_preparedocument @idoc OUTPUT, @strXML
SELECT au_fname, au_lname
FROM OPENXML(@idoc, '/Book/Author')
WITH (au_fname varchar(30)
  ,au_lname varchar(30)
  )
EXEC sp_xml_removedocument @idoc

There's a great deal more to this topic that we'll cover in the next few articles. I'll explain how to use structures to navigate the data more easily. I'll also show you how to read more than a small amount of XML data into SQL Server.

Online Resources

Michael Rys is the product Manager for SQL Server XML. SQL Summit has published an interview with him that makes for a good read. You can find it here: http://www.sqlsummit.com/People/MRys.htm

InformIT Tutorials and Sample Chapters

Nicholas Chase explains XML in databases in his article called "XML and Databases." You can read it here: http://www.informit.com/guides/content.aspx?g=xml&seqNum=110