Home > Articles > Data > SQL Server

  • Print
  • + Share This
This chapter is from the book

T-SQL Language Enhancements

Even though this book is much about the CLR and outside access to SQL Server, let's not forget that Microsoft has enhanced the T-SQL language a lot in SQL Server 2005. In this section, we will look at some of the improvements.

TOP

TOP was introduced in SQL Server 7. Until SQL Server 2005, the TOP clause allowed the user to specify the number or percent of rows to be returned in a SELECT statement. In SQL Server 2005, the TOP clause can be used also for INSERT, UPDATE, and DELETE (in addition to SELECT), and the syntax is as follows: TOP (expression) [PERCENT]. Notice the parentheses around the expression; this is required when TOP is used for UPDATE, INSERT, and DELETE.

The following code shows some examples of using TOP.

—create a table and insert some data
CREATE TABLE toptest (col1 VARCHAR(150))
INSERT INTO toptest VALUES('Niels1')
INSERT INTO toptest VALUES('Niels2')
INSERT INTO toptest VALUES('Niels3')
INSERT INTO toptest VALUES('Niels4')
INSERT INTO toptest VALUES('Niels5')

—this returns 'Niels1' and 'Niels2'
SELECT TOP(2) * FROM toptest

—this sets 'Niels1' and 'Niels2' to 'hi'
UPDATE TOP(2) toptest SET col1 = 'hi'
SELECT * FROM toptest

—the two rows with 'hi' are deleted
DELETE TOP(2) toptest
SELECT * FROM toptest

—create a new table and insert some data
CREATE TABLE toptest2 (col1 VARCHAR(150))
INSERT INTO toptest2 VALUES('Niels1')
INSERT INTO toptest2 VALUES('Niels2')
INSERT INTO toptest2 VALUES('Niels3')
INSERT INTO toptest2 VALUES('Niels4')
INSERT INTO toptest2 VALUES('Niels5')

—'Niels1' and 'Niels2' are inserted
INSERT top(2) toptest
SELECT * FROM toptest2

SELECT * FROM toptest

An additional difference between the TOP clause in previous versions of SQL Server and in SQL Server 2005 is that we now can use expressions for number definition. The following code shows a couple of examples of that (it uses the tables from the preceding example).

—declare 3 variables
DECLARE @a INT
DECLARE @b INT
DECLARE @c INT

—set values
SET @a = 10
SET @b = 5
SELECT @c = @a/@b

—use the calculated expression
SELECT TOP(@c)* FROM toptest

—insert some more data in toptest
INSERT INTO toptest VALUES('Niels6')
INSERT INTO toptest VALUES('Niels7')
INSERT INTO toptest VALUES('Niels8')


—use a SELECT statement as expression
—this should return 5 rows
SELECT TOP(SELECT COUNT(*) FROM toptest2) * 
FROM toptest

The next T-SQL enhancement we'll look at is something completely new in SQL Server: the OUTPUT clause.

OUTPUT

The execution of a DML statement such as INSERT, UPDATE, or DELETE does not produce any results that indicate what was changed. Prior to SQL Server 2005, an extra round trip to the database was required to determine the changes. In SQL Server 2005 the INSERT, UPDATE, and DELETE statements have been enhanced to support an OUTPUT clause so that a single round trip is all that is required to modify the database and determine what changed. You use the OUTPUT clause together with the inserted and deleted virtual tables, much as in a trigger. The OUTPUT clause must be used with an INTO expression to fill a table. Typically, this will be a table variable. The following example creates a table, inserts some data, and finally deletes some records.

—create table and insert data
CREATE TABLE outputtbl 
(id INT IDENTITY, col1 VARCHAR(15))
go

INSERT INTO outputtbl VALUES('row1')
INSERT INTO outputtbl VALUES ('row2')
INSERT INTO outputtbl VALUES ('row5')
INSERT INTO outputtbl VALUES ('row6')
INSERT INTO outputtbl VALUES ('row7')
INSERT INTO outputtbl VALUES ('row8')
INSERT INTO outputtbl VALUES ('row9')
INSERT INTO outputtbl VALUES ('row10')

— make a table variable to hold the results of the OUTPUT clause
DECLARE @del AS TABLE (deletedId INT, deletedValue VARCHAR(15))
—delete two rows and return through
—the output clause
DELETE outputtbl
OUTPUT DELETED.id, DELETED.col1 INTO @del
WHERE id < 3
SELECT * FROM @del
GO
deletedId  deletedValue
—————- ———————-
1      row1
2      row2

(2 row(s) affected)

The previous example inserted the id and col1 values of the rows that were deleted into the table variable @del.

When used with an UPDATE command, OUTPUT produces both a DELETED and an INSERTED table. The DELETED table contains the values before the UPDATE command, and the DELETED table has the values after the UPDATE command. An example follows that shows OUTPUT being used to capture the result of an UPDATE.

—update records, this populates
—both the inserted and deleted tables
DECLARE @changes TABLE 
(id INT, oldValue VARCHAR(15), newValue VARCHAR(15))
UPDATE outputtbl
SET col1 = 'updated'
OUTPUT inserted.id, deleted.col1, inserted.col1
INTO @changes
WHERE id < 5
SELECT * FROM @changes
GO
id     oldValue    newValue
—————- ———————- ———————-
3      row5     updated
4      row6     updated

(2 row(s) affected)

Common Table Expressions and Recursive Queries

A Common Table Expression, or CTE, is an expression that produces a table that is referred to by name within the context of a single query. The general syntax for a CTE follows.

[WITH <common_table_expression> [,...n] ]
<common_table_expression>::=
  expression_name
  [(column_name [,...n])]
  AS
  (<CTE_query_expression>)

The following SQL batch shows a trivial usage of a CTE just to give you a feeling for its syntax.

WITH MathConst(PI, Avogadro)
AS
(SELECT 3.14159, 6.022e23)
SELECT * FROM MathConst
GO
PI               Avogadro
——————————————— ———————————
3.14159            6.022E+23
(1 row(s) affected)

The WITH clause, in effect, defines a table and its columns. This example says that a table named MathConst has two columns named PI and Avogadro. This is followed by a SELECT statement enclosed in parentheses after an AS keyword. And finally, all this is followed by a SELECT statement that references the MathConst table. Note that the syntax of the WITH clause is very similar to that of a VIEW. One way to think of a CTE is as a VIEW that lasts only for the life of the query expression at the end of the CTE. In the example, MathConst acts like a VIEW that is referenced in the query expression at the end of the CTE.

It is possible to define multiple tables in a CTE. A SQL batch follows that shows another trivial usage of a CTE that defines two tables, again shown just to make the syntax clear.

WITH MathConst(PI, Avogadro)
AS
(SELECT 3.14159, 6.022e23),
— second table
Package(Length, Width)
AS (SELECT 2, 5)
SELECT * FROM MathConst, Package
PI               Avogadro        Length   Width
——————————————— ——————————— —————- ——
3.14159            6.022E+23       2      5

(1 row(s) affected)

In this example, the CTE produced two tables, and the query expression merely joined them.

Both of the previous examples could have been done without using CTEs and, in fact, would have been easier to do without them. So what good are they?

In once sense, a CTE is just an alternate syntax for creating a VIEW that exists for one SQL expression, or it can be thought of as a more convenient way to use a derived table—that is, a subquery. However, CTEs are part of the SQL-92 standard, so adding them to SQL Server increases its standards compliance. In addition, CTEs are implemented in other databases, so ports from those databases may be easier with the addition of CTEs.

In some cases, CTEs can save a significant amount of typing and may provide extra information that can be used when the query plan is optimized. Let's look at an example where this is the case.

For this example, we will use three tables from the AdventureWorks database, a sample database that is distributed with SQL Server. We will use the SalesPerson, SalesHeader, and SalesDetail tables. The Sales Person table lists each salesperson that works for AdventureWorks. For each sale made at AdventureWorks, a SalesHeader is entered along with a SalesDetail for each item that that was sold in that sale. Each Sales Header lists the ID of the salesperson who made the sale. Each Sales Detail entry lists a part number, its unit price, and the quantity of the part sold.

The stock room has just called the Big Boss and told him that they are out of part number 90. The Big Boss calls you and wants you to make a report that lists the ID of each salesperson. Along with the ID, the Big Boss wants the text "MakeCall" listed if a salesperson made a sale that depends on part number 90 to be complete. Otherwise, he wants the text "Relax" printed. Just to ensure that the report lights a fire under the salespeople, the Big Boss also wants each line to list the value of the sale and the salesperson's sales quota.

Before we actually make use of the CTE, let's first write a query that finds all the IDs of salespeople who have sales that depend on part number 90.

SELECT DISTINCT SH.SalesPersonId FROM SalesOrderHeader SH JOIN
SalesOrderDetail SD ON SH.SalesOrderId = SD.SalesOrderId
AND SD.ProductID = 90
SalesPersonId
GO
SalesPersonId
——————-
14
21
22
more rows
(14 row(s) affected)

But the Big Boss has asked for a report with lines that look like this.

Action  SalesPersonID SalesQuota           Value
———— ——————- ——————————————— —————
MakeCall 22      250000.0000          2332.7784
... more lines
Relax  35      250000.0000          0

Each line number has the ID of a salesperson. If that salesperson has an order that depends on part number 90, the first column says "MakeCall" and the last column has the value involved in the order. Otherwise, the first column says "Relax" and the last column has 0 in it.

Figure 5Figure 7-5: A Chart of Accounts

Without CTEs, we could use a subquery to find the salespeople with orders that depend on the missing part to make the report the Big Boss wants, as in the SQL batch that follows.

SELECT 'MakeCall' AS Action, S.SalesPersonID, S.SalesQuota,
(SELECT SUM(SD.UnitPrice * SD.OrderQty) FROM SalesOrderHeader SH
JOIN SalesOrderDetail SD ON
SH.SalesOrderId = SD.SalesOrderId
AND SD.ProductID=90 AND SH.SalesPersonID=S.SalesPersonID
)
FROM SalesPerson S
WHERE EXISTS
(
SELECT * FROM SalesOrderHeader SH JOIN SalesOrderDetail SD ON
SH.SalesOrderID = SD.SalesOrderID AND SD.ProductID = 90
AND SH.SalesPersonID = S.SalesPersonID
)
UNION
SELECT 'Relax' AS Action, S.SalesPersonID, S.SalesQuota, 0
FROM SalesPerson S
WHERE NOT EXISTS
(
SELECT * FROM SalesOrderHeader SH JOIN SalesOrderDetail SD ON
SH.SalesOrderID = SD.SalesOrderID AND SD.ProductID = 90
AND SH.SalesPersonID = S.SalesPersonID
)

Notice that the subquery is reused in a number of places—once in the calculation of the value of the sales involved in the missing part and then again, twice more, in finding the salespeople involved in sales with and without the missing part.

Now let's produce the same report using a CTE.

WITH Missing(SP, AMT)
AS(
SELECT SH.SalesPersonID, SUM(SD.UnitPrice * SD.OrderQty) FROM SalesOrderHeader SH
JOIN SalesOrderDetail SD ON SH.SalesOrderId = SD.SalesOrderId
AND SD.ProductID=90 GROUP BY SH.SalesPersonID
)
SELECT 'MakeCall' AS Action, S.SalesPersonID, S.SalesQuota, Missing.AMT
FROM Missing JOIN SalesPerson S ON Missing.SP = S.SalesPersonID
UNION
SELECT 'Relax' AS Action, S.SalesPersonID, S.SalesQuota, 0
FROM SalesPerson S WHERE S.SalesPersonID NOT IN (SELECT SP FROM  Missing)

The Missing CTE is a table that has a row for each salesperson who has an order that depends on the missing part, and the value of what is missing. Notice that the Missing table is used in one part of the query to find the value of the missing parts and in another to determine whether a sales person should "MakeCall" or "Relax".

Although your opinion may differ, the CTE syntax is a bit clear and more encapsulated; that is, there is only one place that defines what orders are missing part number 90. Also, in theory, the CTE is giving the optimizer a bit more information in that it is telling the optimizer it plans on using Missing more than once.

The CTE is also part of another feature of SQL Server 2005 that is also part of the SQL:1999 standard. It is called a recursive query. This is especially useful for a chart of accounts in an accounting system or a parts explosion in a bill of materials. Both of these involve tree-structured data. In general, a recursive query is useful anytime tree-structured data is involved. We will look at an example of a chart of accounts to see how recursive queries work.

Figure 7-5 shows a simple chart of accounts containing two kinds of accounts: detail accounts and rollup accounts. Detail accounts have an actual balance associated with them; when a posting is made to an accounting system, it is posted to detail accounts. In Figure 7-5, account 4001 is a detail account that has a balance of $12.

Rollup accounts are used to summarize the totals of other accounts, which may be detail accounts or other rollup accounts. Every account, except for the root account, has a parent. The total of a rollup account is the sum of the accounts that are its children. In Figure 7-5 account 3002 is a rollup account, and it represents the sum of its two children, accounts 4001 and 4002.

In practice, one of the ways to represent a chart of accounts is to have two tables: one for detail accounts and the other for rollup accounts. A detail account has an account number, a parent account number, and a balance for columns. A rollup account has an account number and a parent but no balance associated with it. The SQL batch that follows builds and populates these two tables for the accounts shown in Figure 7-5.

CREATE TABLE DetailAccount(id INT PRIMARY KEY,
parent INT, balance FLOAT)
CREATE TABLE RollupAccount(id INT PRIMARY KEY,
parent INT)
INSERT INTO DetailAccount VALUES (3001, 2001, 10)
INSERT INTO DetailAccount VALUES(4001, 3002, 12)
INSERT INTO DetailAccount VALUES(4002, 3002, 14)
INSERT INTO DetailAccount VALUES(3004, 2002, 17)
INSERT INTO DetailAccount VALUES(3005, 2002, 10)
INSERT INTO DetailAccount VALUES(3006, 2002, 25)
INSERT INTO DetailAccount VALUES(3007, 2003, 7)
INSERT INTO DetailAccount VALUES(3008, 2003, 9)

INSERT INTO RollupAccount VALUES(3002, 2001)
INSERT INTO RollupAccount VALUES(2001, 1000)
INSERT INTO RollupAccount VALUES(2002, 1000)
INSERT INTO RollupAccount VALUES(2003, 1000)
INSERT INTO RollupAccount VALUES(1000, 0)

Figure 6Figure 7-6: Recursive Query

Note that this example does not include any referential integrity constraints or other information to make it easier to follow.

A typical thing to do with a chart of accounts it to calculate the value of all the rollup accounts or, in some cases, the value of a particular rollup account. In Figure 7-5 (shown earlier) the value of the rollup accounts is shown in gray, next to the account itself. We would like to be able to write a SQL batch like the one that follows.

SELECT id, balance FROM Rollup — a handy view
id     balance
—————- ———————————
1000    104
2001    36
2002    52
2003    16
3001    10
3002    26
3004    17
3005    10
3006    25
3007    7
3008    9
4001    12
4002    14

(13 row(s) affected)

SELECT id, balance FROM Rollup WHERE id = 2001
id     balance
—————- ———————————
2001    36

(1 row(s) affected)

This query shows a view name, Rollup, that we can query to find the values of all the accounts in the chart of accounts or an individual account. Let's look at how we can do this.

To start with, we will make a recursive query that just lists all the account numbers, starting with the top rollup account, 1000. The query that follows does this.

WITH Rollup(id, parent)
AS
(
 — anchor
 SELECT id, parent FROM RollupAccount WHERE id = 1000
 UNION ALL
— recursive call
SELECT R1.id, R1.parent FROM 
( 
  SELECT id, parent FROM DetailAccount
  UNION ALL
  SELECT id, parent FROM RollupAccount
) R1
JOIN Rollup R2 ON R2.id = r1.parent
)
— selecting results
SELECT id, parent FROM Rollup
GO
id     parent
—————- —————-
1000    0
2001    1000
2002    1000
2003    1000
3007    2003
3008    2003
3004    2002
3005    2002
3006    2002
3001    2001
3002    2001
4001    3002
4002    3002

(13 row(s) affected)

The previous batch creates a CTE named Rollup. There are three parts to a CTE when it is used to do recursion. The anchor, which initializes the recursion, is first. It sets the initial values of Rollup. In this case, Rollup is initialized to a table that has a single row representing the rollup account with id = 1000. The anchor may not make reference to the CTE Rollup.

The recursive call follows a UNION ALL keyword. UNION ALL must be used in this case. It makes reference to the CTE Rollup. The recursive call will be executed repeatedly until it produces no results. Each time it is called, Rollup will be the results of the previous call. Figure 7-6 shows the results of the anchor and each recursive call.

First the anchor is run, and it produces a result set that includes only the account 1000. Next the recursive call is run and produces a resultset that consists of all the accounts that have as a parent account 1000. The recursive call runs repeatedly, each time joined with its own previous result to produce the children of the accounts selected in the previous recursion. Also note that the recursive call itself is a UNION ALL because the accounts are spread out between the DetailAccount table and the RollupAccount table.

After the body of the CTE, the SELECT statement just selects all the results in Rollup—that is, the UNION of all the results produced by calls in the CTE body.

Now that we can produce a list of all the accounts by walking through the hierarchy from top to bottom, we can use what we learned to calculate the value of each account.

To calculate the values of the accounts, we must work from the bottom up—that is from the detail accounts up to the rollup account 1000. This means that our anchor must select all the detail accounts, and the recursive calls must progressively walk up the hierarchy to account 1000. Note that there is no requirement that the anchor produce a single row; it is just a SELECT statement.

The query that follows produces the values of all the accounts, both detail and rollup.

WITH Rollup(id, parent, balance)
AS
(
— anchor
SELECT id, parent, balance FROM DetailAccount
UNION ALL
— recursive call
SELECT R1.id, R1.parent, R2.balance
FROM RollupAccount R1
JOIN Rollup R2 ON R1.id = R2.parent
)
SELECT id, SUM(balance) balance FROM Rollup GROUP BY id
GO
id     balance
—————- ———————————
1000    104
2001    36
2002    52
2003    16
3001    10
3002    26
3004    17
3005    10
3006    25
3007    7
3008    9
4001    12
4002    14

(13 row(s) affected)

This query starts by having the anchor select all the detail accounts. The recursive call selects all the accounts that are parents, along with any balance produced by the previous call. This results in a table in which accounts are listed more than once. In fact, the table has as many rows for an account as that account has descendant accounts that are detail accounts. For example, if you looked at the rows produced for account 2001, you would see the three rows shown in the following diagram.

id     balance
—————- ———————————
2001    14
2001    12
2001    10

The balances 14, 12, and 10 correspond to the balances in the detail accounts 3001, 4001, and 4002, which are all decedents of account 2001. The query that follows the body of the CTE then groups the rows that are produced by account ID and calculates the balance with the SUM function.

There are other ways to solve this problem without using CTEs. A batch that uses a stored procedure that calls itself or a cursor could produce the same result. However, the CTE is a query, and it can be used to define a view, something a stored procedure or a cursor-based batch cannot. The view definition that follows defines a view, which is the recursive query we used earlier, and then uses it to get the balance for a single account, account 2001.

CREATE VIEW Rollup
AS
WITH Rollup(id, parent, balance)
AS
(
SELECT id, parent, balance FROM DetailAccount
UNION ALL
SELECT R1.id, R1.parent, R2.balance
FROM RollupAccount R1
JOIN Rollup R2 ON R1.id = R2.parent
)
SELECT id, SUM(balance) balance FROM Rollup GROUP ID id
GO
— get the balance for account 2001
SELECT balance FROM rollup WHERE id = 2001
GO
balance
———————————
36

(1 row(s) affected)

One of the strengths of a recursive query is the fact that it is a query and can be used to define a view. In addition, a single query in SQL Server is always a transaction, which means that a recursive query based on a CTE is a transaction.

Recursive queries, like any recursive algorithm, can go on forever. By default, if a recursive query attempts to do more than 100 recursions, it will be aborted. You can control this with an OPTION(MAXRECURSION 10), for example, to limit recursion to a depth of 10. The example that follows shows its usage.

WITH Rollup(id, parent, balance)
AS
(
— body of CTE removed for clarity
)
SELECT id, SUM(balance) balance FROM Rollup GROUP BY id
OPTION (MAXRECURSION 10)
GO

APPLY Operators

T-SQL adds two specialized join operators: CROSS APPLY and OUTER APPLY. Both act like a JOIN operator in that they produce the Cartesian product of two tables except that no ON clause is allowed. The following SQL batch is an example of a CROSS APPLY between two tables.

CREATE TABLE T1
(
  ID int
)
CREATE TABLE T2
(
 ID it
)
GO
INSERT INTO T1 VALUES (1)
INSERT INTO T1 VALUES (2)
INSERT INTO T2 VALUES (3)
INSERT INTO T2 VALUES (4)
GO
SELECT COUNT(*) FROM T1 CROSS APPLY T2
————————-
4

The APPLY operators have little utility with just tables or views; a CROSS JOIN could have been substituted in the preceding example and gotten the same results. It is intended that the APPLY operators be used with a table-valued function on their right, with the parameters for the table-valued function coming from the table on the left. The following SQL batch shows an example of this.

Table 7-5: Individual Sales, Including Quarter of Sale

Year

Quarter

Amount

2001

Q1

100

2001

Q2

120

2001

Q2

70

2001

Q3

55

2001

Q3

110

2001

Q4

90

2002

Q1

200

2002

Q2

150

2002

Q2

40

2002

Q2

60

2002

Q3

120

2002

Q3

110

2002

Q4

180


CREATE TABLE Belt
(
 model VARCHAR(20),
 length FLOAT
)
GO
— fill table with some data
DECLARE @index INT
SET @index = 5
WHILE(@index > 0)
BEGIN
INSERT INTO BELT VALUES ('B' + CONVERT(VARCHAR, @index), 10 * @index)
SET @index = @index – 1
END
GO
— make a table-valued function
CREATE FUNCTION Stretch (@length FLOAT)
RETURN @T TABLE
(
 MinLength FLOAT,
 MaxLength FLOAT
)
AS BEGIN
IF (@length > 20)
INSERT @T VALUES (@length – 4, @length + 5)
RETURN
END
GO
SELECT B.* S.MinLength, S.MaxLength FROM Belt AS B
CROSS APPLY Stretch(B.Length) AS S
GO
————————————
B30, 26, 35
B40, 36, 45
B50, 46, 55

Table 7-6: Yearly Sales Broken Down by Quarter

Year

Q1

Q2

Q3

Q4

2001

100

190

165

90

2002

200

250

230

180


Figure 7Figure 7-7: Tables for Hardware Store

The rows in the Belt table are cross-applied to the Stretch function. This function produces a table with a single row in it if the @length parameter passed into it is greater than 20; otherwise, it produces a table with no rows in it. The CROSS APPLY operator produces output when each table involved in the CROSS APPLY has at least one row in it. It is similar to a CROSS JOIN in this respect.

OUTER APPLY is similar to OUTER JOIN in that it produces output for all rows involved in the OUTER APPLY. The following SQL batch shows the results of an OUTER APPLY involving the same Belt table and Stretch function as in the previous example.

SELECT B.* S.MinLength, S.MaxLength FROM Belt AS B
CROSS APPLY Stretch(B.Length) AS S
GO
————————————
B10, 6, 15
B20, 16, 25
B30, 26, 35
B40, 36, 45
B50, 46, 55

The preceding example could have been done using CROSS and OUTER JOIN. CROSS APPLY is required, however, when used in conjunction with XML data types in certain XML operations that will be discussed in Chapter 9.

PIVOT Command

SQL Server 2005 adds the PIVOT command to T-SQL, so named because it can create a new table by swapping the rows and columns of an existing table. PIVOT is part of the OLAP section of the SQL:1999 standard. There are two general uses for the PIVOT command. One it to create an analytical view of some data, and the other is to implement an open schema.

A typical analytical use of the PIVOT command is to covert temporal data into categorized data in order to make the data easier to analyze. Consider a table used to record each sale made as it occurs; each row represents a single sale and includes the quarter that indicates when it occurred. This sort of view makes sense for recording sales but is not easy to use if you want to compare sales made in the same quarter, year over year.

Table 7-5 lists temporally recorded sales. You want to analyze same-quarter sales year by year from the data in the table. Each row represents a single sale. Note that this table might be a view of a more general table of individual sales that includes a date rather than a quarterly enumeration.

The PIVOT command, which we will look at shortly, can convert this temporal view of individual sales into a view that has years categorized by sales in a quarter. Table 7-6 shows this.

Presenting the data this way makes it much easier to analyze same-quarter sales. This table aggregates year rows for each given year in the previous table into a single row. However, the aggregated amounts are broken out into quarters rather than being aggregated over the entire year.

The other use of the PIVOT command is to implement an open schema. An open schema allows arbitrary attributes to be associated with an entity. For example, consider a hardware store; its entities are the products that it sells. Each product has a number of attributes used to describe it. One common attribute of all products it the name of the product.

The hardware store sells "Swish" brand paint that has attributes of quantity, color, and type. It also sells "AttachIt" fastener screws, and these have attributes of pitch and diameter. Over time, it expects to add many other products to its inventory. With this categorization "Swish, 1 qt, green, latex" would be one product or entity, and "Swish, 1qt, blue, oil" would be another.

A classic solution to designing the database the hardware store will use to maintain its inventory is to design a table per product. For example, a table named Swish with columns for quantity, color, and type. This, of course, requires products and their attributes to be known and for those attributes to remain constant over time. What happens if the manufacturer of the Swish paint adds a new attribute, "Drying Time", but only to certain colors of paint?

An alternate solution is to have only two tables, regardless of the number of products involved or the attributes they have. In the case of the hardware store, there would be a Product table and a Properties table. The Product table would have an entry per product, and the Properties table would contain the arbitrary attributes of that product. The properties of a product are linked to it via a foreign key. This is called an open schema. Figure 7-7 shows the two ways of designing tables to represent the inventory of the hardware store.

The PIVOT operator can easily convert data that is stored using an open schema to a view that looks the same as the table-per-product solution. Next, we will look at the details of using PIVOT to analyze data and support open schemas, and then how to use PIVOT to work with open schemas. There is also an UNPIVOT operator, which can be used to produce the original open schema format from previously pivoted results.

Using PIVOT for Analysis

In this example, we are going to use PIVOT to analyze the sales data we showed in an earlier table. To do this, we build a SALES table and populate it with data, as is shown in the following SQL batch.

CREATE TABLE SALES
(
[Year] INT,
Quarter CHAR(2),
Amount FLOAT
)
GO
INSERT INTO SALES VALUES (2001, 'Q2', 70)
INSERT INTO SALES VALUES (2001, 'Q3', 55)
INSERT INTO SALES VALUES (2001, 'Q3', 110)
INSERT INTO SALES VALUES (2001, 'Q4', 90)
INSERT INTO SALES VALUES (2002, 'Q1', 200)
INSERT INTO SALES VALUES (2002, 'Q2', 150)
INSERT INTO SALES VALUES (2002, 'Q2', 40)
INSERT INTO SALES VALUES (2002, 'Q2', 60)
INSERT INTO SALES VALUES (2002, 'Q3', 120)
INSERT INTO SALES VALUES (2002, 'Q3', 110)
INSERT INTO SALES VALUES (2002, 'Q4', 180)
GO

To get a view that is useful for quarter-over-year comparisons, we want to pivot the table's Quarter column into a row heading and aggregate the sum of the values in each quarter for a year. The SQL batch that follows shows a PIVOT command that does this.

SELECT * FROM SALES
PIVOT
(SUM (Amount) — Aggregate the Amount column using SUM
FOR [Quarter] — Pivot the Quarter column into column headings
IN (Q1, Q2, Q3, Q4)) — use these quarters
AS P
GO
Year  Q1      Q2      Q3      Q4
——- ——————- ——————- ——————- ——————
2001  100      190      165      90
2002  200      250      230      180

The SELECT statement selects all the rows from SALES. A PIVOT clause is added to the SELECT statement. It starts with the PIVOT keyword followed by its body enclosed in parentheses. The body contains two parts separated by the FOR keyword. The first part of the body specifies the kind of aggregation to be performed. The argument of the aggregate function must be a column name; it cannot be an expression as it is when an aggregate function is used outside a PIVOT. The second part specifies the pivot column—that is, the column to pivot into a row—and the values from that column to be used as column headings. The value for a particular column in a row is the aggregation of the column specified in the first part, over the rows that match the column heading.

Note that it is not required to use all the possible values of the pivot column. You only need to specify the Q2 column if you wish to analyze just the year-over-year Q2 results. The SQL batch that follows shows this.

SELECT * FROM SALES
PIVOT
(SUM (Amount)
FOR [Quarter] 
IN (Q2))
AS P
GO

Year    Q2
—————- ———————————
2001    190
2002    250

Figure 8Figure 7-8: Rotating Properties

Figure 9Figure 7-9: Basic PIVOT




Figure 10Figure 7-10: Results of Open Schema Pivot

Note that the output produced by the PIVOT clause acts as though SELECT has a GROUP BY [Year] clause. A pivot, in effect, applies a GROUP BY to the SELECT that includes all the columns that are not either the aggregate or the pivot column. This can lead to undesired results, as shown in the SQL batch that follows. It uses essentially the same SALES table as the previous example, except that it has an additional column named Other.

CREATE TABLE SALES2
(
[Year] INT,
Quarter CHAR(2),
Amount FLOAT,
Other INT
)
INSERT INTO SALES2 VALUES (2001, 'Q2', 70, 1)
INSERT INTO SALES2 VALUES (2001, 'Q3', 55, 1)
INSERT INTO SALES2 VALUES (2001, 'Q3', 110, 2)
INSERT INTO SALES2 VALUES (2001, 'Q4', 90, 1)
INSERT INTO SALES2 VALUES (2002, 'Q1', 200, 1)
INSERT INTO SALES2 VALUES (2002, 'Q2', 150, 1)
INSERT INTO SALES2 VALUES (2002, 'Q2', 40, 1)
INSERT INTO SALES2 VALUES (2002, 'Q2', 60, 1)
INSERT INTO SALES2 VALUES (2002, 'Q3', 120, 1)
INSERT INTO SALES2 VALUES (2002, 'Q3', 110, 1)
INSERT INTO SALES2 VALUES (2002, 'Q4', 180, 1)

SELECT * FROM Sales2
PIVOT
(SUM (Amount)
FOR Quarter 
IN (Q3))
AS P
GO
Year    Other    Q3
—————- —————- ———————————
2001    1      55
2002    1      115
2001    2      110

Note that the year 2001 appears twice, once for each value of Other. The SELECT that precedes the PIVOT keyword cannot specify which columns to use in the PIVOT clause. However, a subquery can be used to eliminate the columns not desired in the pivot, as shown in the SQL batch that follows.

SELECT * FROM
(Select Amount, Quarter, Year from Sales2
) AS A
PIVOT
(SUM (Amount)
FOR Quarter 
IN (Q3))
AS P
GO
Year    Q3
—————- ———————————
2001    165
2002    230

A column named in the FOR part of the PIVOT clause may not correspond to any values in the pivot column of the table. The column will be output, but will have null values. The following SQL batch shows this.

SELECT * FROM SALES
PIVOT
(SUM (Amount)
FOR [Quarter] 
IN (Q2, LastQ))
As P
GO
Year    Q2           LastQ
—————- ——————————— ———————————
2001    190          NULL
2002    250          NULL

Note that the Quarter column of the SALES table has no value "LastQ", so the output of the PIVOT lists all the values in the LastQ column as NULL.

Using PIVOT for Open Schemas

Using PIVOT for an open schema is really no different from using PIVOT for analysis, except that we don't depend on PIVOT's ability to aggregate a result. The open schema has two tables, a Product table and a Properties table, as was shown in Figure 7-7. What we want to do is to take selected rows from the Properties table and pivot them—that is, rotate them—and then add them as columns to the Product table. This is shown in Figure 7-8.

Figure 7-9 shows the PIVOT we will use to select the line from the Product table for "Swish" products and joint them with the corresponding pivoted lines from the Properties table.

This query selects row from the Properties table that have a string equal to "color", "type", or "amount" in the value column. They are selected from the value column because value is the argument of the MAX function that follows the PIVOT keyword. The strings "color", "type", and "amount" are used because they are specified as an argument of the IN clause after the FOR keyword. Note that the arguments of the IN clause must be literal; there is no way to calculate them—for example, by using a subquery.

The results of the pivot query in Figure 7-9 are shown in Figure 7-10.

Note that the columns that were selected from the Properties table now appear as rows in the output.

Ranking and Windowing Functions

SQL Server 2005 adds support for a group of functions known as ranking functions. At its simplest, ranking adds an extra value column to the resultset that is based on a ranking algorithm being applied to a column of the result. Four ranking functions are supported.

ROW_NUMBER() produces a column that contains a number that corresponds to the row's order in the set. The set must be ordered by using an OVER clause with an ORDER BY clause as a variable. The following is an example.

SELECT orderid, customerid, 
    ROW_NUMBER() OVER(ORDER BY orderid) AS num 
FROM orders
WHERE orderid < 10400
AND  customerid <= 'BN'

produces

orderid   customerid num
—————- ————— ——————————
10248    VINET   1
10249    TOMSP   2
10250    HANAR   3
10251    VICTE   4
10252    SUPRD   5
10253    HANAR   6
10254    CHOPS   7
10255    RICSU   8
... more rows

Note that if you apply the ROW_NUMBER function to a nonunique column, such as customerid in the preceding example, the order of customers with the same customerid (ties) is not defined. In any case, ROW_NUMBER produces a monotonically increasing number; that is, no rows will ever share a ROW_NUMBER.

SELECT orderid, customerid, 
    ROW_NUMBER() OVER(ORDER BY customerid) AS num 
FROM orders
WHERE orderid < 10400
AND  customerid <= 'BN'

produces

orderid   customerid num
—————- ————— ——————————
10308    ANATR   1
10365    ANTON   2
10355    AROUT   3
10383    AROUT   4
10384    BERGS   5
10278    BERGS   6
10280    BERGS   7
10265    BLONP   8
10297    BLONP   9
10360    BLONP   10

RANK() applies a monotonically increasing number for each value in the set. The value of ties, however, is the same. If the columns in the OVER(ORDER BY ) clause have unique values, the result produced by RANK() is identical to the result produced by ROW_NUMBER(). RANK() and ROW_NUMBER() differ only if there are ties. Here's the second earlier example using RANK().

SELECT orderid, customerid, 
    RANK() OVER(ORDER BY customerid) AS [rank] 
FROM orders
WHERE orderid < 10400
AND  customerid <= 'BN'

produces

orderid   customerid rank
—————- ————— ——————————
10308    ANATR   1
10365    ANTON   2
10355    AROUT   3
10383    AROUT   3
10384    BERGS   5
10278    BERGS   5
10280    BERGS   5
10265    BLONP   8
10297    BLONP   8
10360    BLONP   8
... more rows

Note that multiple rows have the same rank if their customerid is the same. There are holes, however, in the rank column value to reflect the ties. Using the DENSE_RANK() function works the same way as RANK() but gets rid of the holes in the numbering. NTILE(n) divides the resultset into "n" approximately even pieces and assigns each piece the same number. NTILE(100) would be the well-known (to students) percentile. The following query shows the difference between ROW_NUMBER(), RANK(), DENSE_RANK(), and TILE(n).

SELECT orderid, customerid, 
    ROW_NUMBER() OVER(ORDER BY customerid) AS num,
    RANK()    OVER(ORDER BY customerid) AS [rank],
    DENSE_RANK() OVER(ORDER BY customerid) AS [denserank],
    NTILE(5)   OVER(ORDER BY customerid) AS ntile5
FROM orders
WHERE orderid < 10400
AND  customerid <= 'BN'

produces

orderid   customerid num  rank  denserank ntile5
—————-   —————       ———  ———    ————-    ———
10308    ANATR        1    1       1       1
10365    ANTON        2    2       2       1
10355    AROUT        3    3       3       2
10383    AROUT        4    3       3       2
10278    BERGS        5    5       4       3
10280    BERGS        6    5       4       3
10384    BERGS        7    5       4       4
10265    BLONP        8    8       5       4
10297    BLONP        9    8       5       5
10360    BLONP       10    8       5       5

The ranking functions have additional functionality when combined with windowing functions. Windowing functions divide a resultset into partitions, based on the value of a PARTITION BY clause inside the OVER clause. The ranking functions are applied separately to each partition. Here's an example.

SELECT *,
 RANK() OVER(PARTITION BY COUNTRY ORDER BY age) AS [rank]
FROM
(
 SELECT lastname, country, 
  DATEDIFF(yy,birthdate,getdate())AS age
 FROM employees
) AS a

produces

lastname   country    age   rank
—————————— ———————- —————-  ———
Dodsworth    UK       37     1
Suyama       UK       40     2
King         UK       43     3
Buchanan     UK       48     4
Leverling    USA      40     1
Callahan     USA      45     2
Fuller       USA      51     3
Davolio      USA      55     4
Peacock      USA      66     5

There are separate rankings for each partition. An interesting thing to note about this example is that the subselect is required because any column used in a PARTITION BY or ORDER BY clause must be available from the columns in the FROM portion of the statement. In our case, the seemingly simpler statement that follows:

SELECT lastname, country,
 DATEDIFF(yy,birthdate,getdate())AS age,
 RANK() OVER(PARTITION BY COUNTRY ORDER BY age) AS [rank]
FROM employees

wouldn't work; instead, you'd get the error "Invalid column name 'age'". In addition, you can't use the ranking column in a WHERE clause, because it is evaluated after all the rows are selected, as shown next.

— 10 rows to a page, we want page 40

— this won't work
SELECT 
 ROW_NUMBER() OVER (ORDER BY customerid, requireddate) AS num,
 customerid, requireddate, orderid
FROM orders
WHERE num BETWEEN 400 AND 410

— this will
SELECT * FROM 
(
SELECT 
 ROW_NUMBER() OVER (ORDER BY customerid, requireddate) AS num,
 customerid, requireddate, orderid
FROM orders
) AS a
WHERE num BETWEEN 400 AND 410

Although the preceding case looks similar to selecting the entire resultset into a temporary table, with num as a derived identity column, and doing a SELECT of the temporary table, in some cases the engine will be able to accomplish this without the complete set of rows. Besides being usable in a SELECT clause, the ranking and windowing functions are also usable in the ORDER BY clause. This gets employees partitioned by country and ranked by age, and then sorted by rank.

SELECT *,
 RANK() OVER(PARTITION BY COUNTRY ORDER BY age)) AS [rank]
FROM
(
 SELECT lastname, country, 
  DATEDIFF(yy,birthdate,getdate())AS age
 FROM employees
) AS a
ORDER BY RANK() OVER(PARTITION BY COUNTRY ORDER BY age), COUNTRY

produces

lastname   country     age     rank
—————————— ———————- —————- ——————————
Dodsworth      UK      37     1
Leverling      USA     40     1
Suyama         UK      40     2
Callahan       USA     45     2
King           UK      43     3
Fuller         USA     51     3
Buchanan       UK      48     4
Davolio        USA     55     4
Peacock        USA     66     5

You can also use other aggregate functions (either system-defined aggregates or user-defined aggregates that you saw in Chapter 5) with the OVER clause. When it is used in concert with the partitioning functions, however, you get the same value for each partition. This is shown next.

— there is one oldest employee age for each country
SELECT *,
 RANK() OVER(PARTITION BY COUNTRY ORDER BY age) AS [rank],
 MAX(age) OVER(PARTITION BY COUNTRY) AS [oldest age in country]
FROM
(
 SELECT lastname, country, 
  DATEDIFF(yy,birthdate,getdate())AS age
 FROM employees
) AS a

produces

lastname   country   age    rank   oldest age
                            in country
—————————— ———————- —————-  ———    —————-
Dodsworth      UK     37     1     48
Suyama         UK     40     2     48
King           UK     43     3     48
Buchanan       UK     48     4     48
Leverling      USA    40     1     66
Callahan       USA    45     2     66
Fuller         USA    51     3     66
Davolio        USA    55     4     66
Peacock        USA    66     5     66
  • + Share This
  • 🔖 Save To Your Account