Home > Articles

  • Print
  • + Share This
This chapter is from the book

Item 21: Use UNION Statements to “Unpivot” Non-normalized Data

You saw in Item 3, “Get rid of repeating groups,” how UNION queries can be used to deal with repeating groups. We explore UNION queries a little bit more in this item. As you will learn in Item 22, “Understand relational algebra and how it is implemented in SQL,” the Union operation is one of the eight relational algebra operations that can be performed within the relational model defined by Dr. Edgar F. Codd. It is used to merge data sets created by two (or more) SELECT statements.

Assume that the only way you are able to get some data for analysis is in the form of the Excel spreadsheet pictured in Figure 3.5, which is obviously not normalized.

Figure 3.5

Figure 3.5 Non-normalized data from Excel

Assuming you can import that data into your DBMS, at best you will end up with a table (SalesSummary) that has five pairs of repeating groups, which we will call OctQuantity, OctSales, NovQuantity, NovSales, and so on to FebQuantity and FebSales.

Listing 3.8 shows a query that would let you look at the October data.

Listing 3.8 SQL to extract October data

SELECT Category, OctQuantity, OctSales
FROM SalesSummary;

Of course, to look at the data for a different month, you need a different query. And let’s not forget that data that is not normalized can be more difficult to use for analysis purposes. This is where a UNION query can help.

There are three basic rules that apply when using UNION queries:

  1. There must be the same number of columns in each of the queries making up the UNION query.

  2. The order of the columns in each of the queries making up the UNION query must be the same.

  3. The data types of the columns in each of the queries must be compatible.

Note that there is nothing in those rules about the names of the columns in the queries that make up the UNION query.

Listing 3.9 shows how to combine all of the data into a normalized view.

Listing 3.9 Using UNION to normalize the data

SELECT Category, OctQuantity, OctSales
FROM SalesSummary
UNION
SELECT Category, NovQuantity, NovSales
FROM SalesSummary
UNION
SELECT Category, DecQuantity, DecSales
FROM SalesSummary
UNION
SELECT Category, JanQuantity, JanSales
FROM SalesSummary
UNION
SELECT Category, FebQuantity, FebSales
FROM SalesSummary;

Table 3.2 shows a partial extract of the data returned.

Table 3.2 Partial extract of data returned by the UNION query in Listing 3.9

Category

OctQuantity

OctSales

Accessories

923

60883.03

Accessories

930

61165.40

. . .

. . .

. . .

Bikes

450

585130.50

Bikes

542

705733.50

Car racks

96

16772.05

Car racks

115

20137.05

Car racks

124

21763.30

. . .

. . .

. . .

Skateboards

203

89040.58

Skateboards

204

79461.30

Tires

110

3081.24

Tires

137

3937.70

Tires

150

4388.55

Tires

151

4356.91

Tires

186

5377.60

Two things should stand out. First, there is no way to distinguish to which month the data applies. The first two rows, for instance, represent the quantity and sales amount for Accessories for October and November, but there is no way to tell that from the data. As well, despite the fact that the data represents five months of sales, the columns are named OctQuantity and OctSales. That is because UNION queries get their column names from the names of the columns in the first SELECT statement.

Listing 3.10 shows a query that remedies both of those issues.

Listing 3.10 Tidying up the UNION query used to normalize the data

SELECT Category, 'Oct' AS SalesMonth, OctQuantity AS Quantity,
  OctSales AS SalesAmt
FROM SalesSummary
UNION
SELECT Category, 'Nov', NovQuantity, NovSales
FROM SalesSummary
UNION
SELECT Category, 'Dec', DecQuantity, DecSales
FROM SalesSummary
UNION
SELECT Category, 'Jan', JanQuantity, JanSales
FROM SalesSummary
UNION
SELECT Category, 'Feb', FebQuantity, FebSales
FROM SalesSummary;

Table 3.3 shows the same partial extract returned by the query in Listing 3.10.

Table 3.3 Partial extract of data returned by the UNION query in Listing 3.10

Category

SalesMonth

Quantity

SalesAmount

Accessories

Dec

987

62758.14

Accessories

Feb

979

60242.47

. . .

. . .

. . .

. . .

Bikes

Nov

412

546657.00

Bikes

Oct

413

536590.50

Car racks

Dec

115

20137.05

Car racks

Feb

124

21763.30

Car racks

Jan

142

24794.75

. . .

. . .

. . .

. . .

Skateboards

Nov

203

89040.58

Skateboards

Oct

164

60530.06

Tires

Dec

150

4388.55

Tires

Feb

137

3937.70

Tires

Jan

186

5377.60

Tires

Nov

110

3081.24

Tires

Oct

151

4356.91

Should you want the data presented in a different sequence, the ORDER BY clause must appear after the last SELECT in the UNION query, as shown in Listing 3.11 on the next page.

Listing 3.11 Specifying the sort order of the UNION query

SELECT Category, 'Oct' AS SalesMonth, OctQuantity AS Quantity,
  OctSales AS SalesAmt
FROM SalesSummary
UNION
SELECT Category, 'Nov', NovQuantity, NovSales
FROM SalesSummary
UNION
SELECT Category, 'Dec', DecQuantity, DecSales
FROM SalesSummary
UNION
SELECT Category, 'Jan', JanQuantity, JanSales
FROM SalesSummary
UNION
SELECT Category, 'Feb', FebQuantity, FebSales
FROM SalesSummary
ORDER BY SalesMonth, Category;

Table 3.4 shows a partial extract returned by the query in Listing 3.11.

Table 3.4 Partial extract of data returned by the UNION query in Listing 3.11

Category

SalesMonth

Quantity

SalesAmount

Accessories

Dec

987

62758.14

Bikes

Dec

332

439831.50

Car racks

Dec

115

20137.05

Clothing

Dec

139

4937.74

Components

Dec

265

27480.22

Skateboards

Dec

129

59377.20

Tires

Dec

150

4388.55

Accessories

Feb

979

60242.47

Bikes

Feb

450

585130.50

Car racks

Feb

124

21763.30

. . .

. . .

. . .

. . .

Another consideration is that UNION queries eliminate any duplicate rows. Should this not be what you want, you can specify UNION ALL instead of UNION, and duplicates will not be eliminated. On the other hand, UNION ALL can provide performance improvements as it skips the step of deduplicating the result set, so if you know that the sources will not overlap, it can be advantageous to specify UNION ALL for those queries.

Things to Remember

  • Each of the SELECT statements in the UNION query must have the same number of columns.

  • Although the names of the columns in the various SELECT statements do not matter, the data types of each column must be compatible.

  • To control the order in which the data appears, you can use an ORDER BY clause after the last SELECT statement.

  • Use UNION ALL rather than UNION if you do not wish to eliminate duplicate rows or pay the performance penalty of deduplicating rows.

  • + Share This
  • 🔖 Save To Your Account