SQL Server Reference Guide

Transact-SQL: Subqueries

Last updated Mar 28, 2003.

So far, in covering the process to select data from a database, we've learned to select data, limit the data based on a condition, and group and arrange the data, as well as perform aggregate functions. Here, I'll continue that subject with a new method to apply a condition on the search: the subquery.

A subquery, at its simplest, is just a query in the predicate of another query. Just as with all the constructs we've learned so far, however, simple constructs can form layers that comprise very complicated queries!

Before we get started, we need to clear up some terminology again, just as we did with the word "query." We've seen in other tutorials that some database terms have very specific meanings, while at other times they're used interchangeably. Some writers draw a distinction here: a "subquery" is a selection inside a SELECT statement, while a "subselect" is a selection inside an INSERT, UPDATE, or DELETE statement. (More about those statements later.) SQL Server's own documentation uses "subquery" for all of these cases. In this tutorial I'll do the same, no matter which operation we're discussing.

The basic premise of the subquery is quite simple. The result of one query takes the place of part of another query, such as the list of values in a condition or a table in the FROM clause. To help us through this discussion, we'll look at some concrete examples.

Let's start with the place where the subquery is most often found: the WHERE clause. Here's a simple query to get us started:

USE pubs
GO
SELECT au_id, au_fname, au_lname, city
FROM authors

Now let's use a subquery to return only the authors whose state is California. "But wait," you say, "we already know how to do this with a simple WHERE clause." That's true, but we're going to use a subquery to do the same thing so the concept is easy to see. Here it is:

SELECT au_id, au_fname, au_lname, city
FROM authors
WHERE state IN ( SELECT state 
   FROM authors
   WHERE state = 'CA' )

And yes, this is equivalent to:

SELECT au_id, au_fname, au_lname, city
FROM authors
WHERE state = 'CA' 
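You can check the equivalence without a SQL Server instance. The following is a minimal sketch that uses Python's built-in sqlite3 module, with an in-memory SQLite database standing in for SQL Server. The three authors rows are hypothetical sample data, not the real pubs contents. The ORDER BY clauses are there only so the two result lists can be compared directly.

```python
import sqlite3

# In-memory SQLite database; the rows are hypothetical, not the real pubs data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        au_id TEXT PRIMARY KEY, au_fname TEXT, au_lname TEXT,
        city TEXT, state TEXT
    );
    INSERT INTO authors VALUES
        ('A1', 'Ann',  'Lee',   'Oakland',  'CA'),
        ('A2', 'Bob',  'Ruiz',  'Salem',    'OR'),
        ('A3', 'Cara', 'Smith', 'Berkeley', 'CA');
""")

# Version 1: the condition is supplied by a subquery.
with_subquery = conn.execute("""
    SELECT au_id, au_fname, au_lname, city
    FROM authors
    WHERE state IN (SELECT state FROM authors WHERE state = 'CA')
    ORDER BY au_id
""").fetchall()

# Version 2: the same condition written directly.
plain_where = conn.execute("""
    SELECT au_id, au_fname, au_lname, city
    FROM authors
    WHERE state = 'CA'
    ORDER BY au_id
""").fetchall()

print(with_subquery == plain_where)  # True
print(with_subquery)  # [('A1', 'Ann', 'Lee', 'Oakland'), ('A3', 'Cara', 'Smith', 'Berkeley')]
```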

The reason to look at such a simple example is to show the format and use of the subquery.

The first important thing to notice is the use of the parentheses. Removing them from the statement above produces this error:

Server: Msg 156, Level 15, State 1, Line 3
Incorrect syntax near the keyword 'SELECT'.

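The next part of a subquery to pay attention to is the order of evaluation. Conceptually, the inner query (the subquery) is evaluated first. Its results form the set that the first query, called the outer query, uses as the condition in its WHERE clause. (If that didn't make sense, read it one more time!) The optimizer is free to choose a different physical plan, but the results are the same as if it had followed these steps.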
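You can simulate this evaluation order by hand: run the inner query on its own, then pass its rows to the outer query as a literal list. The sketch below does that with Python's sqlite3 module, using an in-memory SQLite database as a stand-in for SQL Server. The rows are hypothetical. The sketch models the logical order only, not how any engine actually plans the query.

```python
import sqlite3

# Hypothetical sample rows standing in for pubs.authors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (au_id TEXT PRIMARY KEY, au_lname TEXT, state TEXT);
    INSERT INTO authors VALUES ('A1', 'Lee', 'CA'), ('A2', 'Ruiz', 'OR'),
                               ('A3', 'Smith', 'CA');
""")

# Step 1: run the inner query by itself. It returns one row per CA author,
# so 'CA' appears twice; IN treats its list as a set, so that is harmless.
inner = [row[0] for row in conn.execute(
    "SELECT state FROM authors WHERE state = 'CA'")]
print(inner)  # ['CA', 'CA']

# Step 2: hand that set to the outer query as a literal list (one ? per value).
placeholders = ", ".join("?" * len(inner))
two_step = conn.execute(
    f"SELECT au_id FROM authors WHERE state IN ({placeholders}) ORDER BY au_id",
    inner).fetchall()

# The subquery version performs both steps inside a single statement.
one_step = conn.execute("""
    SELECT au_id FROM authors
    WHERE state IN (SELECT state FROM authors WHERE state = 'CA')
    ORDER BY au_id""").fetchall()

print(two_step == one_step, one_step)  # True [('A1',), ('A3',)]
```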

That example was pretty basic. Let's extend it to do something new.

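A subquery can return any legal value, as long as the outer query can compare it. A subquery used with IN must return a single column, and that column's data type must be compatible with the expression it's compared against (above, state is compared with state).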
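One practical consequence is that a subquery used with IN must return exactly one column. The sketch below uses Python's sqlite3 module and an in-memory SQLite database with hypothetical rows. It shows a one-column subquery working and a two-column subquery being rejected. SQLite is loosely typed, so it can't demonstrate SQL Server's data-type rules. Only the column-count check carries over, and the exact error text differs between engines.

```python
import sqlite3

# Hypothetical rows in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (au_id TEXT PRIMARY KEY, state TEXT);
    INSERT INTO authors VALUES ('A1', 'CA'), ('A2', 'OR');
""")

# Works: the subquery returns one column holding the same kind of value (state codes).
ok = conn.execute(
    "SELECT au_id FROM authors "
    "WHERE state IN (SELECT state FROM authors WHERE state = 'CA')").fetchall()
print(ok)  # [('A1',)]

# Fails: the subquery returns two columns, so there is no single value to compare.
try:
    conn.execute(
        "SELECT au_id FROM authors "
        "WHERE state IN (SELECT au_id, state FROM authors)").fetchall()
    failed = False
except sqlite3.Error as exc:
    failed = True
    print("error:", exc)
```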

Also, the subquery doesn't have to work within the same table as the outer query. We can query another table in the inner query and use its results as the condition for the outer query.

As a practical example, let's look up all the first and last names of the authors who have published a book.

If we examine the authors table, we see that the publishing information isn't stored there. The pubs database conforms to at least third normal form (see the earlier tutorials about database design), so data is split across tables to avoid repeating it. This means we'll need to look in a table other than authors to find out who has published a book, and that table is titleauthor. On further examination, we also find that the column the two tables share is au_id.

So, using all this information, we need to create a query that finds all the au_id values for authors who have published books, this time from the titleauthor table:

SELECT au_id
FROM titleauthor

This query returns every author ID in that table. An author with several books appears several times. Since this table only has a row for an author who has published a book, this is the set of data to use as a filter on the authors table.

Now let's use that set of data to find the author's first and last names in the authors table:

SELECT au_id, au_fname, au_lname
FROM authors
WHERE au_id IN ( SELECT au_id
    FROM titleauthor )
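Here is a minimal sketch of the same query. It assumes a cut-down schema with hypothetical rows in an in-memory SQLite database, accessed through Python's sqlite3 module. It also shows why duplicates in titleauthor don't matter. An author with two books appears twice in the inner result but only once in the output, because IN only tests whether a value is in the set.

```python
import sqlite3

# Hypothetical rows: A1 wrote two titles, A3 wrote one, A2 has none yet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (au_id TEXT PRIMARY KEY, au_fname TEXT, au_lname TEXT);
    CREATE TABLE titleauthor (au_id TEXT NOT NULL, title_id TEXT NOT NULL);
    INSERT INTO authors VALUES ('A1', 'Ann', 'Lee'), ('A2', 'Bob', 'Ruiz'),
                               ('A3', 'Cara', 'Smith');
    INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A1', 'T2'), ('A3', 'T2');
""")

# The inner query on its own repeats A1, once per book.
inner = conn.execute("SELECT au_id FROM titleauthor ORDER BY au_id").fetchall()
print(inner)  # [('A1',), ('A1',), ('A3',)]

# IN only asks "is this au_id in the set?", so each author is listed once.
published = conn.execute("""
    SELECT au_id, au_fname, au_lname
    FROM authors
    WHERE au_id IN (SELECT au_id FROM titleauthor)
    ORDER BY au_id""").fetchall()
print(published)  # [('A1', 'Ann', 'Lee'), ('A3', 'Cara', 'Smith')]
```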

And there we have it.

Remember from our previous tutorials that we can also use the NOT operator to filter sets. So this query shows the authors who haven't been published yet:

SELECT au_id, au_fname, au_lname
FROM authors
WHERE au_id NOT IN ( SELECT au_id
    FROM titleauthor )
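NOT IN has one trap worth knowing about. If the subquery returns even one NULL, the condition can never be true, so the query returns no rows. When the subquery's column can't hold NULLs, the issue never arises. The sketch below uses Python's sqlite3 module and in-memory SQLite with hypothetical rows. It runs the NOT IN query, then adds a row with a NULL au_id to titleauthor and runs the query again.

```python
import sqlite3

# Hypothetical rows; titleauthor.au_id is deliberately left nullable here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (au_id TEXT PRIMARY KEY, au_lname TEXT);
    CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
    INSERT INTO authors VALUES ('A1', 'Lee'), ('A2', 'Ruiz'), ('A3', 'Smith');
    INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A3', 'T2');
""")

unpublished_sql = """
    SELECT au_id, au_lname FROM authors
    WHERE au_id NOT IN (SELECT au_id FROM titleauthor)
    ORDER BY au_id"""

before = conn.execute(unpublished_sql).fetchall()
print(before)  # [('A2', 'Ruiz')]

# With a NULL in the list, "x NOT IN (..., NULL)" is FALSE or UNKNOWN,
# never TRUE, so every author is filtered out.
conn.execute("INSERT INTO titleauthor VALUES (NULL, 'T3')")
after = conn.execute(unpublished_sql).fetchall()
print(after)  # []
```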

We can also use a subquery in the FROM clause of the outer query, not just in the WHERE clause.

For this example, we'll use our layering technique to count the sales recorded for each book. We'll dissect the query after we run it:

SELECT a.title, COUNT(b.stor_id) 
FROM titles a, (SELECT title_id, stor_id FROM sales) b
WHERE a.title_id = b.title_id
GROUP BY a.title
ORDER BY COUNT(b.stor_id) DESC

Here's the output:

title                                                            count
Is Anger the Enemy?                                              4
The Busy Executive's Database Guide                              2
The Gourmet Microwave                                            2
You Can Combat Computer Stress!                                  1
But Is It User Friendly?                                         1
Computer Phobic AND Non-Phobic Individuals: Behavior Variations  1
Cooking with Computers: Surreptitious Balance Sheets             1
Emotional Security: A New Algorithm                              1
Fifty Years in Buckingham Palace Kitchens                        1
Life Without Fear                                                1
Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean  1
Prolonged Data Deprivation: Four Case Studies                    1
Secrets of Silicon Valley                                        1
Silicon Valley Gastronomic Treats                                1
Straight Talk About Computers                                    1
Sushi, Anyone?                                                   1

There's a lot going on in these five lines, so let's take them one at a time.

Take a look at the subquery on the second line: (SELECT title_id, stor_id FROM sales). Notice that all we're doing here is getting two pieces of information from the sales table: the title_id and stor_id values.

Now look at the first line of the outer query. Its structure is a bit different from what we've seen before, because there are letters in front of the column names. Each letter is an alias that tells the query which table, or which subquery, the column comes from.

We don't have to use a single letter as an alias. For the titles table, we could have spelled out the whole table name (titles.title_id). The subquery has no name of its own, though, so SQL Server requires us to give it an alias. Short letters are also easier to type.

We haven't seen this construct before because this is the first time we've selected columns from two different sources. We're asking for the title from the titles table and the count of store IDs from the subquery. In future tutorials, we'll learn much more about accessing data from several tables at once, but for now we can focus on this method.

In the second line of this query, we see the subquery in use. First, we see the titles table followed by the letter "a." This sets up the alias we used in the first line. Next comes the subquery that supplies the data we need to count the stores, followed by the letter "b," which aliases the entire subquery. A subquery used this way in the FROM clause is often called a derived table.

The third line brings it all together with the WHERE clause a.title_id = b.title_id. This is the same type of condition we've used in earlier tutorials, except that it now compares columns from two sources. It keeps only the combinations of rows where the title_id values match.

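Finally, the fourth line groups the rows by title, and the fifth sorts the groups by the COUNT aggregate (one of the aggregate functions we learned about last time). Each title therefore appears once, with the number of sales rows recorded for it, highest count first. Note that COUNT(b.stor_id) counts every sale, not every distinct store. To count each store only once, you would use COUNT(DISTINCT b.stor_id).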
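The following sketch shows the derived table at work and the difference between counting sales and counting stores. It runs the same query shape against hypothetical sales rows in an in-memory SQLite database via Python's sqlite3 module. The sales_rows and stores column aliases and the COUNT(DISTINCT ...) column are additions for illustration. Note that a title with no sales drops out entirely, because the WHERE condition keeps only matching rows.

```python
import sqlite3

# Hypothetical rows: T1 sold three times (twice at S1), T2 once, T3 never.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (title_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE sales (stor_id TEXT, ord_num TEXT, title_id TEXT);
    INSERT INTO titles VALUES ('T1', 'First Book'), ('T2', 'Second Book'),
                              ('T3', 'Unsold Book');
    INSERT INTO sales VALUES ('S1', 'O1', 'T1'), ('S1', 'O2', 'T1'),
                             ('S2', 'O3', 'T1'), ('S2', 'O4', 'T2');
""")

rows = conn.execute("""
    SELECT a.title,
           COUNT(b.stor_id)          AS sales_rows,
           COUNT(DISTINCT b.stor_id) AS stores
    FROM titles a, (SELECT title_id, stor_id FROM sales) b  -- b: derived table
    WHERE a.title_id = b.title_id
    GROUP BY a.title
    ORDER BY COUNT(b.stor_id) DESC""").fetchall()

print(rows)  # [('First Book', 3, 2), ('Second Book', 1, 1)]
```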

So we've seen that the subquery can be used in the WHERE clause and in the FROM clause, but it can even be used in the SELECT list of a query!

Here's a query that lists each title in the titleauthor table along with the IDs of its first, second, and third authors. Examine it and look for the concepts we've seen so far:

SELECT DISTINCT title_id,
  (SELECT au_id
   FROM titleauthor
   WHERE au_ord = 1 AND title_id = a.title_id),
  (SELECT au_id
   FROM titleauthor
   WHERE au_ord = 2 AND title_id = a.title_id),
  (SELECT au_id
   FROM titleauthor
   WHERE au_ord = 3 AND title_id = a.title_id)
FROM titleauthor a

Here is the result of that query:

title_id  1st author   2nd author   3rd author
BU1032    409-56-7008  213-46-8915  NULL
BU1111    724-80-9391  267-41-2394  NULL
BU2075    213-46-8915  NULL         NULL
BU7832    274-80-9391  NULL         NULL
MC2222    712-45-1867  NULL         NULL
MC3021    722-51-5454  899-46-2035  NULL
PC1035    238-95-7766  NULL         NULL
PC8888    427-17-2319  846-92-7186  NULL
PC9999    486-29-1786  NULL         NULL
PS1372    756-30-7391  724-80-9391  NULL
PS2091    998-72-3567  899-46-2035  NULL
PS2106    998-72-3567  NULL         NULL
PS3333    172-32-1176  NULL         NULL
PS7777    486-29-1786  NULL         NULL
TC3218    807-91-6654  NULL         NULL
TC4203    648-92-1872  NULL         NULL
TC7777    672-71-3249  267-41-2394  472-27-2349

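What's new in this query is that the SELECT list itself contains subqueries. Each one is a correlated subquery. It refers to a.title_id from the outer query, so it's evaluated for each row the outer query produces. It returns the author in that position for that row's title, or NULL if there isn't one. Another construct that might be new is the use of a "self-referencing" query. You might notice that we're only using one table, but we aliased it anyway. That's because we want to compare sets of data from the same table as if there were two tables. By aliasing the table, we can reference it as if there were a duplicate copy. Finally, DISTINCT is needed because titleauthor has one row per author per title, so without it a book with two authors would appear twice.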
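Here is a scaled-down sketch of the correlated SELECT-list query. It runs with Python's sqlite3 module against an in-memory SQLite database holding hypothetical titleauthor rows. The author_1 through author_3 column aliases are additions for readability. Keep one engine difference in mind: SQL Server raises an error if a scalar subquery returns more than one row, whereas SQLite silently uses the first row.

```python
import sqlite3

# Hypothetical titleauthor rows: T1 has two authors, T2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titleauthor (au_id TEXT, title_id TEXT, au_ord INTEGER);
    INSERT INTO titleauthor VALUES ('A1', 'T1', 1), ('A2', 'T1', 2),
                                   ('A3', 'T2', 1);
""")

# Each scalar subquery is correlated: it reads a.title_id from the current outer
# row and yields that position's author, or NULL (None in Python) if absent.
# Without DISTINCT, T1 would be listed twice (once per author row).
rows = conn.execute("""
    SELECT DISTINCT title_id,
        (SELECT au_id FROM titleauthor
         WHERE au_ord = 1 AND title_id = a.title_id) AS author_1,
        (SELECT au_id FROM titleauthor
         WHERE au_ord = 2 AND title_id = a.title_id) AS author_2,
        (SELECT au_id FROM titleauthor
         WHERE au_ord = 3 AND title_id = a.title_id) AS author_3
    FROM titleauthor a
    ORDER BY title_id""").fetchall()

print(rows)  # [('T1', 'A1', 'A2', None), ('T2', 'A3', None, None)]
```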

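Once again, we've seen a fairly simple concept that has far-reaching implications. One caveat is important to note: subqueries can affect performance. A correlated subquery, like the ones in the previous example, is logically evaluated once for every row the outer query processes, which can be slow on large tables. SQL Server's optimizer can often rewrite such a query into something more efficient, but not always. When a query with subqueries runs slowly, its execution plan is worth checking.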
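One common way to avoid per-row subqueries is to rewrite the query so that the table is read in a single pass. The sketch below swaps in a different technique, conditional aggregation with CASE and GROUP BY. It checks that the rewrite returns the same rows as the correlated version on hypothetical data, using Python's sqlite3 module and in-memory SQLite. Whether the rewrite is actually faster depends on the engine, the indexes, and the data. It also assumes each (title_id, au_ord) pair is unique; if a pair were duplicated, MAX would quietly pick one of the values.

```python
import sqlite3

# Hypothetical rows: T1 has two authors, T2 one, T3 three.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titleauthor (au_id TEXT, title_id TEXT, au_ord INTEGER);
    INSERT INTO titleauthor VALUES ('A1', 'T1', 1), ('A2', 'T1', 2),
                                   ('A3', 'T2', 1), ('A4', 'T3', 1),
                                   ('A5', 'T3', 2), ('A6', 'T3', 3);
""")

# Correlated form: three subqueries, each logically run once per outer row.
correlated = conn.execute("""
    SELECT DISTINCT title_id,
        (SELECT au_id FROM titleauthor WHERE au_ord = 1 AND title_id = a.title_id),
        (SELECT au_id FROM titleauthor WHERE au_ord = 2 AND title_id = a.title_id),
        (SELECT au_id FROM titleauthor WHERE au_ord = 3 AND title_id = a.title_id)
    FROM titleauthor a
    ORDER BY title_id""").fetchall()

# Single pass: group by title and pick each position with CASE.
# CASE yields NULL for non-matching rows, and MAX ignores those NULLs.
single_pass = conn.execute("""
    SELECT title_id,
        MAX(CASE WHEN au_ord = 1 THEN au_id END),
        MAX(CASE WHEN au_ord = 2 THEN au_id END),
        MAX(CASE WHEN au_ord = 3 THEN au_id END)
    FROM titleauthor
    GROUP BY title_id
    ORDER BY title_id""").fetchall()

print(correlated == single_pass)  # True
print(single_pass)
```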

We'll use the subquery more often with INSERT, UPDATE, and DELETE statements when we learn about those in future tutorials.

Online Resources

Although his site deals with Paradox, Lawrence G. DiGiovanni has a good tutorial on subqueries.

Brian Kautz has a tutorial on MC Press called "Of Shoes and SQL Subqueries" that has somewhat more advanced information on subqueries.

Marc Grange's page "Applications of Databases to Humanities and Social Sciences" is a great resource that shows how to use the various predicates with subselects.

InformIT Tutorials and Sample Chapters

Judith S. Bowman has a good tutorial called Practical SQL: Subqueries in FROM and SELECT Clauses. In it she explains the ANSI subquery syntax.