Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Transact-SQL: Joins

Last updated Mar 28, 2003.

In the last few articles in this series I’ve been explaining how to use Transact-SQL to access data in SQL Server databases. I’ve shown you how to do simple SELECT operations, limit the data that is returned, aggregate and group the results, and even how to do sub-selects in the SELECT, FROM and WHERE clauses.

But so far, with the exception one of the sub-select examples, I've limited the selections of data to one table at a time. While this is fine for the first few examples, the power of a relational database isn't in single tables — it's in combining tables to obtain new data sets. The real power of the relational design is only realized when you bring multiple tables together to show the data they contain as a group.

I explained in the database design tutorial how to define data you want to store in multiple tables. You define “key” fields that uniquely identify a particular row in a table, and those fields are the ones that you’ll use most often to “join” the tables back together. A relational database's design is created with that join in mind from the start. In fact, the keyword used to bring tables together is JOIN — but I’m getting ahead of myself.

Although creating effective joins is sometimes touted as an art rather than a science, it really isn't that difficult. Here's a simple, three-step formula for creating effective joins:

  1. Form a question that defines the data set you want to see
  2. Find the relevant tables and their joining fields
  3. Join one table at a time to the next

I’ll show you how to use this formula in a moment to examine joins a bit further, but first I’ll explain at the “Cartesian” or “Cross” Join. This join is very simple to create (especially by accident) although it's rarely used. The cross join is formed by joining every row of a table to every row of another table. Here's an example, using the pubs sample database:

USE pubs
FROM authors, titles

I won’t print the results here, since there are quite few rows! It takes every row from the first table (authors) and combines it with the first row of the second table (titles), and then does that again for the second row in titles, and so on.

The reason I’m covering this type of join is to illustrate the fact that joins show that this is using “set theory” — and that mathematical precepts explains the results. It’s rare to use this join type, but I’ve done it for statistical modeling before. It doesn’t use the formula I mentioned a moment ago — you simply take one table and don’t apply any common information between them to create this type of join.

Form a question that defines the data set you want to see

Now that I’ve explained the simplest type of join, I’ll take the formula and examine some of the more useful types. The first step states that I should create the question that defines the data I want. Here's one from that pubs database:

    "What are the names of the authors of each book, with their titles?"

Breaking this down a bit, you can see that the data fields I’m after are the first and last names of the authors, and the titles of the books they wrote. It’s pretty much this easy for every query — just write down what you want to know, and then start pulling out the nouns, just like you do when you design a database.

Find the relevant tables and their joining fields

The second step of the formula states that I need to find the tables that hold the data I’m looking for, and then to determine any fields that are used to link them together (if any). A quick note here — the tables have to have something in common to link them — if there isn’t, it’s difficult to join them, of course.

Most of the time, you’ll know the design of the database, and you’ll already know the tables you need and its fields. But even if you didn’t, if the design uses good practices, it’s not too hard to find. You can actually use the database diagram tool in SQL Server Management Studio to examine the relationships on the tables. While this method isn't always 100% accurate, it's a good place to start.

I already know a couple of the tables that contain at least some of the information I’m looking for. In the following diagram, I added the tables authors and titles to a database diagram, and I also checked the box that automatically adds related tables. I got the following display:

There are three tables involved with this information, even though I only want information about the title and the author. Why is this?

When one table can be related to another multiple times in each direction, a third table is employed to break the many-to-many relationship into two one-to-many relationships. In this case, one author (found in the authors table) can write many books (found in the titles table), and a book can have many authors. This creates the many-to-many situation, and the titleauthor table resolves the problem by adding rows from each table only once.

Now, back to step two of the formula. I have three tables involved in the information, and it looks like the fields I’m after are au_fname and au_lname from the authors table and title from the titles table. And just how did I know that? Well, the simplest way is to explore the data in the tables, being careful to use the TOP command on the select statement so that I only return a few rows. (There's no reason to bog down the database with thousands of rows of data if all I want is a few records to show me the kinds of info I need!)

You can also see that the two tables don't have a field between them that binds them together, but the diagram I created shows that they are related to rows in the titleauthor table. The rows that bind them are:

In the titles table, the field title_id

In the authors table, the field au_id

In the titleauthor table, the fields title_id and au_id

I now have all the information I need for step two, and can move on to step three.

Join one table at a time to the next

In step three I'll create the SQL Syntax that brings me the proper result. Let's take this one line at a time. First, I know that I want au_fname, au_lname and title:

SELECT au_fname, au_lname, title

So far, so good. I also know that the au_fname and au_lname fields are stored in the authors table:

FROM authors

And now I hit my first issue. The title field is stored in the titles table, and just adding a line like this:

FROM authors, titles

Will not only not work, but even if it did would produce that dreaded Cartesian product.

This is where the join syntax comes in. The general format is:

SELECT something
FROM Table 1
  ON Table 1.keyfield (operator) Table 2.keyfield

Study that syntax carefully. What I’m doing is building a set of data on the left-hand side, and then tying it to another set of data on the right-hand side with a condition.

Using that information, I’ll add in the table I know we need to join everything, titleauthor:

SELECT au_fname, au_lname, title
FROM authors
INNER JOIN titleauthor
  ON authors.au_id = titleauthor.au_id

I’ve told SQL Server to provide data from the first table, authors, where there are rows in the second table, titleauthor, that have the field au_id in common (=). Not that the operator could be things such as not equals or greater than, depending on the question I’m trying to answer.

I’m not finished yet, though — I still have to find the title field. Knowing that the titles will have rows in the titleauthor table as well, I include that table in yet another join:

SELECT au_fname, au_lname, title
FROM authors
INNER JOIN titleauthor
ON authors.au_id = titleauthor.au_id
  INNER JOIN titles
  ON titles.title_id = titleauthor.title_id

And there you have it.

Well, almost. Recall that several authors could write a book, so we probably need to use our sorting syntax to put them all together. This isn't required, but it does help answer the question more clearly that I formed in step one:

SELECT au_fname, au_lname, title
FROM authors
INNER JOIN titleauthor
ON authors.au_id = titleauthor.au_id
  INNER JOIN titles
  ON titles.title_id = titleauthor.title_id
ORDER BY title

Using the simple three-step formula gives me the first useful join, the INNER Join. This join is also sometimes called an “Equi-Join,” since it shows results where tables are combined based on a common column.

The next type of join follows the same process, but answers a different question. I’ll change the question in step one:

    "Which books do we have in stores, including those which haven't sold yet?"

To answer this question, I need all of the information in at least one of the tables, whether or not it has entries in another. I need all of the titles, and all the sales of titles.

Step two of the process says that I need to understand the data. After examining the tables, I see that two are involved in this design: titles, which has the titles of all books, and sales, which shows the books that have already sold. I see that the two tables can be related by the title_Id field, which is found in both. I now have enough information to move on to step three, designing the SQL Syntax.

The join I'll use to answer the question this time is called an OUTER Join. This type of join combines all the rows of one table with matching rows from another table.

There are two kinds of Outer Joins, a RIGHT OUTER Join and a LEFT OUTER Join. The tables are read left to right (or really the first to the next), so the LEFT OUTER join uses the first table I include as the one I want all the rows from. That means that I need to select from the titles table first, and then include the sales table as a join.

Since this join will produce records from one side that have no matches in the other, the system uses the value NULL as a placeholder. Here's the syntax:

SELECT title, qty 
FROM titles
  ON titles.title_id = sales.title_id

If you run this query in the pubs database, you’ll see that two books haven't sold yet.

By the way, a RIGHT OUTER Join wouldn't make a lot of sense here, since it would include all the sales of books that I don't have! But the RIGHT OUTER join works in the same way as the LEFT OUTER, it just uses the right (or the last included) table as a “all of these” condition.

There's one final join type I need to cover, the Self Join. This isn’t a special syntax; it’s just a way of performing the join. There are times that we might run across a table that stores information that relates to itself. Take a look at this simple table:

PersonNumber Name ParentNumber
1 Jo Anna 0
2 Buck 1
3 Catherine Anne 1

This table shows a small family, and the people that they are related to. Using the ParentNumber field, notice that Buck and Catherine Anne are both the child of Jo Anna.

I explained in the last article how to alias a table, or give it a letter to stand for its name. I'll use that technique here to create another copy of this table, and then use the same methodology as before to create my join.

First, the question:

    "To whom are the people in the database related?"

Second, I examine the table. I have a number that identifies a person, and another number that shows the parent relationship. I can't get at the data directly, since I need a join. But I only have one table — what I need is another table.

That brings me to step three. First, I know that we need the name field:

SELECT b.Name as 'Parent', a.Name as 'Child'
FROM Test a

Notice that I've aliased the name field since I'll use it twice. Next I need to join the table, but in this case, to itself — Notice that the Test table is used twice, just with two different letters as an alias:

SELECT b.Name as 'Parent', a.Name as 'Child'
FROM Test a
  ON a.ParentNumber = b.PersonNumber

And here is the result:

Parent Child
Jo Anna Buck
Jo Anna Catherine Anne

This type of join is used all the time for employee data, where a record represents both an employee and a manager of other employees.

Joins aren't that tough at all — it just takes a little practice. In future tutorials, we'll use these concepts all the time, so having them down cold is important. Memorize the process and you'll have no trouble at all.

InformIT Articles and Sample Chapters

Joining Tables in SQL Queries, a sample chapter by Ron Plew and Ryan Stephens, teaches you how to join multiple tables in a single query to optimize the use of the data stored in the database.

Books and eBooks

Love this one: Sams Teach Yourself SQL in One Hour a Day, 5th Edition, by Ryan Stephens, Ron Plew, and Arie D. Jones.

Online Resources

This DevShed article deals with MySQL, but the Join explanation is very good. Check it out!