Table of Contents
- Microsoft SQL Server Defined
- Microsoft SQL Server Features
- Microsoft SQL Server Administration
Microsoft SQL Server Programming
- An Outline for Development
- Database Services
- Database Objects: Databases
- Database Objects: Tables
- Database Objects: Table Relationships
- Database Objects: Keys
- Database Objects: Constraints
- Database Objects: Data Types
- Database Objects: Views
- Database Objects: Stored Procedures
- Database Objects: Indexes
- Database Objects: User Defined Functions
- Database Objects: Triggers
- Database Design: Requirements, Entities, and Attributes
- Business Process Model Notation (BPMN) and the Data Professional
- Business Questions for Database Design, Part One
- Business Questions for Database Design, Part Two
- Database Design: Finalizing Requirements and Defining Relationships
- Database Design: Creating an Entity Relationship Diagram
- Database Design: The Logical ERD
- Database Design: Adjusting The Model
- Database Design: Normalizing the Model
- Creating The Physical Model
- Database Design: Changing Attributes to Columns
- Database Design: Creating The Physical Database
- Database Design Example: Curriculum Vitae
- The SQL Server Sample Databases
- The SQL Server Sample Databases: pubs
- The SQL Server Sample Databases: NorthWind
- The SQL Server Sample Databases: AdventureWorks
- The SQL Server Sample Databases: Adventureworks Derivatives
- UniversalDB: The Demo and Testing Database, Part 1
- UniversalDB: The Demo and Testing Database, Part 2
- UniversalDB: The Demo and Testing Database, Part 3
- UniversalDB: The Demo and Testing Database, Part 4
- Getting Started with Transact-SQL
- Transact-SQL: Data Definition Language (DDL) Basics
- Transact-SQL: Limiting Results
- Transact-SQL: More Operators
- Transact-SQL: Ordering and Aggregating Data
- Transact-SQL: Subqueries
- Transact-SQL: Joins
- Transact-SQL: Complex Joins - Building a View with Multiple JOINs
- Transact-SQL: Inserts, Updates, and Deletes
- An Introduction to the CLR in SQL Server 2005
- Design Elements Part 1: Programming Flow Overview, Code Format and Commenting your Code
- Design Elements Part 2: Controlling SQL's Scope
- Design Elements Part 3: Error Handling
- Design Elements Part 4: Variables
- Design Elements Part 5: Where Does The Code Live?
- Design Elements Part 6: Math Operators and Functions
- Design Elements Part 7: Statistical Functions
- Design Elements Part 8: Summarization Statistical Algorithms
- Design Elements Part 9:Representing Data with Statistical Algorithms
- Design Elements Part 10: Interpreting the Data—Regression
- Design Elements Part 11: String Manipulation
- Design Elements Part 12: Loops
- Design Elements Part 13: Recursion
- Design Elements Part 14: Arrays
- Design Elements Part 15: Event-Driven Programming Vs. Scheduled Processes
- Design Elements Part 16: Event-Driven Programming
- Design Elements Part 17: Program Flow
- Forming Queries Part 1: Design
- Forming Queries Part 2: Query Basics
- Forming Queries Part 3: Query Optimization
- Forming Queries Part 4: SET Options
- Forming Queries Part 5: Table Optimization Hints
- Using SQL Server Templates
- Transact-SQL Unit Testing
- Index Tuning Wizard
- Unicode and SQL Server
- SQL Server Development Tools
- The SQL Server Transact-SQL Debugger
- The Transact-SQL Debugger, Part 2
- Basic Troubleshooting for Transact-SQL Code
- An Introduction to Spatial Data in SQL Server 2008
- Performance Tuning
- Practical Applications
- Professional Development
- Application Architecture Assessments
- Business Intelligence
- Tips and Troubleshooting
- Additional Resources
Last updated Mar 28, 2003.
In the last few articles in this series I’ve been explaining how to use Transact-SQL to access data in SQL Server databases. I’ve shown you how to do simple SELECT operations, limit the data that is returned, aggregate and group the results, and even how to do sub-selects in the SELECT, FROM and WHERE clauses.
But so far, with the exception one of the sub-select examples, I've limited the selections of data to one table at a time. While this is fine for the first few examples, the power of a relational database isn't in single tables — it's in combining tables to obtain new data sets. The real power of the relational design is only realized when you bring multiple tables together to show the data they contain as a group.
I explained in the database design tutorial how to define data you want to store in multiple tables. You define “key” fields that uniquely identify a particular row in a table, and those fields are the ones that you’ll use most often to “join” the tables back together. A relational database's design is created with that join in mind from the start. In fact, the keyword used to bring tables together is JOIN — but I’m getting ahead of myself.
Although creating effective joins is sometimes touted as an art rather than a science, it really isn't that difficult. Here's a simple, three-step formula for creating effective joins:
- Form a question that defines the data set you want to see
- Find the relevant tables and their joining fields
- Join one table at a time to the next
I’ll show you how to use this formula in a moment to examine joins a bit further, but first I’ll explain at the “Cartesian” or “Cross” Join. This join is very simple to create (especially by accident) although it's rarely used. The cross join is formed by joining every row of a table to every row of another table. Here's an example, using the pubs sample database:
USE pubs GO SELECT * FROM authors, titles
I won’t print the results here, since there are quite few rows! It takes every row from the first table (authors) and combines it with the first row of the second table (titles), and then does that again for the second row in titles, and so on.
The reason I’m covering this type of join is to illustrate the fact that joins show that this is using “set theory” — and that mathematical precepts explains the results. It’s rare to use this join type, but I’ve done it for statistical modeling before. It doesn’t use the formula I mentioned a moment ago — you simply take one table and don’t apply any common information between them to create this type of join.
Form a question that defines the data set you want to see
Now that I’ve explained the simplest type of join, I’ll take the formula and examine some of the more useful types. The first step states that I should create the question that defines the data I want. Here's one from that pubs database:
"What are the names of the authors of each book, with their titles?"
Breaking this down a bit, you can see that the data fields I’m after are the first and last names of the authors, and the titles of the books they wrote. It’s pretty much this easy for every query — just write down what you want to know, and then start pulling out the nouns, just like you do when you design a database.
Find the relevant tables and their joining fields
The second step of the formula states that I need to find the tables that hold the data I’m looking for, and then to determine any fields that are used to link them together (if any). A quick note here — the tables have to have something in common to link them — if there isn’t, it’s difficult to join them, of course.
Most of the time, you’ll know the design of the database, and you’ll already know the tables you need and its fields. But even if you didn’t, if the design uses good practices, it’s not too hard to find. You can actually use the database diagram tool in SQL Server Management Studio to examine the relationships on the tables. While this method isn't always 100% accurate, it's a good place to start.
I already know a couple of the tables that contain at least some of the information I’m looking for. In the following diagram, I added the tables authors and titles to a database diagram, and I also checked the box that automatically adds related tables. I got the following display:
There are three tables involved with this information, even though I only want information about the title and the author. Why is this?
When one table can be related to another multiple times in each direction, a third table is employed to break the many-to-many relationship into two one-to-many relationships. In this case, one author (found in the authors table) can write many books (found in the titles table), and a book can have many authors. This creates the many-to-many situation, and the titleauthor table resolves the problem by adding rows from each table only once.
Now, back to step two of the formula. I have three tables involved in the information, and it looks like the fields I’m after are au_fname and au_lname from the authors table and title from the titles table. And just how did I know that? Well, the simplest way is to explore the data in the tables, being careful to use the TOP command on the select statement so that I only return a few rows. (There's no reason to bog down the database with thousands of rows of data if all I want is a few records to show me the kinds of info I need!)
You can also see that the two tables don't have a field between them that binds them together, but the diagram I created shows that they are related to rows in the titleauthor table. The rows that bind them are:
In the titles table, the field title_id
In the authors table, the field au_id
In the titleauthor table, the fields title_id and au_id
I now have all the information I need for step two, and can move on to step three.
Join one table at a time to the next
In step three I'll create the SQL Syntax that brings me the proper result. Let's take this one line at a time. First, I know that I want au_fname, au_lname and title:
SELECT au_fname, au_lname, title
So far, so good. I also know that the au_fname and au_lname fields are stored in the authors table:
And now I hit my first issue. The title field is stored in the titles table, and just adding a line like this:
FROM authors, titles
Will not only not work, but even if it did would produce that dreaded Cartesian product.
This is where the join syntax comes in. The general format is:
SELECT something FROM Table 1 INNER JOIN Table 2 ON Table 1.keyfield (operator) Table 2.keyfield
Study that syntax carefully. What I’m doing is building a set of data on the left-hand side, and then tying it to another set of data on the right-hand side with a condition.
Using that information, I’ll add in the table I know we need to join everything, titleauthor:
SELECT au_fname, au_lname, title FROM authors INNER JOIN titleauthor ON authors.au_id = titleauthor.au_id
I’ve told SQL Server to provide data from the first table, authors, where there are rows in the second table, titleauthor, that have the field au_id in common (=). Not that the operator could be things such as not equals or greater than, depending on the question I’m trying to answer.
I’m not finished yet, though — I still have to find the title field. Knowing that the titles will have rows in the titleauthor table as well, I include that table in yet another join:
SELECT au_fname, au_lname, title FROM authors INNER JOIN titleauthor ON authors.au_id = titleauthor.au_id INNER JOIN titles ON titles.title_id = titleauthor.title_id
And there you have it.
Well, almost. Recall that several authors could write a book, so we probably need to use our sorting syntax to put them all together. This isn't required, but it does help answer the question more clearly that I formed in step one:
SELECT au_fname, au_lname, title FROM authors INNER JOIN titleauthor ON authors.au_id = titleauthor.au_id INNER JOIN titles ON titles.title_id = titleauthor.title_id ORDER BY title
Using the simple three-step formula gives me the first useful join, the INNER Join. This join is also sometimes called an “Equi-Join,” since it shows results where tables are combined based on a common column.
The next type of join follows the same process, but answers a different question. I’ll change the question in step one:
"Which books do we have in stores, including those which haven't sold yet?"
To answer this question, I need all of the information in at least one of the tables, whether or not it has entries in another. I need all of the titles, and all the sales of titles.
Step two of the process says that I need to understand the data. After examining the tables, I see that two are involved in this design: titles, which has the titles of all books, and sales, which shows the books that have already sold. I see that the two tables can be related by the title_Id field, which is found in both. I now have enough information to move on to step three, designing the SQL Syntax.
The join I'll use to answer the question this time is called an OUTER Join. This type of join combines all the rows of one table with matching rows from another table.
There are two kinds of Outer Joins, a RIGHT OUTER Join and a LEFT OUTER Join. The tables are read left to right (or really the first to the next), so the LEFT OUTER join uses the first table I include as the one I want all the rows from. That means that I need to select from the titles table first, and then include the sales table as a join.
Since this join will produce records from one side that have no matches in the other, the system uses the value NULL as a placeholder. Here's the syntax:
SELECT title, qty FROM titles LEFT OUTER JOIN sales ON titles.title_id = sales.title_id ORDER BY qty
If you run this query in the pubs database, you’ll see that two books haven't sold yet.
By the way, a RIGHT OUTER Join wouldn't make a lot of sense here, since it would include all the sales of books that I don't have! But the RIGHT OUTER join works in the same way as the LEFT OUTER, it just uses the right (or the last included) table as a “all of these” condition.
There's one final join type I need to cover, the Self Join. This isn’t a special syntax; it’s just a way of performing the join. There are times that we might run across a table that stores information that relates to itself. Take a look at this simple table:
This table shows a small family, and the people that they are related to. Using the ParentNumber field, notice that Buck and Catherine Anne are both the child of Jo Anna.
I explained in the last article how to alias a table, or give it a letter to stand for its name. I'll use that technique here to create another copy of this table, and then use the same methodology as before to create my join.
First, the question:
"To whom are the people in the database related?"
Second, I examine the table. I have a number that identifies a person, and another number that shows the parent relationship. I can't get at the data directly, since I need a join. But I only have one table — what I need is another table.
That brings me to step three. First, I know that we need the name field:
SELECT b.Name as 'Parent', a.Name as 'Child' FROM Test a
Notice that I've aliased the name field since I'll use it twice. Next I need to join the table, but in this case, to itself — Notice that the Test table is used twice, just with two different letters as an alias:
SELECT b.Name as 'Parent', a.Name as 'Child' FROM Test a INNER JOIN Test b ON a.ParentNumber = b.PersonNumber
And here is the result:
|Jo Anna||Catherine Anne|
This type of join is used all the time for employee data, where a record represents both an employee and a manager of other employees.
Joins aren't that tough at all — it just takes a little practice. In future tutorials, we'll use these concepts all the time, so having them down cold is important. Memorize the process and you'll have no trouble at all.
InformIT Articles and Sample Chapters
Joining Tables in SQL Queries, a sample chapter by Ron Plew and Ryan Stephens, teaches you how to join multiple tables in a single query to optimize the use of the data stored in the database.
Books and eBooksLove this one: Sams Teach Yourself SQL in One Hour a Day, 5th Edition, by Ryan Stephens, Ron Plew, and Arie D. Jones.
This DevShed article deals with MySQL, but the Join explanation is very good. Check it out!