Intersection
No, it’s not your local street corner. An intersection of two sets contains the common elements of two sets. Let’s first take a look at an intersection from the pure perspective of set theory and then see how you can use an intersection to solve business problems.
Intersection in Set Theory
An intersection is a very powerful mathematical tool often used by scientists and engineers. As a scientist, you might be interested in finding common points between two sets of chemical or physical sample data. For example, a pharmaceutical research chemist might have two compounds that seem to provide a certain beneficial effect. Finding the commonality (the intersection) between the two compounds might help discover what it is that makes the two compounds effective. Or, an engineer might be interested in finding the intersection between one alloy that is hard but brittle and another alloy that is soft but resilient.
Let’s take a look at intersection in action by examining two sets of numbers. In this example, each single number is a member of the set. The first set of numbers is as follows:
- 1, 5, 8, 9, 32, 55, 78
The second set of numbers is as follows:
- 3, 7, 8, 22, 55, 71, 99
The intersection of these two sets of numbers is the numbers common to both sets:
- 8, 55
The individual entries—the members—of each set don’t have to be just single values. In fact, when solving problems with SQL, you’ll probably deal with sets of rows.
According to set theory, when a member of a set is something more than a single number or value, each member (or object) of the set has multiple attributes or bits of data that describe the properties of each member. For example, your favorite stew recipe is a complex member of the set of all recipes that contains many different ingredients. Each ingredient is an attribute of your complex stew member.
To find the intersection between two sets of complex set members, you have to find the members that match on all the attributes. Also, all the members in each set you’re trying to compare must have the same number and type of attributes. For example, suppose you have a complex set like the one below, in which each row represents a member of the set (a stew recipe), and each column denotes a particular attribute (an ingredient).
Potatoes |
Water |
Lamb |
Peas |
Rice |
Chicken Stock |
Chicken |
Carrots |
Pasta |
Water |
Tofu |
Snap Peas |
Potatoes |
Beef Stock |
Beef |
Cabbage |
Pasta |
Water |
Pork |
Onions |
A second set might look like the following:
Potatoes |
Water |
Lamb |
Onions |
Rice |
Chicken Stock |
Turkey |
Carrots |
Pasta |
Vegetable Stock |
Tofu |
Snap Peas |
Potatoes |
Beef Stock |
Beef |
Cabbage |
Beans |
Water |
Pork |
Onions |
The intersection of these two sets is the one member whose attributes all match in both sets:
Potatoes |
Beef Stock |
Beef |
Cabbage |
Intersection between Result Sets
If the previous examples look like rows in a table or a result set to you, you’re on the right track! When you’re dealing with rows in a set of data that you fetch with SQL, the attributes are the individual columns. For example, suppose you have a set of rows returned by a query like the following one. (These are recipes from John’s cookbook.)
Recipe |
Starch |
Stock |
Meat |
Vegetable |
Lamb Stew |
Potatoes |
Water |
Lamb |
Peas |
Chicken Stew |
Rice |
Chicken Stock |
Chicken |
Carrots |
Veggie Stew |
Pasta |
Water |
Tofu |
Snap Peas |
Irish Stew |
Potatoes |
Beef Stock |
Beef |
Cabbage |
Pork Stew |
Pasta |
Water |
Pork |
Onions |
A second query result set might look like the following. (These are recipes from Mike’s cookbook.)
Recipe |
Starch |
Stock |
Meat |
Vegetable |
Lamb Stew |
Potatoes |
Water |
Lamb |
Peas |
Turkey Stew |
Rice |
Chicken Stock |
Turkey |
Carrots |
Veggie Stew |
Pasta |
Vegetable Stock |
Tofu |
Snap Peas |
Irish Stew |
Potatoes |
Beef Stock |
Beef |
Cabbage |
Pork Stew |
Beans |
Water |
Pork |
Onions |
The intersection of these two sets is the two members whose attributes all match in both sets—that is, the two recipes that Mike and John have in common.
Recipe |
Starch |
Stock |
Meat |
Vegetable |
Lamb Stew |
Potatoes |
Water |
Lamb |
Peas |
Irish Stew |
Potatoes |
Beef Stock |
Beef |
Cabbage |
Sometimes it’s easier to see how intersection works using a set diagram. A set diagram is an elegant yet simple way to diagram sets of information and graphically represent how the sets intersect or overlap. You might also have heard this sort of diagram called a Euler or Venn diagram. (By the way, Leonard Euler was an eighteenth-century Swiss mathematician, and John Venn used this particular type of logic diagram in 1880 in a paper he wrote while a Fellow at Cambridge University. So you can see that “thinking in sets” is not a particularly modern concept!)
Let’s assume you have a nice database containing all your favorite recipes. You really like the way onions enhance the flavor of beef, so you’re interested in finding all recipes that contain both beef and onions. Figure 7-1 shows the set diagram that helps you visualize how to solve this problem.
Figure 7-1 Finding out which recipes have both beef and onions
The upper circle represents the set of recipes that contain beef. The lower circle represents the set of recipes that contain onions. Where the two circles overlap is where you’ll find the recipes that contain both—the intersection of the two sets. As you can imagine, you first ask SQL to fetch all the recipes that have beef. In the second query, you ask SQL to fetch all the recipes that have onions. As you’ll see later, you can use a special SQL keyword—INTERSECT—to link the two queries to get the final answer.
Yes, we know what you’re thinking. If your recipe table looks like the samples above, you could simply say the following:
- “Show me the recipes that have beef as the meat ingredient and onions as the vegetable ingredient.”
Translation |
Select the recipe name from the recipes table where meat ingredient is beef and vegetable ingredient is onions |
Clean Up |
Select |
SQL |
SELECT RecipeName FROM Recipes WHERE MeatIngredient = 'Beef' AND VegetableIngredient = 'Onions' |
Hold on now! If you remember the lessons you learned in Chapter 2, you know that a single Recipes table probably won’t cut it. (Pun intended!) What about recipes that have ingredients other than meat and vegetables? What about the fact that some recipes have many ingredients and others have only a few? A correctly designed recipes database will have a separate Recipe_Ingredients table with one row per recipe per ingredient. Each ingredient row will have only one ingredient, so no single row can be both beef and onions at the same time. You’ll need to first find all the beef rows, then find all the onions rows, and then intersect them on RecipeID. (If you’re confused about why we’re criticizing the previous table design, be sure to go back and read Chapter 2!)
How about a more complex problem? Let’s say you want to add carrots to the mix. A set diagram to visualize the solution might look like Figure 7-2.
Figure 7-2 Determining which recipes have beef, onions, and carrots
Got the hang of it? The bottom line is that when you’re faced with solving a problem involving complex criteria, a set diagram can be an invaluable way to see the solution expressed as the intersection of SQL result sets.
Problems You Can Solve with an Intersection
As you might guess, you can use an intersection to find the matches between two or more sets of information. Here’s just a small sample of the problems you can solve using an intersection technique with data from the sample databases:
- “Show me customers and employees who have the same name.”
- “Find all the customers who ordered a bicycle and also ordered a helmet.”
- “List the entertainers who played engagements for customers Bonnicksen and Rosales.”
- “Show me the students who have an average score of 85 or better in Art and who also have an average score of 85 or better in Computer Science.”
- “Find the bowlers who had a raw score of 155 or better at both Thunderbird Lanes and Bolero Lanes.”
- “Show me the recipes that have beef and garlic.”
One of the limitations of using a pure intersection is that the values must match in all the columns in each result set. This works well if you’re intersecting two or more sets from the same table—for example, customers who ordered bicycles and customers who ordered helmets. It also works well when you’re intersecting sets from tables that have similar columns—for example, customer names and employee names. In many cases, however, you’ll want to find solutions that require a match on only a few column values from each set. For this type of problem, SQL provides an operation called a JOIN—an intersection on key values. Here’s a sample of problems you can solve with a JOIN:
- “Show me customers and employees who live in the same city.” (JOIN on the city name.)
- “List customers and the entertainers they booked.” (JOIN on the engagement number.)
- “Find the agents and entertainers who live in the same ZIP Code.” (JOIN on the ZIP Code.)
- “Show me the students and their teachers who have the same first name.” (JOIN on the first name.)
- “Find the bowlers who are on the same team.” (JOIN on the team ID.)
- “Display all the ingredients for recipes that contain carrots.” (JOIN on the ingredient ID.)
Never fear. In the next chapter we’ll show you all about solving these problems (and more) by using JOINs. And because so few commercial implementations of SQL support INTERSECT, we’ll show how to use a JOIN to solve many problems that might otherwise require an INTERSECT.