The Math Behind the Theory
The rules for a mathematical set of data tell us that each tuple (or row, as we non-mathematicians would rather say) must be different. Set theory then states that each tuple must have a unique identifier. The RDBMS implements this set logic and allows us to specify a row differentiator called the primary key.
Relationships are implemented when the primary key from one side is placed into the child table as a foreign key (FK). The term foreign here simply means that the data value doesn't normalize to the child; it belongs in the parent. This setup implies that a parent is unique, and that many children could spring from this parent. (If cloning becomes socially acceptable, we may have to alter our RDBMS terminology!)
So the key, literally, is to know what makes a parent row truly unique. If you're a child under the age of 13, you probably look at your parents with adoring eyes and know how lucky you are: Your parent is definitely one of a kind! However, if you're a teenager, then you know that all parents are the same; they're mass-produced and placed on this world to drain all the fun out of being a teenager.
Let's consider the case of big teenagers: professional baseball players. They have a many-to-one relationship with their team (assuming that we're not attempting to track this relationship over time, in which case no RDBMS product, even running on a Cray, could possibly track all the changes). And, since we want to make their statistics available to fantasy league players, we'll also need to record player history (see Figure 1).
A brief word on syntax: There are many notations depicting relationships. We'll stick to the classic line (meaning 1) with arrowhead (meaning many), and use solid lines (meaning mandatory).
Figure 1 Entity relationship diagram (ERD).
What uniquely identifies a baseball team? We could try to use the city, but New York City has two teams (although Yankee fans deny that the Mets are really a professional team). How about the team name? Again, this would get us into trouble over time: The Braves have resided in Boston; Milwaukee; and their current digs, Atlanta. And the Tennessee Titans of the NFL seem to have forgotten about the AFL New York Titans, precursors to the NY Jets, who actually play in New Jersey!
So we could conclude that we would have to use the concatenated form of city and team name. This begins to become unwieldy and has consequences. The string would have to be saved in every row, both parent and all children. (Let's hope that a given parent is not too prolific.) How long could this primary key be? If major league baseball wanted to lower some of its costs and locate a team in an area where many players come from, then we'd have a team in Mexico City. And if they wanted to be sensitive to the locals, the real name of the city would be La Cuidad de Mexico, Distrito Federal. And, as the team name, they could select Fundacion de la Junta para Ampliacion de Guerrilla. (I pity the poor SQL writer retrieving this row by supplying the string literal!) Or, imagine a team locating in Tokyo, but we needed to use the Japanese character set. Not all RDBMSs support storing multiple national language sets.
As we're pondering what makes a naturally occurring piece of data so attractive, the issue of what uniquely identifies a Player also comes to mind. (We also have to assume that all steroids are natural.) In the early days of the game, owners of baseball teams identified their players with a number on their uniforms. It's possible that these owners foresaw the arrival of Dr. Ted Codd and the emergence of relational theory. (Obviously, they wanted to make sure that they could have more than one player with the same namethe Giants once had three Alous in the outfield.) But it's far more likely that they saw stitching names on the backs of uniforms as an expense: George Steinbrenner, whose Yankees still don't have player names on their uniforms, puts this money savings into players' salaries instead. So the obvious choice to uniquely identify a player is to use the player's number as the primary key to the Player table. Our SQL writer retrieving Player #14 now breathes a sigh of relief.