As designers, we all learn early how to implement relationships using a relational database management system (RDBMS). We establish the connection by carrying the same data value in both of the tables. To put the two rows together, we simply join where the data values match up. We then cement that relationship with primary keys and foreign keys. What column should we use to do this? A big temptation is to use data that occurs "naturally"data values that everyone knows and understandsbut maybe that's not such a good idea. Using naturally occurring data can have some negative impacts on performance. I'll describe those problems in this article.
What Is "Naturally Occurring Data"?
A thing's name should tell us something about that thingwhy we call it what we do. Consider Athens Road: Isn't it obvious what we're doing when we start down that road? If we considered a set of data about roads, we would say that the name of the road is a value that naturally occurs within the attributes.
In the Middle Ages, if your neighbor was John Carpenter, you had a pretty good idea of how he spent his day (and why his kid was called "Splinter"). In a set of population data scribed into the Doomsday Book, his name would be a tempting primary key (PK).