Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Creating The Physical Model

Last updated Mar 28, 2003.

In the last few articles in this series, I’ve explained the process to develop a database design in an intelligent, systematic way. I've covered the process for gathering business requirements, the conversion of those requirements into logical statements, the creation of entities, attributes and relationships, and the creation of the logical Entity Relationship Diagram (ERD) — a graphical tool showing this logical design.

The example database I’m designing tracks the hours consultants spend on projects. I've been careful to design the entities and attributes in accordance with the requirements document — in fact, making sure the requirements reflect the data, rules and processes for the system forms the basis of everything else I’ll show you in this process. It’s the single-most important step.

After the initial design, I explained a problem with the relationship aspect of the diagram. I had to go back to the design committee for clarification, and once I did, I altered the design to properly reflect the relationship. While I was at it, I added a few attributes that were missing.

This design is what is delivered to the business community, along with the business rules, user documentation and so forth, in the form of that graphical Entity Relationship Diagram. That serves as a discussion vehicle, something that has simple symbols and text to rapidly convey what information the database will store, and the rules that will be enforced for optionality (whether something can, should or must exist) and cardinality (how data is related to each other). Once all of the teams agree on the logical layout of the data, the next step is to turn that diagram into a physical model for whatever platform the organization will use. In this series, I’ll use SQL Server, but if you pick another database system such as Oracle or XML files, the diagram will look different. This is the reason for separating the logical ERD from the physical design — it allows the business and developers to create a program that can be implemented on multiple platforms, and the data professionals to focus on the best way to implement the actual platform in the most secure, best performing method.

The business community normally doesn't have much use for the physical ERD I’ll make next. That deliverable is used more by the database analysts, administrators and developers. Not every shop uses a graphical ERD for a physical model. In some cases you’ll go straight from the logical ERD graphic to Transact-SQL statements, and that’s OK. If I’m using an expensive tool for the data design, it often has the capability to simply add an option to show either the logical or physical ERD. I find that the larger the project, the more graphics I use, since I often need to coordinate with more people.

There are several ways to convert a logical design to a physical implementation, but the route I usually use looks something like this:

  1. Apply first normal form rules to the logical ERD
    • Convert the entities into tables
    • Convert the attributes into fields
    • Create primary keys
  2. Apply second normal form rules to the logical ERD
    • Create foreign keys
  3. Enforce business rules with constraints and defaults
  4. Apply third normal form rules to the logical ERD
  5. Add any platform components

While this might look like a short list, it's longer than you might think. Let's try it out, one step at a time.

Apply first normal form rules to the logical ERD

Recall that first normal form specifies separating out any repeating entities into multiple entities.

Convert the entities into tables

The logical model I have largely fits this rule, so the first part of the process is complete.

Convert the attributes into fields

So far, the logical diagram holds — with a couple of small changes. For one, I’ll change any spaces in words to underlines. While SQL Server allows spaces in the names of many objects, I’d have to bracket [ ] them when I reference them in any T-SQL statements. That's inviting an error, and I would like to avoid errors where possible.

I’ve also changed the singular notation on the entities to plural. While this seems a minor change, people can get really concerned about it. My personal rule of thumb is that when I designing logical entities, I'm describing a concept. Therefore, Client is a singular entity, so I keep it singular. Hours and Skills don't stand alone, so those entities I keep plural. When I convert the logical entities to physical tables, I'm no longer describing concepts, but buckets. For that reason, I’ll stick with naming the tables in the plural. There are other schools of thought on this, but in the end the key is to be understandable and consistent.

I’m not done with the fields just yet, though. The other part of this process I’ll have to complete is setting data types, nullability and so on. I’ll complete that in a bit.

Create primary keys

At this point I’ll add my primary keys. In another article on this site I discussed primary keys at length, but for my purposes here I'll simplify the process for creating a primary key into just one rule:

Set a primary key that completely identifies each row in the table in a unique way.

It's a simple rule, and yet its application is a bit more complex. As you may recall from the earlier tutorial regarding keys, there are two possibilities. The first is that a current attribute fits the bill. This is called a "natural" key. In my case, the Staff Member, Client, and Project entities already have unique tracking numbers assigned to them, so they make great primary keys.

Just as often I’ll follow the second practice of setting the primary key to something that never occurs naturally in the database — it’s just a number or series of characters that I make up to represent the actual values on that row, called a “surrogate” key. It’s like the way your employee number at work represents your name and hire date.

An important aspect to the simple rule from above is that the primary key shouldn't change, ever. While you can change a primary key later, it's painful. This brings up an issue with the choice for the primary key. If the natural key might change or may not exist, use a surrogate or artificial key. I’ll have to do that for the Hours and Skills entities, since the values I’m storing there are not unique, and may change over time.

Apply second normal form rules to the logical ERD

Second normal form says that I need to separate out any shared attributes into separate tables. I’ve largely accomplished that goal already, but an additional part of this form is that in separating these attributes into other tables, I need to set up the foreign keys. These are the non-unique values in one table that point to the parent or relating table.

Create foreign keys

In my diagram, I already have pointer attributes (which convert to foreign keys) in the Hours entity. Notice the Project Code and Staff Code attributes in the logical diagram. In the physical model I’ve renamed those attributes to fields called Project_Code and Staff_Code.

In addition, I need to create a few more foreign key fields. I’ve added Staff_Code to the Skills table, and Client_Code to the Projects table.

It's no accident that I’ve named the foreign keys the same as the primary keys in their parent tables. This is usually a good practice, but if it isn't possible I could preface the name with FK_ or some other naming convention. Many tools will “pick up” on the duplicate names and infer that you want them to relate to each other, and some are even intelligent enough to use the Primary/Foreign Key designations to help your users design reports, screens and so on. The easier you make the process on your users, the less they need you to be involved and the fewer errors you’ll face.

Apply third normal form rules to the logical ERD

Third normal form dictates that I have to make sure that all the fields in a table depend on their primary key. It's easier to see this with an example. Let's say that I had a table called Employees with the following fields:

  • Employee_Code
  • Name
  • Address
  • Phone
  • Nationality

At first glance, this table might seem to conform to third normal form. But what about that Nationality field? While it is true that the employee can only have one nationality, it may violate third normal form because the nationality of an employee isn't dependent on their employee code. Nationality may need to be moved out to its own table and a relating table created to match up Nationality to Employee.

I examine my current design, and believe that I am in third normal form.

Enforce business rules with constraints and defaults

Next I need to apply some constraints to the diagram. In the tutorial regarding database objects I explained a few of the constraints you have available and how to apply them, but here's that short list again:

  • Keys, primary and foreign
  • Not Null
  • Check
  • Unique

This process is often the part that takes the longest, and requires the most care. Even in my simple database, this process requires a lot of work, so I'm going to table this part of this part of the tutorial for next week. I wanted to make sure I covered the entire process, but I want to balance that with giving this section its due.

Add any platform components

Here's where I set the fields to the data types, nullability, and other SQL Server-specific constructs. Since this part of the process is also quite lengthy, I'll cover it in the next part of the series with the constraints section.

There's more to do, and I haven't even started my indexes, storage strategy, and maintenance. Even after that, I still have to explain programming considerations such as views, stored procedures and the security. We’ll get to all that in the next few articles in this series.