SQL Server Reference Guide

Database Design: Normalizing the Model

Last updated Mar 28, 2003.

In this continuing series on database design I’ve shown you how to gather business requirements, pull entities, attributes and relationships from those requirements, and create a logical Entity Relationship Diagram from that information. I’ve used a very simple example, something you would never see in production, so that you can follow along quickly. Even though those examples are simple, the process is the same for a longer set of requirements. 

In this week's tutorial, I'll explain the design that I've created for this sample. It's been a lengthy process, but in real-world applications you'll find the process goes more quickly with every database you design. You begin to recognize patterns that occur over and over. I find it interesting to hear clients say "Oh, we have some really unique requirements," only to end up with a design remarkably similar to many I've seen before.

Here is the design as it stands today:

It's important to mention again that this is a very simple design, made from requirements that are pretty simple as well. I had to strike a balance of staying within the confines of a readable tutorial and yet model something that approximates a real-world exercise. While you'll probably be asked to create far more complex designs than this one, the concepts you're learning will stand for any size endeavor.

A database model is not the end of the design process; it's the means. Tools like the logical Entity Relationship Diagram shown above point out the strengths and weaknesses of our design logic. To that end, as I mentioned in the last tutorial, there are a couple of flaws, even in this model.

For one thing, the model tracks hours (as required by the business requirements) but what does the word "track" really mean? You may have noticed that as you go forward with a design, seemingly definitive words begin to become ambiguous, and this is one of those times. Strictly speaking, I've covered the requirements by including a referential entity called Hours that stores the number of hours on a given project for a given consultant. The number of hours, however, is only one aspect of tracking. Business users are rarely interested only in the count of hours; more often they ask for the range of hours, meaning when the work started and stopped. Common sense and experience should make you question the simplicity of the design.

Once again, you should go back to the design committee for clarification. Just as in real life, much discussion ensues, and the requirement is modified to state that what needs to be tracked is the start and stop time for each consultant's activity on a particular project. With that (albeit imaginary) clarity, I'll make a modification to the model.

While I've got my imaginary design committee together, I bring up another suspected flaw in the model. There's nowhere to store the description of the consultant's activity. Since that fact wasn't presented in the business requirements, I don't have an actual deficiency in the model, but it doesn't seem to make sense that a consultant bills hours out and yet doesn't record what he or she did. Should I include a field to store that?

This brings up an important point. All projects have a similar danger: scope creep. Scope creep occurs when the requirements for what the process is designed to do change after they have been agreed. It adds time and resources to a closed requirement, making the plan less useful. If the design committee had all the facts at the beginning of the process, they might have designed the initial model differently. Instead, new or different concepts are often "tacked on" at the end, skewing the model.

In this case, however, the decision involves a clarification of the hours entity, not a change of its definition. Since that's the case, the committee agrees that I should add a field to store activity descriptions.

Here's the new model, complete with three new attributes. You can see I've placed them in the Hours entity, since these attributes only materialize during the intersection of a project and a staff member.
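As a rough sketch of what the revised Hours entity might look like once it becomes a table, the example below stores the start time, stop time and activity description, and derives the hour count from the range. The diagram isn't reproduced here, so every table and column name is an assumption made for illustration, and the SQL is run through Python's built-in sqlite3 module as a stand-in for SQL Server.

```python
import sqlite3

# Hypothetical names throughout: StaffID, ProjectID, StartTime, StopTime and
# ActivityDescription are illustrative, not taken from the article's diagram.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hours (
    StaffID             INTEGER NOT NULL,
    ProjectID           INTEGER NOT NULL,
    StartTime           TEXT    NOT NULL,  -- ISO 8601 text in SQLite
    StopTime            TEXT    NOT NULL,
    ActivityDescription TEXT
);
INSERT INTO Hours VALUES
    (1, 10, '2003-03-24 09:00:00', '2003-03-24 12:30:00', 'Requirements interview'),
    (1, 10, '2003-03-25 13:00:00', '2003-03-25 17:00:00', 'Drafted logical model');
""")

# The hour count is derived from the start/stop range, so it needs no column
# of its own. julianday() returns fractional days; multiplying by 24 gives hours.
rows = conn.execute("""
    SELECT StaffID, ProjectID,
           ROUND(SUM((julianday(StopTime) - julianday(StartTime)) * 24), 2) AS TotalHours
    FROM Hours
    GROUP BY StaffID, ProjectID
""").fetchall()
print(rows)
```

Storing the range rather than the count means the total can always be recalculated, while the reverse is not possible: a stored count of hours cannot tell you when the work happened.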

At this point I'll make an executive decision and call the design complete so that I can move on with this tutorial.

Now that I have a completed model, I need to have the design committee give it a look over. It's important to make sure that I have the design right at this stage; I'm about to create the physical model, and I can't afford many missteps here. To be clear, I can still make changes, even up to the deployment of the database into production. In fact, it's common to make changes even after the system is in production, but each step toward production makes a change significantly harder and more likely to cause problems. Spend more time on the design and you'll spend less on corrections later.

Once I've gotten the "All Clear" from the design committee, I can start the process of converting the logical design into the physical database.

Introducing Normalization

Before I begin the creation of tables, fields, primary keys, views and the like, I need to cover the concept of database normalization for you. Some database design training places this part of the discussion after the initial conversion of entities and attributes, but I feel the rules of normalization are more applicable to the creation, not refinement, of the base tables and fields. You can normalize the database at either side of the process, but my preference is to normalize during the first pass of the physical Entity Relationship Diagram.

So what exactly is database normalization? It's a set of concepts (formally documented by E.F. Codd) that, when followed, create a stable, easy-to-understand, reliable database. Codd formulated several stages of the process he called normalization.

There are three levels commonly worked through to completely "normalize" a database, although more are available. We'll learn about the first three, as those are the ones you'll use most often. Three levels are almost always sufficient to bring a database to a consistent, optimized state.

The remainder of this article will deal with those three levels. In the next tutorial, we'll use these rules to convert our logical model to the physical platform requirements for SQL Server.

You can find several references to the formal rules for database normalization, but I'm going to set them into terms that are a bit easier to understand.

The rule for setting a database to first normal form is:

    Eliminate repeating groups by moving them into individual tables

You probably already know this rule intuitively. I moved to first normal form by moving data out of text files or spreadsheets and creating tables. I've already set each entity into only one set of related data, and we'll complete first normal form by creating a primary key.
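To make the first rule concrete, here is a minimal sketch that takes spreadsheet-style rows containing a repeating group (Skill1, Skill2, Skill3) and sets them into tables with primary keys. The staff names, skills and column names are all made up for illustration, and SQLite (through Python's sqlite3 module) stands in for SQL Server.

```python
import sqlite3

# Spreadsheet-style rows with a repeating group: Name, Skill1, Skill2, Skill3.
# All names and values here are hypothetical.
flat_rows = [
    ("Ann Lee", "SQL", "Java", None),
    ("Raj Patel", "Networking", None, None),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (
    StaffID   INTEGER PRIMARY KEY,       -- the primary key identifies each row
    StaffName TEXT NOT NULL
);
CREATE TABLE StaffSkill (
    StaffID   INTEGER NOT NULL,
    SkillName TEXT    NOT NULL,
    PRIMARY KEY (StaffID, SkillName)
);
""")

for name, *skills in flat_rows:
    staff_id = conn.execute(
        "INSERT INTO Staff (StaffName) VALUES (?)", (name,)
    ).lastrowid
    # One row per skill replaces the fixed Skill1..Skill3 columns.
    conn.executemany(
        "INSERT INTO StaffSkill (StaffID, SkillName) VALUES (?, ?)",
        [(staff_id, skill) for skill in skills if skill is not None],
    )

skill_rows = conn.execute(
    "SELECT StaffID, SkillName FROM StaffSkill ORDER BY StaffID, SkillName"
).fetchall()
print(skill_rows)
```

With the repeating group gone, a staff member can have any number of skills without adding columns or leaving empty ones.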

The rule for setting a database to second normal form is that you've set the database to first normal form plus:

    Create separate tables for sets of values that apply to multiple records

This rule deals with redundant data. I've already covered this rule in our design, by creating separate entities for the multiple skills that a staff member might have. I'll refine the design to include this rule by relating tables, allowing for single concepts within a table. I'll do this with a foreign key: the value in one table that points to the primary key in another. By applying the rules one at a time you can see the design begin to "tighten" and make more sense.
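The sketch below shows one way the skills example could look with foreign keys in place: each skill name is stored once, and a linking table points to the primary keys of the staff and skill tables. The table and column names are assumptions for illustration, and SQLite stands in for SQL Server (note that SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement

# Hypothetical tables: Staff, Skill, and a StaffSkill table relating the two.
conn.executescript("""
CREATE TABLE Staff (StaffID INTEGER PRIMARY KEY, StaffName TEXT NOT NULL);
CREATE TABLE Skill (SkillID INTEGER PRIMARY KEY, SkillName TEXT NOT NULL UNIQUE);
CREATE TABLE StaffSkill (
    StaffID INTEGER NOT NULL REFERENCES Staff (StaffID),
    SkillID INTEGER NOT NULL REFERENCES Skill (SkillID),
    PRIMARY KEY (StaffID, SkillID)
);
INSERT INTO Staff VALUES (1, 'Ann Lee'), (2, 'Raj Patel');
INSERT INTO Skill VALUES (1, 'SQL'), (2, 'Networking');
INSERT INTO StaffSkill VALUES (1, 1), (2, 1), (2, 2);
""")

# The skill name 'SQL' is stored once but applies to two staff members.
pairs = conn.execute("""
    SELECT s.StaffName, k.SkillName
    FROM StaffSkill AS ss
    JOIN Staff AS s ON s.StaffID = ss.StaffID
    JOIN Skill AS k ON k.SkillID = ss.SkillID
    ORDER BY s.StaffName, k.SkillName
""").fetchall()
print(pairs)

# A row that points to a skill that does not exist is refused.
try:
    conn.execute("INSERT INTO StaffSkill VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Because the skill name lives in only one place, correcting or renaming a skill is a single-row change rather than an edit to every staff record that mentions it.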

The rule for setting a database to third normal form is that you've set the database to second normal form plus:

    Eliminate fields that do not depend on the primary key

This rule is easier to explain than to implement. Essentially it means that if you have an attribute in an entity that doesn't depend on the attribute you've picked for a primary key, you have to move it out to another entity. I'll explain this form more fully in the next few tutorials.
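As a small preview, here is a hedged sketch of the third rule using a hypothetical case that is not part of the sample model: a project table that also carries a client's phone number. The phone number depends on the client, not on the project's primary key, so it moves out to its own table. All names and values are invented, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: ClientPhone describes the client, not the project that
-- ProjectID identifies, so it is repeated on every project row.
CREATE TABLE ProjectFlat (
    ProjectID   INTEGER PRIMARY KEY,
    ProjectName TEXT NOT NULL,
    ClientName  TEXT NOT NULL,
    ClientPhone TEXT
);
INSERT INTO ProjectFlat VALUES
    (10, 'Billing rewrite', 'Contoso', '555-0100'),
    (11, 'Data warehouse',  'Contoso', '555-0100');

-- After: facts about the client live in their own table.
CREATE TABLE Client (
    ClientID    INTEGER PRIMARY KEY,
    ClientName  TEXT NOT NULL UNIQUE,
    ClientPhone TEXT
);
CREATE TABLE Project (
    ProjectID   INTEGER PRIMARY KEY,
    ProjectName TEXT NOT NULL,
    ClientID    INTEGER NOT NULL REFERENCES Client (ClientID)
);
INSERT INTO Client (ClientName, ClientPhone)
    SELECT DISTINCT ClientName, ClientPhone FROM ProjectFlat;
INSERT INTO Project (ProjectID, ProjectName, ClientID)
    SELECT p.ProjectID, p.ProjectName, c.ClientID
    FROM ProjectFlat AS p
    JOIN Client AS c ON c.ClientName = p.ClientName;
""")

# The phone number now changes in exactly one row, and every project sees it.
conn.execute("UPDATE Client SET ClientPhone = '555-0199' WHERE ClientName = 'Contoso'")
phones = conn.execute("""
    SELECT p.ProjectName, c.ClientPhone
    FROM Project AS p
    JOIN Client AS c ON c.ClientID = p.ClientID
    ORDER BY p.ProjectID
""").fetchall()
print(phones)
```

In the flat version, the same update would have to touch every project row for that client, and missing one would leave the data inconsistent.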

In the next part of this series, I'll use these concepts to create a physical layout for the database. The database diagram will become more complex, but keep in mind that it will be used for a different purpose. This new diagram will be used by database administrators, not the general business populace, so it has more technical information.