SQL Server Reference Guide

Database Design: Changing Attributes to Columns

Last updated Mar 28, 2003.

In the last tutorial we completed our logical Entity Relationship Diagram (ERD) and began creating a physical model from that logical diagram. We converted the entities to tables, discussed normalization, and defined what a relationship is.

We also started the process of changing attributes to columns. The first thing we did was to take the names of the attributes and ensure that the concepts they represented were "atomic" enough, meaning that if an attribute expresses more than a single fact, we must split it into more columns. For instance, if we think of a "Person" object and include where they live, then we are expressing two thoughts for one object. We are tying where the person lives to the person themselves, which is not a good idea. For one thing, the "Street Name" part of the address doesn't belong to the person; it belongs to the address. For another, several people might live at the same address, so the address doesn't belong to the person at all. It stands alone and needs its own entity to describe it.

In our case, the attributes we've described are singular in nature, and all we needed to do was change the names a bit. While SQL Server allows spaces in object names, you have to place brackets "[]" around any such name when typing code, so we took the spaces out.

Here is the diagram we have so far of our objects:

In this tutorial, we’ll take each table and convert the individual attributes to true columns, complete with keys and other constraints, data types, and more.

When we open the management tools (in SQL Server 2000 or SQL Server 2005) to create the columns, we’ll need a few pieces of information ready for each one: whether or not the column is a key, what data type it is, whether it can contain a NULL value, and its default value.

SQL Server isn’t case-sensitive by default, but it’s always good practice to pick a naming convention and stick with it throughout the enterprise. SQL Server can honor case when it is set up with a case-sensitive collation, so be careful here!

As you can see, there are quite a few decisions to work out before you do the conversion. Let’s take our ERD and work through it a little at a time.

Starting with the Staff_Members table, let’s examine the attributes that it contains and convert them to columns.

We’ll need a primary key for this table, since it will be referenced by other tables. We’ll call this column Staff_Code.

I’m a big proponent of using a surrogate key. A surrogate key is a code or number which is independent of the meaning of any of the other attributes in the table. In contrast, a natural key is a column that is both unique and consistently available throughout the data domain, and is guaranteed to have a value at all times. One example of a natural key might be the Social Security number in the United States. It is supposed to be unique to one person, and in fact it is — when the person has one. First and last names certainly aren't always unique, and some people have more than one last name. Other numbering schemes and identifiers have similar problems. Since the natural key is hard to find, we’ll make up a standard way of numbering the record that identifies something or someone and stick with it.

If we’re going to use a code with alphanumeric values, we’ll have to use the varchar, char, or nvarchar data types. (If you need a refresher on data types, check my tutorial here.)

The other issue with using an alphanumeric value for the primary key is that we’ll need to create a value-generator in our code so that when we do inserts we create proper key values. Unless that code is well thought-out, we can quickly get into trouble here. Another argument against alphanumeric keys is that they don’t sort as well as numeric types.
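To make the sorting concern concrete, here is a small sketch in Python, used only as a neutral illustration outside SQL Server. The code values are made up. Text compares character by character, so codes stored as characters come back in an order most people find surprising unless every code is padded to the same width.

```python
# Hypothetical staff codes, stored once as text and once as numbers.
text_codes = ["9", "10", "2", "100"]
numeric_codes = [9, 10, 2, 100]

# Text compares character by character, so "10" and "100" sort before "2".
sorted_as_text = sorted(text_codes)

# Numbers compare by magnitude, which is usually what people expect.
sorted_as_numbers = sorted(numeric_codes)

print(sorted_as_text)
print(sorted_as_numbers)
```

The exact ordering of character data in SQL Server depends on the collation in use, but digit strings of different lengths run into the same effect.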

A final issue with alphanumeric codes is that it becomes tempting to create a "database within a field" with these values. To imagine what I’m describing, think of a Work Breakdown Structure (WBS) code, used by many project management systems. In project databases, certain codes are used where the position and character has meaning. For instance, in the number 12345, the position of number 1 designates a certain company, the 2 means the department, the 3 is the project, the 4 is the employee’s code, and the 5 is the work type being performed. What we’ve got here is meaning within the characters of a field — a mini-database.

While this WBS code is usually unique, it doesn’t always work that way. Things change when the company is purchased, the client merges with another, the project stops and then re-starts, and more. I’m not saying you should never do this, but give it some thought before you do. We certainly don’t want our primary key to have this characteristic, because it isn’t guaranteed to stay constant. The only meaning that stays constant is something with no meaning to change to begin with, so the surrogate key is usually the way to go.
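As a rough illustration of the "database within a field" idea, the Python sketch below pulls apart the 12345 example. The layout names and the `split_wbs` helper are hypothetical, invented for this example only.

```python
# Hypothetical layout for the five-character code from the example above:
# each position carries its own meaning.
WBS_LAYOUT = ("company", "department", "project", "employee", "work_type")


def split_wbs(code):
    """Break a positional code into the separate facts hidden inside it."""
    if len(code) != len(WBS_LAYOUT):
        raise ValueError("expected %d characters, got %r" % (len(WBS_LAYOUT), code))
    return dict(zip(WBS_LAYOUT, code))


parts = split_wbs("12345")
print(parts)
```

Five separate facts are packed into one value. If the department in this example changed from 2 to 7, the code itself would have to become 17345, and every row that referenced the old value would have to change with it. That is exactly the instability we want to keep out of a primary key.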

So, in our case we’ll just use a numeric data type. But which one? Recall from the Database Object tutorials that there are several number data types, each with its own use. My advice is to pick the smallest number type you can, but always err on the side of caution. That means picking a type that is small, yet large enough to hold every value we think we might ever need in this case.

The two candidates we might use are int and smallint. The int type covers numbers up to a little over two billion (2,147,483,647), and smallint values go up to 32,767. Since we don’t anticipate the number of staff members ever reaching into even the tens of thousands (even including using new numbers for new employees), smallint is the winner. The good part about this choice is that if we need the higher values later, we can always alter the column to have an int data type, and the values currently in that column of the table will convert properly.
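The limits of the two types follow directly from their storage sizes: smallint is a 2-byte signed integer and int is a 4-byte signed integer. The short Python sketch below works out the arithmetic; the `signed_range` helper is just an illustration, not anything SQL Server provides.

```python
# Both types are signed integers: smallint uses 2 bytes, int uses 4 bytes.
def signed_range(num_bytes):
    """Return the (lowest, highest) value a signed integer of this size can hold."""
    bits = num_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


smallint_range = signed_range(2)
int_range = signed_range(4)

print(smallint_range)
print(int_range)
```

A key that starts at 1 and counts upward uses only the positive half of the range, so a smallint key gives us 32,767 usable values.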

Next we need to decide how the key will be generated. If the key uses an identity type qualifier, SQL Server will automatically choose the next number for us. While that is easy for the developer, there are times when it’s not a good choice (we’ll cover them in later tutorials). If we don’t use an identity value, we’ll have to generate the key in code, which causes the developer more work, and has the potential for error.

For this example, we’ll use the identity qualifier, since it will save us some coding. Even if we change our mind later, we can always turn off the identity value and create a numbering mechanism ourselves.

We also have to choose a "seed" for the identity, which is the value SQL Server assigns to the first row; each later value is generated by adding the increment. We’ll leave both at 1, meaning that the numbering will start with 1 and move upward by one.

Normally we’d need to decide whether we would allow NULL values in the field, but for a primary key, we don’t have to make that decision. Primary keys must always have a value.

That may seem like a lot of work for just one field. We took our time with this one, but the rest of the primary keys in all our tables will work the same way.

Moving on to the rest of the field types, we have more choices. Since we’re describing the employees, we need to record their names. We’ll use a field called Name to hold those values. We might have broken out first and last names, but since the business requirements don’t really need the first and last names broken out, this choice is acceptable — but will the system ever change? It's something we need to consider. We'll leave it this way for this example.

The final issue we'll consider with using a single field for both parts of the name is indexing. If we provide a search function, the users will certainly want to search by last name. Depending on how many employees we have, we may in fact want to create separate first-name and last-name fields, and then concatenate them in the program so that they look as if they are one field. After much discussion, we decide to keep the single field. (This choice will haunt us later.)

Any of the character types will work for this field, but we’ll choose nvarchar. We’ll use this type because nvarchar uses two bytes for each character; that allows us to store characters from other languages in the names, in case we have employees from other countries.

We’ll make the length large enough to cover just about any name we can think of. Fifty (50) characters should do.
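To get a feel for what the Unicode choice costs in storage, the Python sketch below encodes a couple of made-up names as UTF-16, which stands in here for SQL Server's two-byte Unicode storage. It assumes ordinary characters from the Basic Multilingual Plane, and the `two_byte_size` helper is hypothetical.

```python
def two_byte_size(text):
    """Bytes needed when every character is stored in two bytes (UTF-16)."""
    return len(text.encode("utf-16-le"))


# Made-up sample names; "\u00fc" is the letter u with an umlaut.
names = ["Smith", "M\u00fcller"]

for name in names:
    print(name, len(name), two_byte_size(name))

# A 50-character Unicode column therefore needs up to about 100 bytes of data.
max_bytes_for_50_chars = 50 * 2
```

The variable-length type stores only the characters actually entered, so a short name does not pay for all fifty positions.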

Next we need to decide the "nullability" of this field. Should we be able to create an employee record with no name? We decide that we can’t think of a reason that this would occur, so we make this field required. No NULLs here.

An important decision when we force a field not to accept NULL values is whether to provide a default value for it. When insert operations take place from the program and the field has no default value, the developer has to work harder to make sure the user enters a value.

If we are not sure about which value to enter as a default, we can use "UNKNOWN" or "NOT ENTERED." If we’re going to enter such an ambiguous value, why not just allow NULL values? The reason lies with the concept of NULL, which I’ve covered in other tutorials. Suffice it to say here that we should never compare a value against the value NULL. We can, however, compare things against the text "UNKNOWN" all day long.
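The difference is easy to see in a quick experiment. The sketch below uses SQLite, which ships with Python, purely as a stand-in; the table and its rows are made up. SQL Server treats these comparisons the same way under its default ANSI_NULLS setting, but check that on your own server rather than taking this as a definitive test.

```python
import sqlite3

# SQLite stands in for SQL Server here; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT)")
conn.executemany(
    "INSERT INTO staff (name) VALUES (?)",
    [("Alice",), (None,), ("UNKNOWN",)],
)


def count_where(condition):
    """Count the rows in the staff table that satisfy the given WHERE condition."""
    return conn.execute("SELECT COUNT(*) FROM staff WHERE " + condition).fetchone()[0]


equals_null = count_where("name = NULL")          # comparison yields NULL, never true
is_null = count_where("name IS NULL")             # the dedicated NULL test
equals_unknown = count_where("name = 'UNKNOWN'")  # ordinary text comparison

print(equals_null, is_null, equals_unknown)
```

The equality test against NULL matches no rows at all, even though one row really does hold a NULL, while the comparison against the text 'UNKNOWN' behaves like any other string comparison.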

After having said all that, we won’t provide a default for this field. We want the user to enter a name here, so the developer will have to provide a mechanism to ensure that they do.

In the diagram, we have a field called Years_on_staff. An important axiom to remember is that in an Online Transaction Processing (OLTP) database, we normally don’t want to store a computed value. Instead we should store the base data, and let the reporting systems decide how to show the data. We’ll change this field name to Employment_date, make it a datetime type (of course) and make it a required field. The length is automatically determined by the datetime type. We’ll default it to today, using the GETDATE() function.
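Storing the date loses nothing, because the years on staff can always be derived when a report needs them. Here is a minimal Python sketch of that derivation; the `years_on_staff` helper and the sample dates are made up for illustration.

```python
from datetime import date


def years_on_staff(employment_date, as_of):
    """Whole years between the employment date and the reporting date."""
    years = as_of.year - employment_date.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (as_of.month, as_of.day) < (employment_date.month, employment_date.day):
        years -= 1
    return years


hired = date(1998, 6, 15)  # made-up employment date
print(years_on_staff(hired, date(2003, 3, 28)))
print(years_on_staff(hired, date(2003, 6, 15)))
```

A stored "years" value would have been wrong the day after each anniversary unless something updated it; the stored date never goes stale.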

Moving on to the Skills table, we’ll repeat the process for the fields we find here. Remember that the Skills table is used to store the various skills that an employee has.

We need a primary key for this table, so we’ll use the Skill_Code column with the same setup as we did earlier.

This table is a child to another, so we’ll reference the Staff_Code field in the Staff_Members table with a field of the same name here. Recall that this kind of reference is called a foreign key, and we certainly don’t want this field to be unique. It does, however, need to be the same type and length as the primary key it references.

We’ll need a Name field to store the name of the skill. We’ll use a varchar type, and 30 characters should suffice. NULLs are OK in this field. We’ll use the same logic for the Classification and Level fields. As you can see, the process goes quicker once you get started.

The Clients table is a parent table, and holds the information for all the clients to which projects belong. Again, we’ll make a Client_Code field that acts as the primary key, just as we have for the other tables.

We’ll follow a similar process for the Name and Primary_Address fields as we have for the other varchar types. The Primary_Phone field deserves a bit more attention, though.

There are a few schools of thought regarding phone number fields. If we are going to have more than one phone number for the company (normally a good practice), then we are looking at many records to identify one company, or many phone fields in a record. Whenever that kind of situation arises, we need to add another table. To keep this tutorial manageable, we’ve opted not to do that, but you should be aware of it since that situation is a bit more realistic.

Another factor with phone numbers is the data type. Phone numbers are just numbers, right? Well, not always. Some phone numbers are used with special codes or characters. This requires that the field is a character type. Because we might also work with overseas customers and have to dial special codes to reach them, we’ll set this field to a varchar type with 30 characters. We’ll also allow it to contain NULLs.
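A quick way to convince yourself is to try forcing a few phone numbers into a numeric type. The Python sketch below does that with made-up numbers; the `as_number` helper is hypothetical.

```python
# Made-up phone numbers as a user might type them.
entered = ["0113 496 0000", "+44 113 496 0000", "555-0100 x214"]


def as_number(text):
    """Try to squeeze a phone number into an integer; return None if it will not fit."""
    digits = text.replace(" ", "").replace("-", "")
    try:
        return int(digits)
    except ValueError:
        return None


for phone in entered:
    print(repr(phone), "->", as_number(phone))
```

The first number loses its leading zero, the second loses its plus sign, and the third cannot be converted at all because of the extension. A character field keeps all three exactly as they were entered.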

The last field in this table is the Start_Date, with a data type of datetime. We’ll require a value here, and set a default of the SQL Server function GETDATE(). That function will grab the current date from the system and store it in the field automatically if the user doesn’t enter one, which is a good default to have for this kind of data. It saves the developer some work.

Normally during the design we wouldn’t spend so much time on the field definitions. The process usually involves the use of a table-like structure, which we’ll use now for the Projects and Hours tables. Let’s take a look at how that would lay out:

Projects

| Field Name | Use | Type | Length/Precision | Null | Default |
|---|---|---|---|---|---|
| Project_Code | Primary key | Smallint (identity) | N/A | No | N/A |
| Client_Code | Foreign key to Clients table | Smallint | N/A | No | N/A |
| Name | Name of project | Varchar | 50 | No | "UNKNOWN" |
| Phase | Current phase of project | Varchar | 30 | No | "Initial" |
| Budget | Budgeted hours for project | Smallint | N/A | No | 0 |
| State | Active or inactive | Varchar | 30 | No | "UNKNOWN" |

Hours

| Field Name | Use | Type | Length/Precision | Null | Default |
|---|---|---|---|---|---|
| Hours_Code | Primary key | Smallint (identity) | N/A | No | N/A |
| Project_Code | Foreign key to Projects table | Smallint | N/A | No | N/A |
| Staff_Code | Foreign key to Staff_Members table | Smallint | N/A | No | N/A |
| Role | Role filled by staff member | Varchar | 100 | Yes | N/A |
| Start_Time | Start date and time | Datetime | N/A | No | GETDATE() |
| End_Time | End date and time | Datetime | N/A | Yes | N/A |
| Rate | Rate charged | Smallmoney | N/A | No | 0.00 |
| Description | Activity performed | Varchar | 255 | No | "DESCRIPTION" |

You can also show the same definitions graphically, like this:

What we’ve learned so far is the process for creating the database design manually. In the next tutorial, we’ll implement this design using the SQL Server management tools.

We’ve been using manual methods to design our database. There are many software products that will help you automate defining the entities, designing the relationships, creating an ERD, and even creating the physical database with graphical tools. These products include everything from Microsoft Visio all the way to full design tools such as ERwin and the Embarcadero suite of tools. Even if you invest in one or more of these useful tools, I still recommend you make your first few small database designs by hand using the methods we’ve discussed here.

In the next tutorial we’ll create the Data Definition Language (DDL) to make our database.

InformIT Articles and Sample Chapters

I’m not sure who the author of this site is, but the author has a good design exercise similar to ours, and this section discusses the process used to change their attributes to columns. Note — you’ll need a postscript reader to read the file; Visio can work for that if you have it.

Online Resources

David Besch has a good excerpt from his book called MCSE Training Guide: SQL Server 7 Database Design on implementing the physical design.