This chapter is from the book

Understanding Normalization

Normalization is simply a set of rules that will ultimately make your life easier when you're wearing your database administrator hat. It's the art of organizing your database in such a way that your tables are related where appropriate and flexible for future growth.

The sets of rules used in normalization are called normal forms. If your database design follows the first set of rules, it's considered in the first normal form. If the first three sets of rules of normalization are followed, your database is said to be in the third normal form.

Throughout this hour, you'll learn about each rule in the first, second, and third normal forms and hopefully will follow them as you create your own applications. You'll be using an example set of tables for a students and courses database and taking it to the third normal form.

Problems with the Flat Table

Before launching into the first normal form, you have to start with something that needs to be fixed. In the case of a database, it's the flat table. A flat table is like a spreadsheet—many, many columns. There are no relationships between multiple tables; all the data you could possibly want is right there in that flat table. This scenario is inefficient and consumes more physical space on your hard drive than a normalized database.

In your students and courses database, assume you have the following fields in your flat table:

  • StudentName—The name of the student.

  • CourseID1—The ID of the first course taken by the student.

  • CourseDescription1—The description of the first course taken by the student.

  • CourseInstructor1—The instructor of the first course taken by the student.

  • CourseID2—The ID of the second course taken by the student.

  • CourseDescription2—The description of the second course taken by the student.

  • CourseInstructor2—The instructor of the second course taken by the student.

  • Repeat CourseID, CourseDescription, and CourseInstructor columns many more times to account for all the classes a student can take during their academic career.
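The repeated groups are easier to see in an actual table definition. The following is a minimal sketch, not taken from the book. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server, keeps only two course groups, and uses a hypothetical table name (flat_students) and invented sample rows. It counts the empty course columns left by a student who takes a single course and how many times one course's details are stored.

```python
import sqlite3

# Hypothetical flat table following the field list above, trimmed to two
# course groups; a real flat design would repeat them many more times.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flat_students (
        StudentName        TEXT,
        CourseID1          TEXT,
        CourseDescription1 TEXT,
        CourseInstructor1  TEXT,
        CourseID2          TEXT,
        CourseDescription2 TEXT,
        CourseInstructor2  TEXT
    )
""")

# Invented sample data for illustration.
conn.executemany(
    "INSERT INTO flat_students VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Alice", "CS101", "Intro to Programming", "Smith",
                  "MA201", "Linear Algebra", "Jones"),
        # Bob takes only one course, so his second group stays empty.
        ("Bob",   "CS101", "Intro to Programming", "Smith",
                  None, None, None),
    ],
)

def empty_course_slots(conn):
    """Count course columns left NULL across all rows."""
    cols = [f"Course{part}{n}" for n in (1, 2)
            for part in ("ID", "Description", "Instructor")]
    total = 0
    for col in cols:
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM flat_students WHERE {col} IS NULL"
        ).fetchone()
        total += count
    return total

def copies_of_course(conn, course_id):
    """Count how many times one course's details are stored."""
    (count,) = conn.execute(
        "SELECT (SELECT COUNT(*) FROM flat_students WHERE CourseID1 = ?)"
        " + (SELECT COUNT(*) FROM flat_students WHERE CourseID2 = ?)",
        (course_id, course_id),
    ).fetchone()
    return count

print(empty_course_slots(conn))         # 3
print(copies_of_course(conn, "CS101"))  # 2
```

Giving Alice a third course would mean altering the table to add another CourseID/CourseDescription/CourseInstructor group, which is the kind of rigidity normalization removes.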

With what you've learned so far, you should be able to identify the first problem area: CourseID, CourseDescription, and CourseInstructor columns are repeated groups.

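Eliminating redundancy is the first step in normalization, so next you'll take this flat table to first normal form. If your table remained in its flat format, you would have a lot of unclaimed space (empty course columns for students who take only a few courses) and a lot of space used unnecessarily (the same course description and instructor stored again for every student who takes that course). That is not an efficient table design!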

First Normal Form

The rules for the first normal form include:

  • Eliminate repeating information.

  • Create separate tables for related data.

If you think about the flat table design, with its many repeated sets of fields, you can identify two distinct topics in the students and courses database: students and courses. Taking the database to first normal form means creating two tables: one for students and one for the courses each student takes (the students_courses table), as shown in Figure 3.9.

Figure 3.9 Breaking the flat table into two tables.

Your two tables now represent a one-to-many relationship of one student to many courses. Students can take as many courses as they wish and are not limited to the number of CourseID/CourseDescription/CourseInstructor groupings that existed in the flat table.
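As a rough sketch of this first-normal-form design, again using sqlite3 as a stand-in for MySQL, the students table below gets a StudentID key and each course a student takes becomes its own row in students_courses. The column types, the composite primary key, and the sample data are assumptions for illustration, not the book's exact schema.

```python
import sqlite3

# First normal form, sketched with sqlite3 standing in for MySQL.
# Column types and sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT NOT NULL
    );
    -- One row per course a student takes. The REFERENCES clause is declared
    -- but SQLite only enforces it when PRAGMA foreign_keys = ON.
    CREATE TABLE students_courses (
        StudentID        INTEGER NOT NULL REFERENCES students(StudentID),
        CourseID         TEXT NOT NULL,
        CourseDesc       TEXT,
        CourseInstructor TEXT,
        PRIMARY KEY (StudentID, CourseID)
    );
""")

conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
# No fixed limit: Alice can take three courses without altering the schema.
conn.executemany(
    "INSERT INTO students_courses VALUES (?, ?, ?, ?)",
    [
        (1, "CS101", "Intro to Programming", "Smith"),
        (1, "MA201", "Linear Algebra", "Jones"),
        (1, "PH110", "Physics I", "Lee"),
        (2, "CS101", "Intro to Programming", "Smith"),
    ],
)

def courses_per_student(conn):
    """Return {StudentName: number of courses} via the one-to-many link."""
    rows = conn.execute("""
        SELECT s.StudentName, COUNT(sc.CourseID)
        FROM students AS s
        LEFT JOIN students_courses AS sc ON sc.StudentID = s.StudentID
        GROUP BY s.StudentID, s.StudentName
        ORDER BY s.StudentID
    """).fetchall()
    return dict(rows)

print(courses_per_student(conn))  # {'Alice': 3, 'Bob': 1}
```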

The next step is to put the tables into second normal form.

Second Normal Form

The rule for the second normal form is:

  • No non-key attributes depend on a portion of the primary key.

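In plain English, this means that when a table's primary key is made up of more than one field, every other field must depend on the whole key, not just part of it; if it doesn't, you have more work to do. In the students and courses example, the students_courses table is identified by StudentID and CourseID together, yet a course's description and instructor depend only on CourseID. The fix is to break out the courses into their own table and modify the students_courses table.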
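To see why a dependency on only part of the key causes trouble, here is a sketch of the first-normal-form students_courses table, assuming its primary key is StudentID and CourseID together. As before, sqlite3 stands in for MySQL and the rows are invented. Renaming a course for one enrollment leaves the other copies stale, and a GROUP BY query exposes the conflict.

```python
import sqlite3

# Hypothetical 1NF students_courses table: the key is (StudentID, CourseID),
# but CourseDesc and CourseInstructor depend only on CourseID.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students_courses (
        StudentID        INTEGER NOT NULL,
        CourseID         TEXT NOT NULL,
        CourseDesc       TEXT,
        CourseInstructor TEXT,
        PRIMARY KEY (StudentID, CourseID)
    )
""")
conn.executemany(
    "INSERT INTO students_courses VALUES (?, ?, ?, ?)",
    [
        (1, "CS101", "Intro to Programming", "Smith"),
        (2, "CS101", "Intro to Programming", "Smith"),
        (3, "CS101", "Intro to Programming", "Smith"),
    ],
)

# Renaming the course in only one student's row leaves two stale copies.
conn.execute(
    "UPDATE students_courses SET CourseDesc = ? "
    "WHERE StudentID = ? AND CourseID = ?",
    ("Programming Fundamentals", 1, "CS101"),
)

def conflicting_courses(conn):
    """List CourseIDs whose stored descriptions no longer agree."""
    rows = conn.execute("""
        SELECT CourseID
        FROM students_courses
        GROUP BY CourseID
        HAVING COUNT(DISTINCT CourseDesc) > 1
    """).fetchall()
    return [course_id for (course_id,) in rows]

print(conflicting_courses(conn))  # ['CS101']
```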

CourseID, CourseDesc, and CourseInstructor can become a table called courses with a primary key of CourseID. The students_courses table should then just contain two fields: StudentID and CourseID. You can see this new design in Figure 3.10.

Figure 3.10 Taking your tables to second normal form.

This structure should look familiar to you as a many-to-many relationship using an intermediary mapping table. The third normal form is the last form we'll look at, and you'll find it's just as simple to understand as the first two.
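Here is a sketch of the second-normal-form design under the same assumptions (sqlite3 in place of MySQL, invented column types and data). The courses table holds one row per course, and students_courses is reduced to the StudentID/CourseID mapping. Because each description is now stored once, a single UPDATE changes it for every enrolled student.

```python
import sqlite3

# Second normal form, sketched with sqlite3 standing in for MySQL.
# Column types and sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT NOT NULL
    );
    CREATE TABLE courses (
        CourseID         TEXT PRIMARY KEY,
        CourseDesc       TEXT,
        CourseInstructor TEXT
    );
    -- Mapping table: each row links one student to one course.
    CREATE TABLE students_courses (
        StudentID INTEGER NOT NULL REFERENCES students(StudentID),
        CourseID  TEXT    NOT NULL REFERENCES courses(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );
""")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", [
    ("CS101", "Intro to Programming", "Smith"),
    ("MA201", "Linear Algebra", "Jones"),
])
conn.executemany("INSERT INTO students_courses VALUES (?, ?)",
                 [(1, "CS101"), (1, "MA201"), (2, "CS101")])

# The description lives in one row, so one UPDATE fixes it everywhere.
conn.execute("UPDATE courses SET CourseDesc = ? WHERE CourseID = ?",
             ("Programming Fundamentals", "CS101"))

def roster(conn, course_id):
    """Names of students enrolled in a course, found via the mapping table."""
    rows = conn.execute("""
        SELECT s.StudentName
        FROM students_courses AS sc
        JOIN students AS s ON s.StudentID = sc.StudentID
        WHERE sc.CourseID = ?
        ORDER BY s.StudentName
    """, (course_id,)).fetchall()
    return [name for (name,) in rows]

def descriptions_seen(conn, course_id):
    """Distinct descriptions that enrolled students see for a course."""
    rows = conn.execute("""
        SELECT DISTINCT c.CourseDesc
        FROM students_courses AS sc
        JOIN courses AS c ON c.CourseID = sc.CourseID
        WHERE sc.CourseID = ?
    """, (course_id,)).fetchall()
    return [desc for (desc,) in rows]

print(roster(conn, "CS101"))             # ['Alice', 'Bob']
print(roster(conn, "MA201"))             # ['Alice']
print(descriptions_seen(conn, "CS101"))  # ['Programming Fundamentals']
```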

Third Normal Form

The rule for the third normal form is:

  • No attributes depend on other non-key attributes.

This rule simply means that you need to look at your tables and see whether any remaining fields depend on another non-key field rather than on the key itself, and can therefore be broken out further. Think about removing repeated data and you'll find your answer: instructors. Inevitably, an instructor will teach more than one class, so the same instructor information is stored again for every course that instructor teaches. However, CourseInstructor is not a key of any sort. If you break out this information into a separate table purely for the sake of efficiency and maintenance (as shown in Figure 3.11), your design is in third normal form.
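A sketch of this third-normal-form step follows. The instructors table and its InstructorID and InstructorName columns are hypothetical names chosen for this example (Figure 3.11 may label them differently), and sqlite3 again stands in for MySQL. The students and students_courses tables are unchanged from second normal form, so they are omitted here.

```python
import sqlite3

# Third normal form for courses/instructors, sketched with sqlite3.
# instructors, InstructorID, and InstructorName are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructors (
        InstructorID   INTEGER PRIMARY KEY,
        InstructorName TEXT NOT NULL
    );
    -- Courses now point at an instructor instead of storing the name.
    CREATE TABLE courses (
        CourseID     TEXT PRIMARY KEY,
        CourseDesc   TEXT,
        InstructorID INTEGER REFERENCES instructors(InstructorID)
    );
""")
conn.executemany("INSERT INTO instructors VALUES (?, ?)",
                 [(1, "Smith"), (2, "Jones")])
conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", [
    ("CS101", "Intro to Programming", 1),
    ("CS201", "Data Structures", 1),
    ("MA201", "Linear Algebra", 2),
])

# Correcting an instructor's name touches exactly one row,
# even though that instructor teaches two courses.
cur = conn.execute(
    "UPDATE instructors SET InstructorName = ? WHERE InstructorID = ?",
    ("Smythe", 1),
)
rows_changed = cur.rowcount

def course_instructors(conn):
    """Map each CourseID to its instructor's current name."""
    rows = conn.execute("""
        SELECT c.CourseID, i.InstructorName
        FROM courses AS c
        JOIN instructors AS i ON i.InstructorID = c.InstructorID
        ORDER BY c.CourseID
    """).fetchall()
    return dict(rows)

print(rows_changed)              # 1
print(course_instructors(conn))  # {'CS101': 'Smythe', 'CS201': 'Smythe', 'MA201': 'Jones'}
```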

Figure 3.11 Taking your tables to third normal form.

Third normal form is usually adequate for removing redundancy and allowing for flexibility and growth. The next section will give you some pointers for the thought process of database design and where it fits in the overall design process of your application.
