Database Design Crash Course
In this chapter we will review the basic principles of database design and normalization. A well-designed database minimizes redundancy without losing any data. That is, we aim to use the least amount of storage space for our database while still maintaining all links between data.
We will cover the following:
Database concepts and terminology
Database design principles
Normalization and the normal forms
Database design exercises
Database Concepts and Terminology
To understand the principles we will look at in this chapter, we need to establish some basic concepts and terminology.
Entities and Relationships
The very basics of what we are trying to model are entities and relationships. Entities are the things in the real world that we will store information about in the database. For example, we might choose to store information about employees and the departments they work for. In this case, an employee would be one entity and a department would be another. Relationships are the links between these entities. For example, an employee works for a department. Works-for is the relationship between the employee and department entities.
Relationships come in different degrees. They can be one-to-one, one-to-many (or many-to-one, depending on the direction you are looking at it from), or many-to-many. A one-to-one relationship pairs each entity on one side with exactly one entity on the other. If each employee in this organization had his or her own cubicle, the is-located-in relationship between employees and cubicles would be one-to-one. The works-for relationship is usually a many-to-one relationship in this example. That is, many employees work for a single department, but each employee works for only one department. These two relationships are shown in Figure 3.1.
Figure 3.1 The is-located-in relationship is one-to-one. The works-for relationship is many-to-one.
Note that the entities, the relationships, and the degree of the relationships depend on your environment and the business rules you are trying to model. For example, in some companies, employees may work for more than one department. In that case, the works-for relationship would be many-to-many. If anybody shares a cubicle or anybody has an office instead, the is-located-in relationship is not one-to-one.
When you are coming up with a database design, you must take these rules into account for the system you are modeling. No two systems will be exactly the same.
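To make the idea of degree concrete, here is a minimal Python sketch that classifies a relationship from a list of observed pairs. The function name relationship_degree and the sample names, departments, and cubicles are hypothetical and chosen only for illustration. Keep in mind that the real degree comes from your business rules; sample data can only show what has happened so far.

```python
from collections import defaultdict


def relationship_degree(pairs):
    """Classify a relationship from observed (left, right) pairs.

    Returns "one-to-one", "many-to-one", "one-to-many", or "many-to-many".
    """
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # "many" on the left means some right-hand entity is linked to
    # several left-hand entities (several employees in one department).
    left_side = "many" if any(len(s) > 1 for s in lefts_per_right.values()) else "one"
    # "many" on the right means some left-hand entity is linked to
    # several right-hand entities (one employee in several departments).
    right_side = "many" if any(len(s) > 1 for s in rights_per_left.values()) else "one"
    return f"{left_side}-to-{right_side}"


# Hypothetical sample data.
works_for = [("Nora", "Development"), ("Ajay", "Development"), ("Ben", "Operations")]
located_in = [("Nora", "Cubicle 1"), ("Ajay", "Cubicle 2"), ("Ben", "Cubicle 3")]

print(relationship_degree(works_for))
print(relationship_degree(located_in))
```

If an employee were later assigned to a second department, the same function would report the works-for relationship as many-to-many, which mirrors the change in business rules described above.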
Relations or Tables
MySQL is a relational database management system (RDBMS); that is, it supports databases that consist of a set of relations. A relation in this sense is not your auntie, but a table of data. Note that the terms table and relation mean the same thing. In this book, we will use the more common term table. If you have ever used a spreadsheet, each sheet is typically a table of data. A sample table is shown in Figure 3.2.
Figure 3.2 The employee table stores employee IDs, names, jobs, and the department each employee works for.
As you can see, this particular table holds data about employees at a particular company. (We have not shown data for all the employees, just some examples.)
Columns or Attributes
In database tables, each column or attribute describes some piece of data that each record in the table has. The terms column and attribute are used fairly interchangeably, but a column is really part of a table, whereas an attribute relates to the real-world entity that the table is modeling. In Figure 3.2 you can see that each employee has an employeeID, a name, a job, and a departmentID. These are the columns of the employee table, sometimes also called the attributes of the employee table.
Rows, Records, Tuples
Look again at the employee table. Each row in the table represents a single employee record. You may hear these called rows, records, or tuples. Each row in the table consists of a value for each column in the table.
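The following sketch shows columns and rows in practice. It is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the row values are hypothetical rather than the contents of Figure 3.2.

```python
import sqlite3

# An in-memory SQLite database stands in for a MySQL server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employeeID INTEGER, name TEXT, job TEXT, departmentID INTEGER)"
)

# Hypothetical sample rows: one value per column, in column order.
sample_rows = [
    (1001, "Ana Ruiz", "Programmer", 10),
    (1002, "Ben Okafor", "DBA", 20),
    (1003, "Chen Wei", "Programmer", 10),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", sample_rows)

cursor = conn.execute("SELECT * FROM employee ORDER BY employeeID")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)  # the column (attribute) names
for row in rows:
    print(row)  # each row comes back as a tuple with one value per column
```

Each fetched row is a Python tuple, which is a convenient reminder of where the term tuple comes from.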
Keys
A superkey is a column (or set of columns) that can be used to uniquely identify a row in a table. A key is a minimal superkey. For example, look at the employee table. We could use the employeeID and the name together to identify any row in the table. We could also use the set of all the columns (employeeID, name, job, departmentID). These are both superkeys.
However, we don't need all those columns to identify a row. We need only (for example) the employeeID. This is a minimal superkey; that is, a minimized set of columns that can be used to identify a single row. So, employeeID is a key.
Look at the employee table again. Provided that no two employees in the table share a name, we could identify an employee by name or by employeeID. These are both keys. We call these candidate keys because they are candidates from which we will choose the primary key. The primary key is the column or set of columns that we will use to identify a single row from within a table. In this case we will make employeeID the primary key. This will make a better key than name because, in practice, it is common to have two people with the same name.
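The definitions of superkey and key can be checked mechanically against sample data. The sketch below is a rough illustration in Python; the helper names is_superkey and candidate_keys and the sample rows are hypothetical. Note the limitation: data can show that a set of columns is not a key, but only the business rules can guarantee that it always will be one.

```python
from itertools import combinations


def is_superkey(rows, columns):
    """Return True if no two rows share the same values for the given columns."""
    seen = set()
    for row in rows:
        value = tuple(row[column] for column in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, all_columns):
    """Return the minimal superkeys for this sample data, smallest first."""
    keys = []
    for size in range(1, len(all_columns) + 1):
        for combo in combinations(all_columns, size):
            # Skip any combination that contains an already found key:
            # it would be a superkey, but not a minimal one.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


# Hypothetical sample rows for the employee table.
employees = [
    {"employeeID": 1001, "name": "Ana Ruiz", "job": "Programmer", "departmentID": 10},
    {"employeeID": 1002, "name": "Ben Okafor", "job": "DBA", "departmentID": 20},
    {"employeeID": 1003, "name": "Chen Wei", "job": "Programmer", "departmentID": 10},
    {"employeeID": 1004, "name": "Dana Levi", "job": "Systems Administrator", "departmentID": 10},
]
all_columns = ["employeeID", "name", "job", "departmentID"]

print(is_superkey(employees, ["employeeID", "name"]))
print(candidate_keys(employees, all_columns))
```

With this sample data, name qualifies as a candidate key only because the four names happen to be distinct; adding a second employee with an existing name would remove it from the result, which is exactly why employeeID is the safer primary key.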
Foreign keys represent the links between tables. For example, if you look back at Figure 3.2, you can see that the departmentID column holds a department number. This is a foreign key: The full set of information about each department will be held in a separate table, with the departmentID as the primary key in that table.
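As a rough illustration of a foreign key at work, the sketch below again uses Python's sqlite3 module as a stand-in for MySQL. The department table's name column and all row values are assumptions made for the example. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection; MySQL's behavior depends on the storage engine and is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE department (departmentID INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE employee ("
    "employeeID INTEGER PRIMARY KEY, "
    "name TEXT, "
    "job TEXT, "
    "departmentID INTEGER REFERENCES department(departmentID))"
)

# Hypothetical sample data.
conn.execute("INSERT INTO department VALUES (10, 'Development')")
conn.execute("INSERT INTO employee VALUES (1001, 'Ana Ruiz', 'Programmer', 10)")

# Department 99 does not exist, so this row breaks the link between the tables.
try:
    conn.execute("INSERT INTO employee VALUES (1002, 'Ben Okafor', 'DBA', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Following the foreign key retrieves the department details for each employee.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee AS e "
    "JOIN department AS d ON e.departmentID = d.departmentID"
).fetchall()

print(rejected)
print(joined)
```

The rejected insert shows the practical value of declaring the link: the database refuses an employee row that points at a department that is not there.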
Functional Dependencies
The term functional dependency comes up less often than the ones previously mentioned, but we will need to understand it to understand the normalization process that we will discuss in a minute.
If there is a functional dependency between column A and column B in a given table, which may be written A → B, then the value of column A determines the value of column B. For example, in the employee table, the employeeID functionally determines the name (and all the other attributes in this particular example).
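A functional dependency A → B can be tested against sample data by checking that every value of A is always paired with the same value of B. The following Python sketch does this; the function name holds and the sample rows are hypothetical. As with keys, data can disprove a dependency but cannot prove that it holds for every future row.

```python
def holds(rows, determinant, dependent):
    """Return True if the determinant columns fix the dependent columns in rows."""
    seen = {}
    for row in rows:
        a = tuple(row[column] for column in determinant)
        b = tuple(row[column] for column in dependent)
        # Remember the first B value seen for each A value and compare
        # every later occurrence of the same A value against it.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical sample rows for the employee table.
employees = [
    {"employeeID": 1001, "name": "Ana Ruiz", "job": "Programmer", "departmentID": 10},
    {"employeeID": 1002, "name": "Ben Okafor", "job": "DBA", "departmentID": 20},
    {"employeeID": 1003, "name": "Chen Wei", "job": "Programmer", "departmentID": 10},
    {"employeeID": 1004, "name": "Dana Levi", "job": "Systems Administrator", "departmentID": 10},
]

# employeeID -> name, job, departmentID
print(holds(employees, ["employeeID"], ["name", "job", "departmentID"]))
# departmentID -> job fails: department 10 contains two different jobs.
print(holds(employees, ["departmentID"], ["job"]))
```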
Schemas
The term schema or database schema simply means the structure or design of the database; that is, the form of the database without any data in it. If you like, the schema is a blueprint for the data in the database.
We can describe the schema for a single table in the following way:
employee(employeeID, name, job, departmentID)
In this book, we will follow the convention of using a solid underline for the attributes that represent the primary key and a broken underline for any attributes that represent foreign keys. Primary keys that are also foreign keys will have both a solid and a broken underline.
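To see that a schema exists independently of any data, the sketch below creates the employee table without inserting rows and then reads its structure back. It again uses Python's sqlite3 module as a stand-in for MySQL, and the column types and the department table are assumptions made for the example. The PRAGMA table_info and PRAGMA foreign_key_list statements are SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (departmentID INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE employee ("
    "employeeID INTEGER PRIMARY KEY, "
    "name TEXT, "
    "job TEXT, "
    "departmentID INTEGER REFERENCES department(departmentID))"
)

# table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(employee)").fetchall()
column_names = [column[1] for column in columns]
primary_key = [column[1] for column in columns if column[5]]

# foreign_key_list rows are (id, seq, table, from, to, on_update, on_delete, match).
foreign_keys = [
    row[3] for row in conn.execute("PRAGMA foreign_key_list(employee)")
]

schema_line = "employee(" + ", ".join(column_names) + ")"
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print(schema_line)
print("primary key:", primary_key)
print("foreign keys:", foreign_keys)
print("rows stored:", row_count)
```

The printed schema line matches the notation used above, while the primary key and foreign key lists carry the information that the solid and broken underlines convey in print.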