When we design a database, we need to take two important things into account:
What information needs to be stored? That is, what things or entities do we need to store information about?
What questions will we ask of the database? (These are called queries.)
When thinking about these questions, we must bear in mind the business rules of the business we are trying to model: what things we need to store data about, and how exactly those things are linked to one another.
Beyond these questions, we also need to structure the database so that it avoids structural problems such as redundancy and data anomalies.
Redundancy Versus Loss of Data
When designing our schema, we want to minimize redundancy of data without losing any data. By redundancy, I mean data that is repeated in different rows of a table or in different tables in the database.
Imagine that rather than having an employee table and a department table, we have a single table called employeeDepartment. We can accomplish this by adding a single departmentName column to the employee table so that the schema looks like this:
employeeDepartment(employeeID, name, job, departmentID, departmentName)
For each employee who works in Department 128, Research and Development, we will repeat the data "128, Research and Development," as shown in Figure 3.3. The same repetition occurs for every department in the company.
Figure 3.3 This schema design leads to redundantly storing the department name over and over.
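To make the repetition concrete, here is a minimal sketch using Python's standard sqlite3 module with an in-memory SQLite database. The employee names and jobs are hypothetical sample data; only Department 128, Research and Development, comes from the text. Grouping by department shows the same name stored once per employee rather than once per department.

```python
import sqlite3

def build_employee_department(conn):
    """Create the single-table employeeDepartment design and load sample rows.

    The employee rows are hypothetical sample data for illustration only.
    """
    conn.execute(
        "CREATE TABLE employeeDepartment ("
        " employeeID INTEGER PRIMARY KEY,"
        " name TEXT, job TEXT,"
        " departmentID INTEGER, departmentName TEXT)"
    )
    conn.executemany(
        "INSERT INTO employeeDepartment VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Alice", "Programmer", 128, "Research and Development"),
            (2, "Bob", "Programmer", 128, "Research and Development"),
            (3, "Carol", "DBA", 128, "Research and Development"),
        ],
    )

conn = sqlite3.connect(":memory:")
build_employee_department(conn)

# Count how many rows physically store each department's name.
copies = conn.execute(
    "SELECT departmentID, departmentName, COUNT(*) FROM employeeDepartment"
    " GROUP BY departmentID, departmentName"
).fetchall()
print(copies)  # [(128, 'Research and Development', 3)]
```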
We can change this design by splitting it into two tables, as shown here:
employee(employeeID, name, job, departmentID)
department(departmentID, name)
In this design, each department name is stored in the database only once rather than once per employee, which saves storage space and avoids the anomalies described below.
Note that we must leave departmentID in the employee table; otherwise, we lose information from the schema. In this case, we would lose the link between each employee and the department that employee works for. In improving the schema, we must always bear these twin goals in mind: reducing repetition of data without losing any information.
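A minimal sketch of the split design, again assuming Python's sqlite3 module, hypothetical employee rows, and a companion department(departmentID, name) table. It checks two things: the department name is stored in a single row, and because employee keeps departmentID, a join rebuilds every row of the combined employeeDepartment view, so no information is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    departmentID INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE employee (
    employeeID INTEGER PRIMARY KEY,
    name TEXT,
    job TEXT,
    departmentID INTEGER REFERENCES department(departmentID)
);
INSERT INTO department VALUES (128, 'Research and Development');
-- Hypothetical sample employees.
INSERT INTO employee VALUES (1, 'Alice', 'Programmer', 128);
INSERT INTO employee VALUES (2, 'Bob', 'Programmer', 128);
INSERT INTO employee VALUES (3, 'Carol', 'DBA', 128);
""")

# The department name now lives in exactly one row.
stored = conn.execute(
    "SELECT COUNT(*) FROM department WHERE departmentID = 128"
).fetchone()[0]

# Because employee keeps departmentID, a join rebuilds the combined view.
rows = conn.execute("""
SELECT e.employeeID, e.name, e.job, e.departmentID, d.name
FROM employee AS e JOIN department AS d ON e.departmentID = d.departmentID
ORDER BY e.employeeID
""").fetchall()
```

If departmentID were dropped from employee, there would be no column to join on, which is exactly the loss of information the text warns about.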
Anomalies present a slightly more complex concept. Anomalies are problems that arise in the data due to a flaw in the database design. There are three types of anomalies that may arise, and we will consider how they occur with the flawed schema shown in Figure 3.3.
Insertion anomalies occur when we try to insert data into a flawed table. Imagine that a new employee is starting at the company. When we insert the employee's details into the employeeDepartment table, we must insert both the department ID and the department name. What happens if we insert data that does not match what is already in the table, for example, by entering the employee as working for Department 42, Development, when existing rows record a different name for Department 42? It will no longer be obvious which of the rows in the database is correct. This is an insertion anomaly.
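The insertion anomaly can be reproduced in a few lines. This sketch assumes Python's sqlite3 module; the existing name for Department 42 ("Finance") and the employee rows are hypothetical, chosen only so that "Development" conflicts with what is already stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employeeDepartment (employeeID INTEGER PRIMARY KEY,"
    " name TEXT, job TEXT, departmentID INTEGER, departmentName TEXT)"
)
# Existing row: Department 42 already has a (hypothetical) name.
conn.execute(
    "INSERT INTO employeeDepartment VALUES (1, 'Alice', 'Accountant', 42, 'Finance')"
)
# New employee entered with a department name that contradicts the existing row.
# Nothing in this schema prevents it.
conn.execute(
    "INSERT INTO employeeDepartment VALUES (2, 'Dan', 'Clerk', 42, 'Development')"
)

# Department 42 now has two competing names.
names = conn.execute(
    "SELECT DISTINCT departmentName FROM employeeDepartment"
    " WHERE departmentID = 42 ORDER BY departmentName"
).fetchall()
print(names)  # [('Development',), ('Finance',)]
```

In the split design, the new employee row would carry only departmentID = 42, so there is no second copy of the name to contradict.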
Deletion anomalies occur when we delete data from a flawed schema. Imagine that all the employees of Department 128 leave on the same day (walking out in disgust, perhaps). When we delete these employee records, we no longer have any record that Department 128 exists or what its name is. This is a deletion anomaly.
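This sketch, assuming Python's sqlite3 module and hypothetical employee rows, runs the same walk-out against both designs: in the single-table design, Department 128 disappears along with its last employee, while in the split design the department row survives.

```python
import sqlite3

# Flawed design: department facts live only inside employee rows.
flawed = sqlite3.connect(":memory:")
flawed.executescript("""
CREATE TABLE employeeDepartment (employeeID INTEGER PRIMARY KEY,
    name TEXT, job TEXT, departmentID INTEGER, departmentName TEXT);
INSERT INTO employeeDepartment VALUES (1, 'Alice', 'Programmer', 128, 'Research and Development');
INSERT INTO employeeDepartment VALUES (2, 'Bob', 'Programmer', 128, 'Research and Development');
""")
flawed.execute("DELETE FROM employeeDepartment WHERE departmentID = 128")
lost = flawed.execute(
    "SELECT departmentName FROM employeeDepartment WHERE departmentID = 128"
).fetchall()

# Split design: the same walk-out deletes only employee rows.
split = sqlite3.connect(":memory:")
split.executescript("""
CREATE TABLE department (departmentID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (employeeID INTEGER PRIMARY KEY, name TEXT, job TEXT,
    departmentID INTEGER REFERENCES department(departmentID));
INSERT INTO department VALUES (128, 'Research and Development');
INSERT INTO employee VALUES (1, 'Alice', 'Programmer', 128);
INSERT INTO employee VALUES (2, 'Bob', 'Programmer', 128);
""")
split.execute("DELETE FROM employee WHERE departmentID = 128")
kept = split.execute(
    "SELECT name FROM department WHERE departmentID = 128"
).fetchall()

print(lost)  # []
print(kept)  # [('Research and Development',)]
```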
Update anomalies occur when we change data in a flawed schema. Imagine that Department 128 decides to change its name to Emerging Technologies. We must change this data for every employee who works for this department. We might easily miss one. If we do miss one (or more), this is an update anomaly.
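A sketch of the update anomaly, assuming Python's sqlite3 module and hypothetical employee rows. The rename is deliberately applied one employee at a time and skips one row, standing in for the human error the text describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employeeDepartment (employeeID INTEGER PRIMARY KEY,
    name TEXT, job TEXT, departmentID INTEGER, departmentName TEXT);
INSERT INTO employeeDepartment VALUES (1, 'Alice', 'Programmer', 128, 'Research and Development');
INSERT INTO employeeDepartment VALUES (2, 'Bob', 'Programmer', 128, 'Research and Development');
INSERT INTO employeeDepartment VALUES (3, 'Carol', 'DBA', 128, 'Research and Development');
""")

# A careless rename that updates rows employee by employee and misses employee 3.
for emp_id in (1, 2):
    conn.execute(
        "UPDATE employeeDepartment SET departmentName = ? WHERE employeeID = ?",
        ("Emerging Technologies", emp_id),
    )

# Department 128 is now recorded under two different names.
names = conn.execute(
    "SELECT DISTINCT departmentName FROM employeeDepartment"
    " WHERE departmentID = 128 ORDER BY departmentName"
).fetchall()
print(names)  # [('Emerging Technologies',), ('Research and Development',)]
```

A single UPDATE with WHERE departmentID = 128 would avoid the miss in this case, but the schema itself does nothing to enforce that. In the split design, the name lives in one department row, so there is only one row to change.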
A final rule for good database design is that we should avoid schema designs that have large numbers of empty (NULL) attributes. For example, if we want to note that one in every hundred or so of our employees has some special qualification, we would not add a column to the employee table to store this information, because for the other 99 employees in every hundred, this column would be NULL. We would instead add a new table storing only employeeIDs and qualifications for those employees who actually have those qualifications.
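A minimal sketch of that separate table, assuming Python's sqlite3 module. The table name qualification, its columns, the 100 employees, and the "First Aid" value are all hypothetical. Only the one qualified employee gets a row; a LEFT JOIN can still produce the wide, mostly NULL view when a query needs it, without storing those NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employeeID INTEGER PRIMARY KEY, name TEXT, job TEXT,
    departmentID INTEGER);
-- Hypothetical table: one row per employee who actually holds a qualification.
CREATE TABLE qualification (employeeID INTEGER REFERENCES employee(employeeID),
    qualification TEXT NOT NULL,
    PRIMARY KEY (employeeID, qualification));
""")

# 100 hypothetical employees; only employee 7 holds the special qualification.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(i, f"Employee {i}", "Programmer", 128) for i in range(1, 101)],
)
conn.execute("INSERT INTO qualification VALUES (7, 'First Aid')")

# Rows actually stored for the qualification data.
stored = conn.execute("SELECT COUNT(*) FROM qualification").fetchone()[0]

# A LEFT JOIN rebuilds the wide view on demand; the NULLs exist only in the result.
wide = conn.execute("""
SELECT e.employeeID, q.qualification
FROM employee AS e LEFT JOIN qualification AS q ON e.employeeID = q.employeeID
ORDER BY e.employeeID
""").fetchall()
nulls = sum(1 for _, q in wide if q is None)
print(stored, nulls)  # 1 99
```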