Home > Articles > Data > Access

Access Database Design and Normalization

📄 Contents

  1. The Relational Design Theory
  2. Summary
Here you can learn just enough relational theory to enable you to design Access databases that take advantage of the way relational database operations work. Understanding the concept of data normalization will help you to take advantage of its real-world benefits, giving you flexibility and saving disk space.
This chapter is from the book

This chapter is from the book

In This Chapter:

  • The Relational Design Theory 

Access applications are database applications, an obvious statement that can get lost in the details of designing a dialog box or crafting the perfect form. To develop smarter, faster, more efficient database applications, you need to understand the concept of database normalization, this chapter's topic. Relational database management systems, or RDBMSes, such as Microsoft Access, implement a theory of data management and manipulation called, strangely enough, relational theory, pioneered by Dr. E.F. Codd at IBM during the 1970s. This chapter introduces you to just enough relational theory to enable you to design Access databases that take advantage of the way relational database operations work.

NOTE

Critics of Microsoft Access argue it does not conform to all of Dr. Codd's 13 rules defining a relational database and so cannot be called relational. Predictably, Access' supporters argue it is fully conformant. From a practical standpoint, it behaves like relational database. You may judge for yourself whether it is truly conformant.

Codd's Rules

The following table lists the 13 rules Codd declared to which a database system much conform in order to be considered a true relational database management system.

Rule

Description

The Foundation Principle

Any RDBMS must be able to manage databases entirely through its relational capabilities. If a database system depends on a record-by-record data manipulation tools, it is not truly relational.

Information

All data in a relational database is represented explicitly as values in tables. Data cannot be stored in any other way.

Guaranteed Access

Every data element must be accessible logically through the use of a combination of its primary key value, table name, and column name.

Missing Information

Null values are supported explicitly. Nulls represent missing or inapplicable information.

System Catalog

The database description or "catalog" exists at the logical level as tabular values. The relational language (SQL) must be able to set the database design in the same manner in which it acts on data stored in the structure.

Comprehensive Language

An RDBMS must support a clearly defined data-manipulation language (SQL) that comprehensively supports data manipulation and definition, view definition, integrity constraints, transactional boundaries, and authorization.

View Updatability

All views that can be updated by the system. In a true RDBMS, most (although not all) views would be updatable.

Set Level Updates

An RDBMS must do more than just be able to retrieve data sets. It has to be capable of inserting, updating, and deleting data as a relational set.

Physical Data Independence

Data must be physically independent of the application program. The underlying RDBMS program or "optimizer" should be able to track physical changes in the data. For example, an RDBMS's application programs should not have to change when an index is added to or deleted from a table.

Logical Data Independence

Whenever possible, application software must be independent of changes made to the base tables. For example, no code should be rewritten when tables are combined into a view.

Integrity Independence

Data integrity must be definable in a relational language and stored in the catalog. Data integrity constraints can be built into applications. However, this approach is foreign to the relational model. In the relational model, the integrity should be inherent in the database design.

Distribution Independence

RDBMS capabilities will not be limited due to the distribution of its components in separate databases.

Nonsubversion

If an RDBMS has a single-record-at-a-time language, that language cannot be used to bypass the integrity rules or constraints of the relational language. Thus, not only must an RDBMS be governed by relational rules, but these rules must be primary laws.


The Relational Design Theory

The relational design theory developed by Dr. Codd consists of the following categories:

  • Tables and uniqueness

  • Foreign keys and domains

  • Relationships

  • Data normalization

  • Integrity rules

The Benefits of Using the Relational Model

Using the relational design theory, you gain the benefits of years of research into the best way to manage data. Some of the benefits you can achieve by following the relational model are

  • Ensuring data integrity.

  • Storing data storage efficiently.

  • Giving your database application tremendous room for growth.

  • Creating a database that behaves predictably because it conforms to these well-tested rules.

  • Enabling other database designers to understand your database because it follows the rules.

  • Ensuring that database schema changes are easy to implement

  • Improving the speed of data access. I know this ties in with efficiency; however, the mention of the word speed is important.

Tables and Uniqueness

When you create database applications, each table represents an entity or process in the real world. Tables can represent people, events, financial transactions, and physical items such as products. Relational theory requires that all data is stored in a table consisting of unique rows and columns. The way to guarantee uniqueness for each rule is to set a primary key for each row.

A primary key is a field or group of fields that uniquely identifies that row. If the primary key consists of multiple fields, it is called a composite key. The primary key must be unique—that is, it must appear in one and only one row. Access allows you to designate a field as a primary key by setting it as a Key value in table design view. Access will then check to see if the data in that field is unique and does not allow duplicates.

Sometimes it can be hard to come up with a unique value for each row while following the business rules in your application. For example, suppose you are creating a contact management system to track your contacts. Your initial database design might resemble the table shown in Figure 3.1.

Figure 3.1 A table storing contact information.

As you develop your contact management system, you might select ContactName, ContactEmail, or one of the ContactPhone fields as the key. Possible key fields are commonly referred to as candidate keys because, during the initial database design period, they might be used as primary keys, but the final decision has not been made.

A common rule of database design is to keep the key as simple as possible and to choose the key from data that is the most unique and least likely to change. A person's name might change due to marriage, divorce, and so on. Phone numbers and e-mail addresses change all the time. A good key would be a Social Security number, but what if your contact lives outside the United States? In this situation, Access allows you to create an AutoNumber field to use as a primary key. An AutoNumber is a numeric value that Access maintains and automatically increments each time you add a new record to a table.

However, an AutoNumber is not the same as a record number. In record number–based database products, record numbers are recalculated as records are added or deleted. In Microsoft Access, a record's AutoNumber remains the same as records are added or deleted (except in an unusual circumstance discussed in Chapter 5, "The Microsoft Jet Database Engine 4.0"). AutoNumber values are not reused in the same table after a record is deleted. There are two types of AutoNumber values:

  • Integers

  • ReplicationID or GUID (used in Replication).

NOTE

You can manipulate a Jet AutoNumber in a few ways. You can set the AutoNumber increment value, reset the starting seed, and start counting backward. See Chapter 5 for more information.

Foreign Keys and Domains

When one table's primary key is used in another table, it is called a foreign key because it is foreign to the second table. Foreign keys are used to relate records from two (or more) tables, a topic discussed in the section titled Relationships later in the chapter. Figure 3.2 illustrates a table using foreign keys.

Figure 3.2 A table with two foreign keys.

Figure 3.2 shows the Contacts table with two foreign keys, ContactID and PhoneID. ContactID is the primary key of the tblContactName table (not shown) and PhoneID is the primary key of the tblContactPhone table (not shown).

When you use foreign keys, they will always be in the same domain. A domain is the pool of possible values from which a column's value is drawn. For example, in the United States, the domain of ZIP codes consists of all valid ZIP codes. Another example might be a university's student ID numbers. If the student ID system is based on Social Security numbers, the domain consists of all valid Social Security numbers available to the system.

Relationships

When you create primary keys and foreign keys, you are defining relationships. Relationships refer to how the records in one table connect to the records in another table. Access supports three different types of relationships:

  • One-to-one relationships

  • One-to-many relationships

  • Many-to-many relationships

To set relationships in Microsoft Access, press the Relationships button on the toolbar or select Tools, Relationships from the menu to bring up the Relationships window.

One-to-One Relationships

Two tables have a one-to-one relationship if, for each row in one table, there is at most one related record in the related table. Figure 3.3 shows an example of a one-to-one relationship.

Figure 3.3 A one-to-one relationship.

Figure 3.3 shows a one-to-one relationship because each record in the table tblContacts relates to, at most, one related record in tblContactEmail. One-to-one relationships are the least common types of relationships because most databases include related information in one table. For security reasons, however, you might choose to break out information into two tables. Financial transactions often involve many one-to-one relationships.

One-to-Many Relationships

The most common type of relationship is the one-to-many relationship. A one-to-many relationship occurs when a table has zero or many related records (sometimes called child records) in another table. A typical example occurs in a customer order database, because one customer can have many orders.

Similarly, in your contact database, one contact (the "one" side of the relationship) might have several phone numbers (the "many" side). To express the relationship in terms of keys, each contact has a unique ID which is the primary key in one table. This primary key would appear as a foreign key in each related record in a table of telephone numbers. Figure 3.4 illustrates a one-to-many relationship.

Figure 3.4 A one-to-many relationship.

The ContactID field is tblContacts' primary key and a foreign key in tblContactPhone. ContactID 2 appears twice in tblContactPhone, meaning that there are two related records in tblContactPhone. Figure 3.4 includes the contents of the table to make the relationships clearer.

Many-to-Many Relationships

Given two tables, tblAlpha and tblBeta, a many-to-many relationship occurs when one or more rows in tblAlpha relate to zero or more rows in tblBeta, and one or more rows in tblBeta relates to zero or more rows in tblAlpha.

Many-to-many relationships are problematic because it is difficult to select unique records from one table using existing relationships. Suppose you have three contacts (stored in tblAlpha) who use any one of three fax numbers (stored in tblBeta). If you want to send one of those contacts a fax, it is not immediately clear which fax number to use. Conversely, if you receive a fax from one of those three numbers that lists the sender's fax number but not the sender's name, you will not immediately know which of the three sent the fax.

Unfortunately, Access does not have a way to resolve many-to-many relationships between two tables. The only way to do so is to create a so-called "linking" table. A linking table is a table that contains the primary keys of both tables as foreign keys.

Figure 3.5 shows a many-to-many relationship between tblContacts and tblContactType through the linking table tblContactInfo. tblContacts contains all the contacts in your database. tblContactType lists all the types of contact information: Fax, Email, Phone, and Pager. tblContactInfo is the linking table. It contains a reference to a contact (ContactID), a contact type (ContactTypeID), and the contact information itself (ContactInfo), so you can locate either a specific telephone number using a contact's name or a specific contact name using a particular telephone number.

Moreover, if your contact adds a new type of communication (like a Web address) to the list, you simply add "Web Address" to the lookup table (tblContactType), and then add the new linking record to tblContactInfo. This is the most flexible way to build a contact management system because it avoids repeating fields (Phone1, Phone2, and so forth), blank fields, and having to add new fields to a table when you add another communication type to the list. A common naming convention when using a linking table this way is to use the "trel" prefix to identify a linking table. In this case, the tblContactInfo would be named trelContactInfo.

Figure 3.5 A many-to-many relationship using a linking table.

Data Normalization

Data normalization describes the process of designing a database (or modifying an existing database design) and organizing data to take best advantage of relational database principles. Codd defined six levels of normalization that he described as normal forms. The first three, known as First, Second, and Third Normal Form, have the most impact on database design decisions. As you read through the next few paragraphs, keep the following somewhat whimsical but useful summation of Codd's first three laws in mind: "The values in a row are dependent on the key, the whole key, and nothing but the key, so help me Codd."

First Normal Form

First Normal Form requires that all columns (fields) in a table contain atomic values. That is, each field should contain one value, not a list of values or any repeating groups of data. Many flat file "databases" store data in this fashion, which makes searching quite difficult.

NOTE

The term flat file database refers to files, almost always text files but sometimes binary, containing some type of data. They are called "flat" because they lack any structure or hierarchical arrangement suggesting the relationship between one record and the next. Flat files are often database dumps.

Figure 3.6 shows an example of a table that is not in First Normal Form.

Figure 3.6 A table not in First Normal Form.

The ContactInfo column violates First Normal Form because it contains multiple pieces of information. Putting the contact type (Fax, Voice, Email) in its own column (ContactType), making separate records for each contact number in the ContactInfo column, creating the ContactInfoID column, and making ContactID the table's primary key results in Figure 3.7.

Figure 3.7 A Table in First Normal Form.

The table shown in Figure 3.7 is in First Normal Form because each column contains a single, discrete piece of information.

Second Normal Form

A table in Second Normal Form is one in First Normal Form and in which every non-key field is fully dependent on the entire primary key. That is, any column in a table that is not the primary key or part of a composite primary key must depend on the primary key or all parts of a composite primary key for its meaning or interpretation.

tblContact is not in Second Normal Form. The primary key is a composite consisting of ContactID, ContactInfoID, and ContactInfo. ContactName does depend on ContactID, but not on ContactInfoID or ContactInfo.

To put this table in Second Normal Form, put ContactInfo, ContactInfoID, and ContactType in a separate table. Next, use ContactID as a foreign key in the new table, creating a one-to-many relationship. The result is shown in Figure 3.8. Now both tables are in Second Normal Form because the columns in each table depend on their respective table's primary keys.

Figure 3.8 A Table in Second Normal Form.

Third Normal Form

A table is in Third Normal Form when it is in Second Normal Form and when no non-key column depends on the primary key.

tblContactOld in Figure 3.8 contains a field dependent on another non-key field, so it is not in Third Normal Form. If you do not see it right away, look at it for a few moments before continuing with the next paragraph.

If you missed it, the dependent column is ContactType. Why? Given the ContactID and ContactInfoID, you can use a query to obtain the ContactType. The solution is to create a third table named tblContactType using the ContactType descriptions, and use tblContactType's primary key (ContactTypeID, in this case) as a foreign key in tblContactOld. The result is shown in Figure 3.9.

Figure 3.9 Tables in Third Normal Form.

All three tables are now in Third Normal Form.

Real-World Benefits of Normalization

Why bother to normalize data? The original table in the previous example (Figure 3.6), listed the contact types and information in an array of values, making searching for the contents of a fax number very difficult. In addition, when one of your contacts' phone number changes or gets a new form of communication method, like a cellular phone, updating the data is difficult and easily botched. First Normal Form improved the situation and gave you more flexibility because it created separate records for each contact's voice number, fax number, e-mail address, and so forth.

First Normal Form, shown in Figure 3.7, only partly solves the problem. Contact data is difficult to manage because the same information is repeated many times and because storing the same name 10 times wastes disk space.

Second Normal Form, in Figure 3.8, is almost, but not quite there. It eliminates most of the wasted disk space and repetitious information, but limits your flexibility. Third Normal Form allows you to add and remove contact types from a single table without requiring any changes to the structure of other tables. The tables in Figure 3.9 give you the most flexibility and use the least amount of disk space.

Data Integrity Rules

When normalizing a set of tables, you should also consider data integrity rules. Data integrity rules, sometimes called constraints, are rules, often enforced by the database engine itself, that ensure the consistency of your data. For example, to revisit the contact database example a final time, if you delete a contact from the main table (tblContact), you want to make sure that related records in other tables are also deleted. Deleting only a related record, on the other hand, should not delete other related records or, certainly, the primary contact information. Most data integrity rules in an Access 2002 database are determined by the relationships you define. In addition, Access lets you specify cascading updates and deletes when defining relationships.

A cascading update ensures that when you update a record's primary key value on the "one" side of a one-to-many relationship, the change will be reflected in all records on the "many" side. The change, that is, will cascade through all related records.

Suppose you have a lookup table for states. New York is in this table with a primary key of NY and a description of New York. Imagine that New York City broke away and formed its own state (before you laugh, it almost happened in 1789!) When you change your NY value to NNY (for "New New York"), the values of NY in all the child tables will be changed to NNY. If cascading updates were not set, you would need to add a record to the lookup table for NY1, update all the records in the child table, and then delete the NY record from the lookup table.

A cascading delete has the same effect for deleting a primary key. Cascading deletes ensure that deleting a record from a "one" table also deletes all related records from the "many" table(s). This feature has a downside, however. If you delete a customer and cascading deletes were set, all the "many" invoices would be deleted. If you do not have deletes set, the deletion would not be allowed in the "one" table until you deleted all the records in the "many" table, to avoid leaving any orphans. Orphaned records are records not related to an existing primary key. As a result, they are not directly accessible and constitute wasted disk space.

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Overview


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information


To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information


Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children


This site is not directed to children under the age of 13.

Marketing


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information


If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information


Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents


California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure


Pearson may disclose personal information, as follows:

  • As required by law.
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact


Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice


We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020