SQL and Data
In this chapter, you will learn about:
Data, Databases, and the Definition of SQL
The STUDENT Schema Diagram
What is SQL? SQL (pronounced sequel) is an acronym for Structured Query Language, a standardized language used to access and manipulate data. The history of SQL corresponds closely with the development of relational databases concepts published in a paper by Dr. E. F. Codd at IBM in 1970. He applied mathematical concepts to the specification of a method for data storage and access; this specification, which became the basis for relational databases, was intended to overcome the physical dependencies of the then-available database systems. The SQL language (originally called "System R" in the prototype and later called "SEQUEL") was developed by the IBM Research Laboratory as a standard language to use with relational databases. In 1979 Oracle, then called Relational Software, Inc., introduced the first commercially available implementation of a relational database incorporating the SQL language. The SQL language evolved with many additional syntax expansions incorporated into the American National Standards Institute (ANSI) SQL standards developed since. Individual database vendors continuously added extensions to the language, which eventually found their way into the latest ANSI standards used by relational databases today. Large-scale commercial implementations of relational database applications started to appear in the mid to late 1980s as early implementations were hampered by poor performance. Since then, relational databases and the SQL language have continuously evolved and improved.
Before you begin to use SQL, however, you must know about data, databases, and relational databases. What is a database? A database is an organized collection of data. A database management system (DBMS) is software that allows the creation, retrieval, and manipulation of data. You use such systems to maintain patient data in a hospital, bank accounts in a bank, or inventory in a warehouse. A relational database management system (RDBMS) provides this functionality within the context of the relational database theory and the rules defined for relational databases by Codd. These rules, called "Codd's Twelve Rules," later expanded to include additional rules, describe goals for database management systems to cope with ever-challenging and demanding database requirements. Compliance with Codd's Rules has been a major challenge for database vendors and early versions of relational databases and many desktop databases complied with only a handful of the rules.
Today, SQL is accepted as the universal standard database access language. Databases using the SQL language are entrusted with managing critical information affecting many aspects of our daily lives. Most applications developed today use a relational database and Oracle continues to be one of the largest and most popular database vendors. Although relational databases and the SQL language are already over 30 years old, there seems to be no slowing down of the popularity of the language. Learning SQL is probably one of the best long-term investments you can make for a number of reasons:
SQL is used by most commercial database applications.
Although the language has evolved over the years with a large array of syntax enhancements and additions, most of the basic functionality has remained essentially unchanged.
SQL knowledge will continue to be a fundamental skill as there is currently no mature and viable alternative language that accomplishes the same functionality.
Learning Oracle's specific SQL implementation provides you with great insight into the feature-rich functionality of one of the largest and most successful database vendors.
Understanding relational database concepts provides you with the foundation for understanding the SQL language. Those unfamiliar with relational concepts or interested in a refresher will receive an overview of basic relational theories in the next two labs. If you are already familiar with relational theory, you can skip the first two labs and jump directly to Lab 1.3, "The STUDENT Schema Diagram." The STUDENT database manages student enrollment data at a fictional university. Lab 1.3 teaches you about the organization and relationships of the STUDENT database, which is used throughout the exercises in this book.
Lab 1.1 Data, Databases, and the definition of sql
After this lab, you will be able to:
Identify and Group Data
Define the Structures of a RDBMS: Tables, Columns, Rows, and Keys
Data is all around youyou make use of it every day. Your hair may be brown, your flight leaves from gate K10, you try to get up in the morning at 6:30 a.m. Storing data in related groups and making the connections among them are what databases are all about.
You interact with a database when you withdraw cash from an ATM machine, order a book from a Web site, or check stock quotes on the Internet. The switch from the information processing society to the knowledge management society will be facilitated by databases. Databases provide a major asset to any organization by helping it run its business and databases represent the backbones of the many technological advances we enjoy today.
Before the availability of relational databases, data was stored in individual files that could not be accessed unless you knew a programming language. Data could not be combined easily and modifications to the underlying database structures were extremely difficult. The Relational Model conceived by E. F. Codd provided the framework to solve a myriad of these and many other database problems. Relational databases offer data independence, meaning a user does not need to know on which hard drive and file a particular piece of information is stored. The RDBMS provides users with data consistency and data integrity. For example, if an employee works in the Finance department and we know that he can only work for one department, there should not be duplicate department records or contradicting data in the database. As you work through this lab, you will discover many of these useful and essential features. Let's start with a discussion of the terminology used in relational databases.
A relational database stores data in tables. Tables typically contain data about a single subject. Each table has a unique name that signifies the contents of the data. For example, you can store data about books you read in a table called book.
Columns in a table organize the data further and a table consists of at least one column. Each column represents a single, low-level detail about a particular set of data. The name of the column is unique within a table and identifies the data you find in the column. For example, the book table may have a column for the title, publisher, date the book was published, and so on. The order of the columns is unimportant because SQL allows you to display data in any order you choose.
Each row usually represents one unique set of data within this table. For example, the row in Figure 1.1 with the title "The Invisible Force" is unique within the BOOK table. All the columns of the row represent respective data for the row. Each intersection of a column and row in a table represents a value and some do not, as you see in the PUBLISH_DATE column. The value is said to be NULL. Null is an unknown value, so it's not even blank spaces. Nulls cannot be evaluated or compared because they are unknown.
FIGURE 1.1 Example of the BOOK table.
When working with tables, you must understand how to uniquely identify data within a table. This is the purpose of the primary key; it uniquely identifies a row within a table, which means that you find one, and only one row in the table by looking for the primary key value. Figure 1.2 shows an example of the Customer table with the Customer_ID as the primary key of the table.
Figure 1.2 Primary key example.
At first glance you may think that the Customer_Name column can serve as the primary key of the CUSTOMER table because it is unique. However, it is entirely possible to have customers with the same name. Therefore, the CUSTOMER_NAME column is not a good choice for the primary key. Sometimes the unique key is a system-generated sequence number; this type of key is called a synthetic or surrogate key. The advantage of such a surrogate key is that it is unique and does not have any inherent meaning or purpose; therefore, it is not subject to changes. In this example, the CUSTOMER_ID column is such a surrogate key.
It is best to avoid any primary keys that are subject to updates as they cause unnecessary complexity. For example, the phone number of a customer is a poor example of a primary key column choice. Though it may possibly be unique within a table, phone numbers can change and then cause a number of problems with updates of other columns that reference this column.
A table may have only one primary key, which consists of one or more columns. If the primary key contains multiple columns it is referred to as a composite primary key or concatenated primary key. (Choosing appropriate keys is discussed more in Chapter 11, "Create, Alter, and Drop Tables.") Oracle does not require that every table have a primary key and there may be cases where it is not appropriate to have one. However, it is strongly recommended that most tables have a primary key.
If you store the customer and the customer's order information in one table, the customer's name and address is repeated for each order. Figure 1.3 depicts such a table. Any change to the address requires the update of all the rows in the table for that individual customer.
Figure 1.3 Example of CUSTOMER data mixed with ORDER data.
If, however, the data is split into two tables (CUSTOMER and ORDER as shown in Figure 1.4) and the customer's address needs to be updated, only one row in the customer table needs to be updated. Furthermore, splitting data this way avoids data inconsistency whereby the data differs between the different rows. Eliminating redundancy is one of the key concepts in relational databases and this process, referred to as normalization, is discussed shortly.
Figure 1.4 Primary and foreign key relationship between CUSTOMER and Order tables.
Figure 1.4 illustrates how the data is split into two tables to provide data consistency. In this example, the Customer_ID becomes a foreign key column in the Order table. The foreign key is the column that links the Customer and Order table together. In this example, you can find all orders for a particular customer by looking for the particular CUSTOMER_ID in the ORDER table. The CUSTOMER_ID would correspond to a single row in the CUSTOMER table that provides the customer-specific information. The foreign key column Customer_ID happens to have the same column name in the Order table. This makes it easier to recognize the fact that the tables share common column values. Often the foreign key column and the primary key have identical column names, but it is not required. You will learn more about foreign key columns with the same and different names and how to create foreign key relationships in Chapter 11, "Create, Alter, and Drop Tables." Chapter 6, "Equijoins," teaches you how to combine results from the two tables using SQL.
You connect and combine data between tables in a relational database via data common columns.
SQL language commands
You work with the tables, rows, and columns using the SQL language. SQL allows you to query data, create new data, modify existing data, and delete data. Within the SQL language you can differentiate between individual sublanguages, which are a collection of individual commands.
For example, the Data Manipulation Language (DML) commands allow you to query, insert, update, or delete data. SQL allows you to create new database structures such as tables or modify existing ones; this subcategory of SQL language commands is called the Data Definition Language (DDL). Using the SQL language you can control access to the data using Data Control Language (DCL) commands. Table 1.1 shows you an overview of different language categories with their respective SQL commands.
Table 1.1 Overview of SQL Language Commands
SELECT, INSERT, UPDATE, DELETE, MERGE
CREATE, ALTER, DROP, TRUNCATE, RENAME
COMMIT, ROLLBACK, SAVEPOINT
One of the first statements you will execute is the SELECT command, which allows you to retrieve data. For example, to retrieve the TITLE and PUBLISHER columns from the BOOK table you may issue a SELECT statement such as the following:
SELECT title, publisher FROM book
The INSERT command lets you add new rows to a table. The next command shows you an example of an INSERT statement that adds a row to the BOOK table. The row contains the values Oracle SQL as a book title, a BOOK_ID of 1013, and a publish date of 12/02 with Prentice Hall as the publisher.
INSERT INTO book (book_id, title, publisher, publish_date) VALUES (1013, 'Oracle SQL', 'Prentice Hall', '12/02')
To create new tables you use the CREATE TABLE command. The following statement illustrates how to create a simple table called AUTHOR with three columns. The first column, called AUTHOR_ID, holds numeric data; the FIRST_NAME and LAST_NAME columns contain alphanumeric character data.
CREATE TABLE author (author_id NUMBER, first_name VARCHAR2(30), last_name VARCHAR2(30))
You can manipulate the column definitions of a table with the ALTER TABLE command. This allows you to add or drop columns. You can also create primary and foreign key constraints on a table. Constraints allow you to enforce business rules within the database. For example, a primary key constraint can enforce the uniqueness of the AUTHOR_ID column in the AUTHOR table.
To grant SELECT and INSERT access to the AUTHOR table, you issue a GRANT command. It allows the user Scott to retrieve and insert data in the AUTHOR table.
GRANT SELECT, INSERT ON author TO scott
Starting with Chapter 2, "SQL: The Basics," you will learn how to execute the SELECT command against the Oracle database; Chapter 10, "Insert, Update, and Delete," will teach you the details of data manipulation; and Chapter 11, "Create, Alter, and Drop Tables," introduces you to the creation of tables and the definition of constraints to enforce the required business rules. Chapter 14, "Security," discusses how to control the access to data and the various Oracle database features.