InformIT

MySQL SQL Syntax and Use

Date: Mar 14, 2005

Sample Chapter is provided courtesy of Sams.

Return to the article

Structured Query Language (SQL) is the language that the MySQL server understands, so fluency with SQL is necessary for effective communication with the server. When you use a program such as the mysql client, it functions primarily as a way for you to send SQL statements to the server to be executed. You must also know SQL if you write programs that use the MySQL interface provided by your programming language, because the interface functions as the means that allows you to communicate with the server by issuing SQL statements.

Chapter 1, "Getting Started with MySQL and SQL," presents a tutorial introduction to many of MySQL's capabilities. Now we'll build on that material to go into more detail on several areas of SQL implemented by MySQL:

Several aspects of how the MySQL server executes SQL statements can be modified by setting its SQL mode. Instructions for doing this are given in "The Server SQL Mode."

MySQL's SQL statements may be grouped into several broad categories; Table 2.1 lists representative statements for each. Some of the statements in the table are not covered here because they are more appropriately discussed in other chapters. For example, the administrative statements GRANT and REVOKE for setting up user privileges are dealt with in Chapter 11, "General MySQL Administration." Chapter 12, "MySQL and Security," provides further details on what privileges are available and what they allow. The syntax for all SQL statements implemented by MySQL is listed in Appendix E, "SQL Syntax Reference." That appendix also covers the syntax for using comments in your SQL statements. In addition, you should consult the MySQL Reference Manual for additional information, especially for changes made in the most recent versions of MySQL.

Table 2.1 Types of SQL Statements Supported by MySQL

Selecting, Creating, Dropping, and Altering Databases

USE

CREATE DATABASE

DROP DATABASE

ALTER DATABASE

Creating, Altering, and Dropping Tables and Indexes

CREATE TABLE

DROP TABLE

CREATE INDEX

DROP INDEX

ALTER TABLE

Getting Information About Databases and Tables

DESCRIBE

SHOW

Retrieving Information from Tables

SELECT

UNION

Performing Transactions

SET autocommit

START TRANSACTION

COMMIT

ROLLBACK

Modifying Information in Tables

DELETE

INSERT

LOAD DATA

REPLACE

UPDATE

Administrative Statements

FLUSH

GRANT

REVOKE

MySQL Naming Rules

Almost every SQL statement refers in some way to a database or its constituent elements. This section describes the syntax and case sensitivity rules for identifiers that refer to databases, tables, columns, indexes, and aliases.

Referring to Elements of Databases

When you use identifiers to refer to elements of databases, you are constrained by the characters you can use and the length that identifiers can be. The format of identifiers also depends on the context in which you use them. Another factor that affects naming rules is that the server can be configured to use different SQL modes.

Case Sensitivity in SQL Statements

Case sensitivity rules in SQL statements vary for different parts of the statement, and also depend on what you are referring to and the operating system of the machine on which the server is running:

Regardless of whether a database or table name is case sensitive on your system, you must refer to it using the same lettercase throughout a given query. That is not true for SQL keywords, function names, or column and index names, all of which may be referred to in varying lettercase style throughout a query. However, the query will be more readable if you use a consistent lettercase rather than "ransom note" style (SelECt NamE FrOm ...).

The Server SQL Mode

In MySQL 4.0, a server SQL mode system variable named sql_mode was introduced to allow configuring certain aspects of how the server executes SQL statements. Initially, this variable could be set only by means of the --sql-mode startup option. As MySQL 4.1.1, the SQL mode also can be changed at runtime and individual clients can change the mode to affect their own connection. This means that any client can change how the server behaves in relation to itself without impact on other clients.

The SQL mode affects behaviors such as identifier quoting and handling of invalid values during data entry. The following list describes a few of the possible mode values:

When you set the SQL mode, specify a value consisting of one or more mode values separated by commas, or an empty string to clear the value. Mode values are not case sensitive.

To set the SQL mode when you start the server, use the --sql-mode option on the command line or in an option file:

--sql-mode="ANSI"
--sql-mode="ANSI_QUOTES,PIPES_AS_CONCAT"

To change the SQL mode at runtime, set the sql_mode system variable with a SET statement:

To determine the current value of the session or global SQL mode, use these statements:

SELECT @@SESSION.sql_mode;
SELECT @@GLOBAL.sql_mode;

The value returned consists of a comma-separated list of enabled modes, or an empty value if no modes are enabled. The full set of mode values is given in the description of the sql_mode variable in Appendix D, "System, Status, and User Variable Reference." For discussion of SQL mode values that affect handling of erroneous or missing values during data entry, see "How MySQL Handles Invalid Data Values," in Chapter 3. General background on system variables is provided in "Setting and Checking System Variable Values," in Chapter 11.

Character Set Support

Prior to MySQL 4.1, the server operates using a single character set at a time. As of MySQL 4.1, the server can support multiple character sets simultaneously, and character sets can be specified independently at the server, database, table, column, or string constant level. For example, if you want a table's columns to use latin1 by default, but also to include a Hebrew column and a Greek column, you can do that. In addition, you can explicitly specify collations (sorting orders). It is also possible to find out what character sets and collations are available, and to convert data from one character set to another.

This section provides general background on using MySQL's character set support. Chapter 3 provides more specific discussion of character sets, collations, binary versus non-binary strings, and how to define character-based table columns and work with them. Chapter 11 discusses how to configure which character sets the server makes available. That chapter also includes notes on what to do when upgrading older tables to MySQL 4.1 so that you can use the new features.

Character Set Support Before MySQL 4.1

Prior to MySQL 4.1, string values in MySQL have no explicit character set. Instead, string constants and column values always are interpreted with respect to the server's character set. By default, this is the character set selected when the server was built (usually latin1), but the built-in value may be overridden at runtime with the --default-character-set option. This is very simple but quite limiting. For example, you cannot have a table that stores values using different character sets for different columns.

The single-character-set model also can lead to index-related problems if you change the server's character set after having already created tables and loaded character data into them. These problems occur due to the fact that index values for character columns are stored in sorted order according to the collating sequence of the character set that happens to be in force at the time the index entries are created. Character sets have different collating sequences, so if you load a table while the server is using one character set and then reconfigure the server to use a different set, it's possible that the index entries no longer will be in the correct order with respect to the collating sequence of the new character set. Worse, if you add new rows to the table, the index that was initially created using the sort order of the original character set will be updated using the order of the new set. Consequently, index-based queries might not work correctly.

The solution to this problem is to rebuild the indexes for each existing table that has character-based indexes to use the collating order of the new character set. A table can be converted in various ways:

Despite the methods available for reordering indexes if you change the server's character set, the fact that you need to do it at all is a bother. MySQL 4.1 eliminates the need. Each character column is associated with a character set and collation that determines its ordering, so if you change the server character set, it has no impact on individual columns. Also, if you change a column's character set or collation, MySQL automatically reorders the indexes without manual intervention.

When you upgrade an older server to MySQL 4.1 or newer, your old tables can still be used but you should convert them to 4.1 format so that you can take full advantage of the improved character set support instituted in 4.1. Instructions for doing this can be found in "Upgrading Tables When Character Set Conversion Is Necessary," in Chapter 11.

Character Set Support in MySQL 4.1 and Up

In MySQL 4.1, character set support was revised considerably to provide the following features:

You cannot mix character sets within a string, or use different character sets for different rows of a given column. However, by using a Unicode character set (which represents the encodings for many languages within a single character set), you may be able to implement multi-lingual support of the type you desire.

Specifying Character Sets

Character set and collation assignments can be made at several levels, from the default used by the server to the character set used for individual strings.

The server's default character set and collation are built in at compile time. You can override them at server startup time by using the --character-set-server and --collation-server options. If you specify only the character set, its default collation becomes the server's default collation. If you specify a collation, it must be compatible with the character set. (A collation is compatible with a character set if its name begins with the character set name. For example, utf8_danish_ci is compatible with utf8 but not with latin1.)

In SQL statements that create databases and tables, two clauses are used for specifying database, table, and column character set and collation values:

CHARACTER SET charset
COLLATE collation

charset is the name of a character set supported by the server, and collation is the name of one of the character set's collations. These clauses can be specified together or separately. If both are given, the collation name must be compatible with the character set. If only CHARACTER SET is given, its default collation is used. If only COLLATE is given, the character set is implicit in the first part of the character set name.

It's also possible to sort string values using a specific collation by using the COLLATE operator. For example, if c is a latin1 column that has the default collation of latin1_swedish_ci, but you want to order it using Spanish sorting rules, do this:

SELECT c FROM t ORDER BY c COLLATE latin1_spanish_ci;

Determining Character Set Availability and Current Settings

To find out which character sets and collations are available, use these statements:

SHOW CHARACTER SET;
SHOW COLLATION;

Each of them supports a LIKE clause that narrows the results to those character set or collation names matching a pattern. For example, this statement lists the Latin-based character sets:

mysql> SHOW CHARACTER SET LIKE 'latin%';
+---------+-----------------------------+-------------------+--------+
| Charset | Description                 | Default collation | Maxlen |
+---------+-----------------------------+-------------------+--------+
| latin1  | ISO 8859-1 West European    | latin1_swedish_ci |      1 |
| latin2  | ISO 8859-2 Central European | latin2_general_ci |      1 |
| latin5  | ISO 8859-9 Turkish          | latin5_turkish_ci |      1 |
| latin7  | ISO 8859-13 Baltic          | latin7_general_ci |      1 |
+---------+-----------------------------+-------------------+--------+

This statement lists the collations available for the utf8 character set:

mysql> SHOW COLLATION LIKE 'utf8%';
+--------------------+---------+-----+---------+----------+---------+
| Collation          | Charset | Id  | Default | Compiled | Sortlen |
+--------------------+---------+-----+---------+----------+---------+
| utf8_general_ci    | utf8    | 33  | Yes     | Yes      |       1 |
| utf8_bin           | utf8    | 83  |         | Yes      |       1 |
| utf8_unicode_ci    | utf8    | 192 |         | Yes      |       8 |
| utf8_icelandic_ci  | utf8    | 193 |         | Yes      |       8 |
| utf8_latvian_ci    | utf8    | 194 |         | Yes      |       8 |
| utf8_romanian_ci   | utf8    | 195 |         | Yes      |       8 |
| utf8_slovenian_ci  | utf8    | 196 |         | Yes      |       8 |
| utf8_polish_ci     | utf8    | 197 |         | Yes      |       8 |
| utf8_estonian_ci   | utf8    | 198 |         | Yes      |       8 |
| utf8_spanish_ci    | utf8    | 199 |         | Yes      |       8 |
| utf8_swedish_ci    | utf8    | 200 |         | Yes      |       8 |
| utf8_turkish_ci    | utf8    | 201 |         | Yes      |       8 |
| utf8_czech_ci      | utf8    | 202 |         | Yes      |       8 |
| utf8_danish_ci     | utf8    | 203 |         | Yes      |       8 |
| utf8_lithuanian_ci | utf8    | 204 |         | Yes      |       8 |
| utf8_slovak_ci     | utf8    | 205 |         | Yes      |       8 |
| utf8_spanish2_ci   | utf8    | 206 |         | Yes      |       8 |
+--------------------+---------+-----+---------+----------+---------+

As can be seen in the output from these statements, each character set has at least one collation and one of them is its default collation.

To display the server's current character set and collation settings, use SHOW VARIABLES:

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | latin1 |
| character_set_results    | latin1 |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+
mysql> SHOW VARIABLES LIKE 'collation\_%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+

Unicode Support

One of the reasons there are so many character sets is that different encodings have been developed for different languages. This presents several problems. For example, a given character that is common to several languages might be represented by different numeric values in different encodings. Also, different languages require different numbers of bytes to represent characters. The latin1 character set is small enough that every character fits in a single byte, but languages such as those used in Japan and China contain so many characters that they require multiple bytes per character.

The goal of Unicode is to provide a unified character-encoding system within which character sets for all languages can be represented in a consistent manner. In MySQL, Unicode support is provided through two character sets:

Selecting, Creating, Dropping, and Altering Databases

MySQL provides several database-level statements: USE for selecting a default database, CREATE DATABASE for creating databases, DROP DATABASE for removing them, and ALTER DATABASE for modifying global database characteristics.

Note: From MySQL 5.0.2 on, you can use the keyword SCHEMA as a synonym for DATABASE in any statement where the latter occurs.

Selecting Databases

The USE statement selects a database to make it the default (current) database for a given connection to the server:

USE db_name;

You must have some access privilege for the database or you cannot select it.

It is not strictly necessary to select a database explicitly. If you have access to a database, you can use its tables without selecting it if you use qualified names that identify both the database and the table. For example, to retrieve the contents of the president table in the sampdb database without selecting the database first, write the query like this:

SELECT * FROM sampdb.president;

However, it's much more convenient to refer to tables without having to specify a database qualifier.

Selecting a database doesn't mean that it must be the default for the duration of the connection. You can issue any number of USE statements to switch back and forth among databases. Nor does selecting a database limit you to using tables only from that database. While one database is the default, you can refer to tables in other databases by qualifying their names with the appropriate database identifier.

When your connection to the server terminates, any notion by the server of which database was the default for the connection disappears. That is, if you connect to the server again, it doesn't remember what database you had selected previously.

Creating Databases

To create a database, use a CREATE DATABASE statement:

CREATE DATABASE db_name;

The conditions on database creation are that the name must be a legal identifier, the database must not already exist, and you must have sufficient privileges to create it.

When you create a database, the MySQL server creates a directory under its data directory that has the same name as the database. The new directory is called the database directory. The server also creates a db.opt file in the database directory to store database attributes.

CREATE DATABASE supports several optional clauses. The full syntax is as follows:

CREATE DATABASE [IF NOT EXISTS] db_name
 [CHARACTER SET charset] [COLLATE collation];

The options have the following meanings:

To see the definition for an existing database, use a SHOW CREATE DATABASE statement:

mysql> SHOW CREATE DATABASE mydb\G
*************************** 1. row ***************************
       Database: mydb
Create Database: CREATE DATABASE ´mydb´
                 /*!40100 DEFAULT CHARACTER SET utf8
                 COLLATE utf8_icelandic_ci */

Dropping Databases

Dropping a database is just as easy as creating one, assuming that you have sufficient privileges:

DROP DATABASE db_name;

The DROP DATABASE statement is not something to use with wild abandon. It removes the database and all tables within it, so the database is gone forever unless you have been making backups regularly. In other words, don't try this statement just to see how it works unless you don't care about the database contents.

A database is represented by a directory under the data directory, and the directory is intended for storage of table data. If a DROP DATABASE statement fails, the reason most likely is that the database directory contains non-table files. DROP DATABASE will not delete such files, and as a result will not delete the directory, either. This means that the database directory continues to exist (albeit empty of any tables) and will show up if you issue a SHOW DATABASES statement. To really drop the database if this occurs, manually remove any remaining files in the database directory and the directory itself.

Altering Databases

The ALTER DATABASE statement makes changes to a database's global attributes. Currently, the only such attributes are the default character set and collation:

ALTER DATABASE db_name [CHARACTER SET charset] [COLLATE collation];

The earlier discussion for CREATE DATABASE describes the effect of the optional CHARACTER SET and COLLATE clauses.

Creating, Dropping, Indexing, and Altering Tables

MySQL allows you to create tables, drop (remove) them, and change their structure with the CREATE TABLE, DROP TABLE, and ALTER TABLE statements. The CREATE INDEX and DROP INDEX statements allow you to add or remove indexes on existing tables. The following sections provide the details for these statements. But first it's necessary to discuss the storage engines that MySQL supports for managing different types of tables.

Storage Engine Characteristics

MySQL supports multiple storage engines, or "table handlers" as they used to be known. Each storage engine implements tables that have a specific set of properties or characteristics. The following table lists the current engines and the versions in which they first became available.

Storage Engine

First Appearance in MySQL

ISAM

All versions

MyISAM

3.23.0

MEMORY

3.23.0 (known as HEAP before MySQL 4.0.13)

BDB

3.23.17

MERGE

3.23.25

InnoDB

3.23.29

NDB

4.1.2

EXAMPLE

4.1.2

ARCHIVE

4.1.3

CSV

4.1.4

FEDERATED

5.0.3


Some of the engine names have synonyms. MRG_MyISAM, BerkeleyDB, and NDBCLUSTER are synonyms for MERGE, BDB, and NDB, respectively. The MEMORY and InnoDB storage engines originally were known as HEAP and Innobase. The latter names still are recognized, but are deprecated.

Checking Which Storage Engines Are Available

It's quite possible that a given MySQL server will not support all available storage engines. The engines actually available to you depend on your version of MySQL, how the server was configured at build time, and the options with which it was started. For details on selecting storage engines, see "Storage Engine Configuration," in Chapter 11.

To see a list of available storage engines, use the SHOW ENGINES statement:

mysql> SHOW ENGINES;
+------------+---------+------------------------------------------------ ...
| Engine     | Support | Comment                     ...
+------------+---------+------------------------------------------------ ...
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great perf ...
| HEAP       | YES     | Alias for MEMORY                ...
| MEMORY     | YES     | Hash based, stored in memory, useful for tempor ...
| MERGE      | YES     | Collection of identical MyISAM tables      ...
| MRG_MYISAM | YES     | Alias for MERGE                 ...
| ISAM       | NO      | Obsolete storage engine, now replaced by MyISAM ...
| MRG_ISAM   | NO      | Obsolete storage engine, now replaced by MERGE ...
| InnoDB     | YES     | Supports transactions, row-level locking, and f ...
| INNOBASE   | YES     | Alias for INNODB                ...
| BDB        | YES     | Supports transactions and page-level locking  ...
| BERKELEYDB | YES     | Alias for BDB                  ...
| NDBCLUSTER | NO      | Clustered, fault-tolerant, memory-based tables ...
| NDB        | NO      | Alias for NDBCLUSTER              ...
| EXAMPLE    | NO      | Example storage engine             ...
| ARCHIVE    | NO      | Archive storage engine             ...
| CSV        | YES     | CSV storage engine               ...
| FEDERATED  | YES     | Federated MySQL storage engine         ...
+------------+---------+------------------------------------------------ ...

The value in the Support column is YES or NO to indicate that the engine is or is not available, DISABLED if the engine is present but turned off, or DEFAULT for the storage engine that the server uses by default. The engine designated as DEFAULT should be considered available.

Availability of storage engines also can be determined to some extent by referring to the server's version number or system variables. ISAM is the only storage engine available before MySQL 3.23. From 3.23 on, MyISAM, MERGE, and MEMORY are always available, and availability of the other types can be assessed by means of an appropriate SHOW VARIABLES statement:

SHOW VARIABLES LIKE 'have_isam';
SHOW VARIABLES LIKE 'have_bdb';
SHOW VARIABLES LIKE 'have_innodb';

If the output from SHOW VARIABLES shows that the variable has a value of YES, the corresponding storage engine is enabled. If the value is something else or there is no output, the engine is unavailable.

Table Representation on Disk

Each time you create a table, MySQL creates a disk file that contains the table's format (that is, its definition). This file is stored in the database directory for the database that the table belongs to. This is true no matter which storage engine manages the table. The format file has a basename that is the same as the table name and an .frm extension. That is, for a table named t, the format file is named t.frm. The association of each table with an frm file has a table-naming consequence: Table names must be legal for use in filenames.

Most storage engines also create other files that are unique to the table, to be used for storing the table's content. For any given table, the files specific to it are located in the directory that represents the database that contains the table. Table 2.2 shows the filename extensions for the table-specific files that each storage engine creates.

Table 2.2 Table-Specific Files Created by Storage Engines

Storage Engine

Files on Disk

ISAM

.frm (format), .ISD (data), .ISM (indexes)

MyISAM

.frm (format), .MYD (data), .MYI (indexes)

MERGE

.frm (format), .MRG (list of constituent MyISAM table names)

MEMORY

.frm (format)

BDB

.frm (format), .db (data and indexes)

InnoDB

.frm (format), .ibd (data and indexes)

EXAMPLE

.frm (format)

ARCHIVE

.frm (format), .ARZ (data), .ARM (metadata)

CSV

.frm (format), .CSV (data)

FEDERATED

.frm (format)


In some cases, the format file is the only file specifically associated with a particular table:

The following sections describe specific characteristics of MySQL's storage engines.

The ISAM Storage Engine

The ISAM storage engine manages tables that use the indexed sequential access method. ISAM was the original storage engine in MySQL, and was the only one available prior to MySQL 3.23. ISAM has since been superseded by the MyISAM storage engine. MyISAM tables are the preferred general replacement because they have fewer limitations. The ISAM engine currently is still available but considered obsolete and support for it will fade over time. For example, ISAM support has been omitted from the embedded server, and probably will disappear entirely sometime in MySQL 5.0.

Due to the decline in both the level of support for the ISAM engine and its use in the field, ISAM is not covered very much elsewhere in this book. In most cases, it is mentioned only to point out how it embodied a restriction that has been lifted in the MyISAM storage engine.

The MyISAM Storage Engine

The MyISAM storage engine is the default engine in MySQL as of version 3.23, unless you have configured your server otherwise.

The MERGE Storage Engine

MERGE tables are a means for grouping a set of MyISAM tables into a single logical unit. By querying a MERGE table, you in effect query all the constituent tables. One advantage of this is that you can in effect exceed the maximum table size allowed by the filesystem for individual MyISAM tables.

The tables that make up a MERGE table must all have the same structure. This means the columns in each table must be defined with the same names and types in the same order, and the indexes must be defined in the same way and in the same order. It is allowable to mix compressed and uncompressed tables. (Compressed tables are produced with myisampack; see Appendix F, "MySQL Program Reference.")

The MEMORY Storage Engine

The MEMORY storage engine uses tables that are stored in memory and that have fixed-length rows, two characteristics that make them very fast. The MEMORY storage engine originally was called the HEAP engine; you will need to use HEAP if your server is older and does not recognize the MEMORY keyword.

MEMORY tables are temporary in the sense that their contents disappear when the server terminates. That is, MEMORY tables still exist when the server restarts, but will be empty. However, in contrast to temporary tables created with CREATE TEMPORARY TABLE, MEMORY tables are visible to other clients. Several constraints apply to MEMORY tables that allow them to be handled more simply, and thus more quickly:

MEMORY tables can use different character sets for different columns.

The InnoDB Storage Engine

The InnoDB storage engine was developed by Innobase Oy. InnoDB offers these features:

The BDB Storage Engine

The Berkeley DB storage engine was developed by Sleepycat Software. You may have encountered it in other applications. For example, the Subversion revision control system uses BDB to provide backing store for repositories. In MySQL, the BDB storage engine offers these features:

The FEDERATED Storage Engine

The FEDERATED storage engine provides access to tables that are located at other MySQL servers. In other words, the contents of a FEDERATED table really are located remotely. When you create a FEDERATED table, you specify the host where the other server is running and provide the username and password of an account on that server. When you access the FEDERATED table, the local server connects to the remote server using this account. For an example, see "Using FEDERATED Tables."

The NDB Storage Engine

NDB is MySQL's cluster storage engine. It was developed by Ericsson Business Innovation and later acquired by MySQL AB. For this storage engine, the MySQL server actually acts as a client to a cluster of other processes that provide access to the NDB tables. Cluster node processes communicate with each other to manage tables in memory. The tables are replicated among cluster processes for redundancy. Memory storage provides high performance, and the cluster provides high availability because it survives failure of any given node.

NDB is relatively new in MySQL and configuration management is still being worked out. For this reason, NDB is not covered further here. See the MySQL Reference Manual for current details.

Other Storage Engines

There are three other MySQL storage engines that I will group here under the "miscellaneous" category:

Storage Engine Portability Characteristics

Any table managed by a given MySQL server is portable to another server in the sense that you can dump it into a text file with mysqldump, move the file to the machine where the other server runs, and load the file to re-create the table. Another kind of portability is "binary portability," which means that you can directly copy the disk files that represent the table to another machine, install them into a database directory, and expect the MySQL server there to be able to use the table.

A general condition for binary portability is that the destination server must support the storage engine that manages the table. If the server does not have the appropriate engine, it cannot access tables created by that engine.

Some storage engines create tables that are binary portable and some do not. The following list characterizes binary portability for individual engines:

Regardless of a storage engine's general portability characteristics, normally you should not attempt to copy table or tablespace files to another machine unless the server has been shut down cleanly. If you perform a copy after an abnormal shutdown, you cannot assume the integrity of your tables. The tables may be in need of repair or there may be transaction information still stored in a storage engine's log files that needs to be applied or rolled back to bring tables up to date.

It is sometimes possible to tell a running server to leave tables alone while you copy them. However, if the server is running and actively updating the tables, the table contents on disk will be in flux and the associated files will not yield usable table copies. See Chapter 13, "Database Backups, Maintenance, and Repair," for discussion of the conditions under which you can leave the server running while copying tables.

Creating Tables

To create a table, use a CREATE TABLE statement. The full syntax for this statement is complex because there are so many optional clauses, but in practice, it's usually fairly simple to use. For example, most of the CREATE TABLE statements that we used in Chapter 1 are reasonably uncomplicated. If you start with the more basic forms and work up, you shouldn't have much trouble.

The CREATE TABLE specifies, at a minimum, the table name and a list of the columns in it. For example:

CREATE TABLE mytbl
(
  name   CHAR(20),
  age    INT NOT NULL,
  weight INT,
  sex    ENUM('F','M')
);

In addition to the column definitions, you can specify how the table should be indexed when you create it. Another option is to leave the table unindexed when you create it and add the indexes later. For MyISAM tables, that's a good strategy if you plan to populate the table with a lot of data before you begin using it for queries. Updating indexes as you insert each row is much slower for those table types than loading the data into an unindexed table and creating the indexes afterward.

We have already covered the basic syntax for the CREATE TABLE statement in Chapter 1. Details on how to write column definitions are given in Chapter 3. Here, we deal more generally with some important extensions to the CREATE TABLE statement that give you a lot of flexibility in how you construct tables:

Table Options

To modify a table's storage characteristics, you can add table options following the closing parenthesis in the CREATE TABLE statement. For example, you can add an ENGINE = engine_name option to specify which storage engine to use for the table. The engine name is not case sensitive. To create a MEMORY or InnoDB table, write the statement like this:

CREATE TABLE mytbl ( ... ) ENGINE = MEMORY;
CREATE TABLE mytbl ( ... ) ENGINE = InnoDB;

TYPE can be used as a synonym for the ENGINE keyword, but you will get a warning in MySQL 4.1 and up. (For older servers that do not understand ENGINE, you must use TYPE.)

With no ENGINE specifier, the server creates the table using the default storage engine. The built-in default is MyISAM, but you can configure the server to use a different default by starting it with the --default-storage-engine option. At runtime, you can change the default storage engine by setting the storage_engine system variable.

If a CREATE TABLE statement names a storage engine that is legal but unavailable, MySQL creates the table using the default engine and generates a warning. For example, if BDB is not available, you would see something like this if you try to create a BDB table:

mysql> CREATE TABLE t (i INT) ENGINE = BDB;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------+
| Level   | Code | Message                                   |
+---------+------+-------------------------------------------+
| Warning | 1266 | Using storage engine MyISAM for table 't' |
+---------+------+-------------------------------------------+
1 row in set (0.00 sec)

If you name an unknown storage engine, an error occurs.

When you want to make sure that a table uses a particular storage engine, be sure to include the ENGINE table option. Because the default engine is configurable, you might not get the type of table that you want if you omit ENGINE.

To check which storage engine a table uses, issue a SHOW CREATE TABLE or SHOW TABLE STATUS statement:

mysql> SHOW CREATE TABLE t\G
*************************** 1. row ***************************
    Table: t
Create Table: CREATE TABLE ´t´ (
 ´i´ int(11) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Multiple table options can be given in the CREATE TABLE statement. Many of them apply only to particular storage engines. For example, a MIN_ROWS = n option can be useful for MEMORY tables to allow the MEMORY storage engine to optimize memory usage:

CREATE TABLE mytbl ( ... ) ENGINE = MEMORY MIN_ROWS = 10000;

If the MEMORY engine considers the value of MIN_ROWS to be large, it may allocate memory in larger hunks to avoid the overhead of making many allocation calls.

The MAX_ROWS and AVG_ROW_LENGTH options can help you size a MyISAM table. By default, MyISAM creates tables with an internal row pointer size that allows table files to grow up to 4GB. Specifying the MAX_ROWS and AVG_ROW_LENGTH options gives MyISAM information that it should use a pointer size for a table that can hold at least MAX_ROWS rows.

A complete list of table options is given in the description for CREATE TABLE in Appendix E.

To modify the storage characteristics of an existing table, table options can be used with an ALTER TABLE statement. For example, to change mytbl from its current storage engine to InnoDB, do this:

ALTER TABLE mytbl ENGINE = InnoDB;

See "Altering Table Structure" for more information about changing storage engines.

Provisional Table Creation

To create a table only if it doesn't already exist, use CREATE TABLE IF NOT EXISTS. You can use this statement for an application that makes no assumptions about whether a table that it needs has been set up in advance. The application can go ahead and attempt to create the table as a matter of course. The IF NOT EXISTS modifier is particularly useful for scripts that you run as batch jobs with mysql. In this context, a regular CREATE TABLE statement doesn't work very well. The first time the job runs, it creates the table, but the second time an error occurs because the table already exists. If you use IF NOT EXISTS, there is no problem. The first time the job runs, it creates the table, as before. For second and subsequent times, table creation attempts are silently ignored without error. This allows the job to continue processing as if the attempt had succeeded.

If you use IF NOT EXISTS, be aware that MySQL does not compare the table structure in the CREATE TABLE statement with that of the existing table. If a table exists with the given name but has a different structure, the statement does not fail. If that is a risk you do not want to take, it might be better instead to use DROP TABLE IF NOT EXISTS followed by CREATE TABLE without IF EXISTS.

Temporary Tables

You can use CREATE TEMPORARY TABLE to create temporary tables that disappear automatically when your connection to the server terminates. This is handy because you don't have to bother issuing a DROP TABLE statement to get rid of the table, and the table doesn't hang around if your connection terminates abnormally. For example, if you have a canned query in a batch file that you run with mysql and you decide not to wait for it to finish, you can kill the script in the middle with impunity and the server will remove any TEMPORARY tables that the script creates.

A TEMPORARY table is visible only to the client that creates the table. Different clients each can create a TEMPORARY table with the same name. The tables do not conflict because each client sees only the table that it created.

The name of a TEMPORARY table can be the same as that of an existing permanent table. This is not an error, nor does the existing permanent table get clobbered. Instead, the permanent table becomes hidden (inaccessible) to the client that creates the TEMPORARY table while the TEMPORARY table exists. Suppose that you create a TEMPORARY table named member in the sampdb database. The original member table becomes hidden, and references to member refer to the TEMPORARY table. If you issue a DROP TABLE member statement, the TEMPORARY table is removed and the original member table "reappears." If you simply disconnect from the server without dropping the TEMPORARY table, the server automatically drops it for you. The next time you connect, the original member table is visible again. (The original table also reappears if you rename a TEMPORARY table that hides it to have a different name.)

The name-hiding mechanism works only to one level. That is, you cannot create two TEMPORARY tables with the same name.

To create a TEMPORARY table using a particular storage engine, add an ENGINE table option to the CREATE TABLE statement.

The server drops a TEMPORARY table automatically when your client session ends, but you drop it explicitly as soon as you're done with it to allow the server to free any resources associated with it. This is a good idea if your session with the server will not end for a while, particularly for temporary MEMORY tables.

Keep in mind the following caveats when considering whether to use a TEMPORARY table:

Creating Tables from Other Tables or Query Results

It's sometimes useful to create a copy of a table. For example, you might have a data file that you want to load into a table using LOAD DATA, but you're not quite sure about the options for specifying the data format. You can end up with malformed records in the original table if you don't get the options right the first time. Using an empty copy of the original table allows you to experiment with the LOAD DATA options for specifying column and line delimiters until you're satisfied your input records are being interpreted properly. Then you can load the file into the original table by rerunning the LOAD DATA statement with the original table name.

It's also sometimes desirable to save the result of a query into a table rather than watching it scroll off the top of your screen. By saving the result, you can refer to it later without rerunning the original query—perhaps to perform further analysis on it.

MySQL provides two statements for creating new tables from other tables or from query results. These statements have differing advantages and disadvantages:

To use CREATE TABLE ... LIKE for creating an empty copy of an existing table, write a statement like this:

CREATE TABLE new_tbl_name LIKE tbl_name;

To create an empty copy of a table and then populate it from the original table, use CREATE TABLE ... LIKE followed by INSERT INTO ... SELECT:

CREATE TABLE new_tbl_name LIKE tbl_name;
INSERT INTO new_tbl_name SELECT * FROM tbl_name;

To create a table as a temporary copy of itself, add the TEMPORARY keyword:

CREATE TEMPORARY TABLE new_tbl_name LIKE tbl_name;
INSERT INTO new_tbl_name SELECT * FROM tbl_name;

Using a TEMPORARY table with the same name as the original can be useful when you want to try some statements that modify the contents of the table, but you don't want to change the original table. To use prewritten scripts that use the original table name, you don't need to edit them to refer to a different table; just add the CREATE TEMPORARY TABLE and INSERT statements to the beginning of the script. The script will create a temporary copy and operate on the copy, which the server deletes when the script finishes. (However, bear in mind the auto-reconnect caveat noted earlier in "Temporary Tables.")

To insert into the new table only some of the rows from the original table, add a WHERE clause that identifies which rows to select. The following statements create a new table named student_f that contains only the records for female students in the student table:

CREATE TABLE student_f LIKE student;
INSERT INTO student_f SELECT * FROM student WHERE sex = 'f';

If you don't care about retaining the exact column definitions from the original table, CREATE TABLE ... SELECT sometimes is easier to use than CREATE TABLE ... LIKE because it can create and populate the new table in a single statement:

CREATE TABLE student_f SELECT * FROM student WHERE sex = 'f';

CREATE TABLE ... SELECT also can create new tables that don't contain exactly the same set of columns in an existing table. You can use it to cause a new table to spring into existence on-the-fly to hold the result of an arbitrary SELECT query. This makes it exceptionally easy to create a table fully populated with the data in which you're interested, ready to be used in further statements. However, the new table can contain strange column names if you're not careful. To avoid this, you should use aliases as necessary to provide reasonable column names. When you create a table by selecting data into it, the column names are taken from the columns that you are selecting. If a column is calculated as the result of an expression, the name of the column is the text of the expression, which creates a table with an unusual column name:

mysql> CREATE TABLE mytbl SELECT PI() * 2;
mysql> SELECT * FROM mytbl;
+----------+
| PI() * 2 |
+----------+
| 6.283185 |
+----------+

That's unfortunate, because the column name can be referred to directly only as a quoted identifier:

mysql> SELECT ´PI() * 2´ FROM mytbl;
+----------+
| PI() * 2 |
+----------+
| 6.283185 |
+----------+

To provide a column name that is easier to work with, use an alias:

mysql> DROP TABLE mytbl;
mysql> CREATE TABLE mytbl SELECT PI() * 2 AS mycol;
mysql> SELECT mycol FROM mytbl;
+----------+
| mycol    |
+----------+
| 6.283185 |
+----------+

A related snag occurs if you select from different tables columns that have the same name. Suppose that tables t1 and t2 both have a column c and you want to create a table from all combinations of rows in both tables. The following statement fails because it attempts to create a table with two columns named c:

mysql> CREATE TABLE t3 SELECT * FROM t1, t2;
ERROR 1060 (42S21): Duplicate column name 'c'

To solve this problem, provide aliases that give each column a unique name in the new table:

mysql> CREATE TABLE t3 SELECT t1.c AS c1, t2.c AS c2 FROM t1, t2;

As mentioned previously, a shortcoming of CREATE TABLE ... SELECT is that not all characteristics of the original data are incorporated into the structure of the new table. For example, creating a table by selecting data into it does not copy indexes from the original table, and it can lose column attributes such as the default value. In some cases, you can force specific attributes to be used in the new table by invoking the CAST() function in the SELECT part of the statement. The following CREATE TABLE ... SELECT statement forces the columns produced by the SELECT to be treated as INT UNSIGNED, DATE, and BINARY. You can verify that with DESCRIBE:

mysql> CREATE TABLE mytbl SELECT
    -> CAST(1 AS UNSIGNED) AS i,
    -> CAST(CURDATE() AS DATE) AS d,
    -> CAST('Hello, world' AS BINARY) AS c;
mysql> DESCRIBE mytbl;
+-------+-----------------+------+-----+---------+-------+
| Field | Type            | Null | Key | Default | Extra |
+-------+-----------------+------+-----+---------+-------+
| i     | int(1) unsigned |      |     | 0       |       |
| d     | date            | YES  |     | NULL    |       |
| c     | binary(12)      |      |     |         |       |
+-------+-----------------+------+-----+---------+-------+

The allowable cast types are BINARY (binary string), DATE, DATETIME, TIME, SIGNED, SIGNED INTEGER, UNSIGNED, and UNSIGNED INTEGER.

It is also possible to provide explicit column definitions in the CREATE TABLE part, to be used for the columns retrieved by the SELECT part. Columns in the two parts are matched by name, so provide aliases in the SELECT part as necessary to cause them to match up properly:

mysql> CREATE TABLE mytbl (i INT UNSIGNED, d DATE, c BINARY(20))
    -> SELECT
    -> 1 AS i,
    -> CURDATE() AS d,
    -> 'Hello, world' AS c;
mysql> DESCRIBE mytbl;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| i     | int(10) unsigned | YES  |     | NULL    |       |
| d     | date             | YES  |     | NULL    |       |
| c     | binary(20)       | YES  |     | NULL    |       |
+-------+------------------+------+-----+---------+-------+

The technique of providing explicit definitions allows you to create character columns that have a different width than that of the longest value in the result set. Also note that the Null and Default attributes of some of the columns are different for this example than for the previous one. You can provide explicit definitions for those attributes as well if necessary.

Using MERGE Tables

The MERGE storage engine provides a way to perform queries on a set of MyISAM tables simultaneously by treating them all as a single logical unit. As described earlier in "Storage Engine Characteristics," MERGE can be applied to a collection of MyISAM tables that all have identical structure. The columns in each table must be defined with the same names and types in the same order, and the indexes must be defined in the same way and in the same order.

Suppose that you have a set of individual log tables that contain log entries on a year-by-year basis and that each are defined like this, where CC and YY represent the century and year:

CREATE TABLE log_CCYY
(
  dt   DATETIME NOT NULL,
  info VARCHAR(100) NOT NULL,
  INDEX (dt)
) ENGINE = MyISAM;

If the current set of log tables includes log_2001, log_2002, log_2003, log_2004, and log_2005, you can set up a MERGE table that maps onto them like this:

CREATE TABLE log_all
(
  dt   DATETIME NOT NULL,
  info VARCHAR(100) NOT NULL,
  INDEX (dt)
) ENGINE = MERGE UNION = (log_2001, log_2002, log_2003, log_2004, log_2005);

The ENGINE value must be MERGE, and the UNION option lists the tables to be included in the MERGE table. After the table has been set up, you query it just like any other table, but the queries will refer to all the constituent tables at once. This query determines the total number of rows in all the log tables:

SELECT COUNT(*) FROM log_all;

This query determines how many log entries there are per year:

SELECT YEAR(dt) AS y, COUNT(*) AS entries FROM log_all GROUP BY y;

Besides the convenience of being able to refer to multiple tables without issuing multiple queries, MERGE tables offer some other nice features:

MERGE tables also support DELETE and UPDATE operations. INSERT is trickier because MySQL needs to know which table to insert new records into. MERGE table definitions can include an INSERT_METHOD option with a value of NO, FIRST, or LAST to indicate that INSERT is forbidden or that records should be inserted into the first or last table named in the UNION option. For example, the following definition would cause an INSERT into log_all to be treated as an INSERT into log_2005, the last table named in the UNION option:

CREATE TABLE log_all
(
  dt   DATETIME NOT NULL,
  info VARCHAR(100) NOT NULL,
  INDEX (dt)
) ENGINE = MERGE UNION = (log_2001, log_2002, log_2003, log_2004, log_2005)
INSERT_METHOD = LAST;

Using FEDERATED Tables

The FEDERATED storage engine is available as of MySQL 5.0.3. It enables you to access tables from one MySQL server that actually are managed by another server. This section briefly summarizes how to use this storage engine.

Suppose that there is no sampdb database on your local server, but there is one available from the MySQL server running on the host corn.snake.net and that you have an account for accessing that server. This account also can be used by the local server through the FEDERATED storage engine to make the sampdb tables available on the local server. For each table that you want to access this way, create a FEDERATED table that has the same columns as the remote table, but include a connection string that indicates to the local server how to connect to the remote server. This is done with the COMMENT table option.

For example, the student table on the remote server has this definition:

CREATE TABLE student
(
  name    VARCHAR(20) NOT NULL,
  sex     ENUM('F','M') NOT NULL,
  student_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (student_id)
) ENGINE = InnoDB;

To create a FEDERATED table, use the same definition except that the ENGINE option should be FEDERATED and a COMMENT option should be given that provides connection information. The following definition creates a table named federated_student that accesses the student table on corn.snake.net:

CREATE TABLE federated_student
(
  name    VARCHAR(20) NOT NULL,
  sex     ENUM('F','M') NOT NULL,
  student_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (student_id)
) ENGINE = FEDERATED
COMMENT = 'mysql://sampadm:secret@corn.snake.net/sampdb/student';

The connection string in the COMMENT value indicates that the username and password of the MySQL account on the remote server are sampadm and secret. The general connection string syntax is as follows, where square brackets indicate optional information:

mysql://user_name[:password]@host_name[:port_num]/db_name/tbl_name

After you create the federated_student table, you can select from it to access the remote student table. You can also use INSERT, UPDATE, and DELETE with federated_student to modify the contents of the student table.

The FEDERATED engine is quite new, so some details are likely to change. For example, an alternative to the COMMENT option for storing the connection string might be implemented to prevent the name and password from being visible to anyone who can use SHOW CREATE TABLE for a FEDERATED table.

Dropping Tables

Dropping a table is much easier than creating it because you don't have to specify anything about its contents. You just have to name it:

DROP TABLE tbl_name;

MySQL extends the DROP TABLE statement in some useful ways:

Indexing Tables

Indexes are the primary means of speeding up access to the contents of your tables, particularly for queries that involve joins on multiple tables. This is an important enough topic that Chapter 4, "Query Optimization," discusses why you use indexes, how they work, and how best to take advantage of them to optimize your queries. This section covers the characteristics of indexes for the various table types and the syntax for creating and dropping indexes.

Storage Engine Indexing Characteristics

MySQL provides quite a bit of flexibility in the way you can construct indexes:

Not all storage engines offer all indexing features. The following table summarizes the indexing properties of the various table types. The table does not include the MERGE storage engine, because MERGE tables are created from MyISAM tables and have similar indexing characteristics. Nor does it include the EXAMPLE, ARCHIVE, or CSV engines, which do not support indexing.

Index Characteristic

ISAM

MyISAM

MEMORY

BDB

InnoDB

NULL values allowed

No

Yes

Yes

Yes

Yes

Columns per index

16

16

16

16

16

Indexes per table

16

32/64

32/64

31/64

32/64

Maximum index row size (bytes)

256

1000

1024

1024

1024

Index column prefixes

Yes

Yes

Yes

Yes

Yes

BLOB/TEXT indexes

No

Yes

No

Yes

Yes

FULLTEXT indexes

No

Yes

No

No

No

SPATIAL indexes

No

Yes

No

No

No

HASH indexes

No

No

Yes

No

No

Foreign keys

No

No

No

No

Yes


When there are two values shown in the "Indexes per table" row in the table, the first value applies through MySQL 4.1.1, the second from 4.1.2 and up.

The table illustrates some of the reasons why the MyISAM storage engine generally is to be preferred over the ISAM engine that it succeeds. MyISAM relaxes several of the indexing constraints that apply to ISAM tables. For example, with MyISAM tables, you can index columns that contain NULL values, you can index BLOB and TEXT columns, and you can have a larger number of indexes per table.

One implication of the differences in indexing characteristics for the various storage engines is that if you require an index to have certain properties, you may not be able to use certain types of tables. For example, if you want to use a FULLTEXT index, you must use a MyISAM table. If you want to use foreign keys, you must use an InnoDB table.

If you have an existing table of one type but would like to convert it to another type that has more suitable indexing characteristics, use ALTER TABLE to change the type. Suppose that you were using ISAM tables in an older version of MySQL but now have upgraded to a newer version. To take advantage of MyISAM's superior indexing features, you can easily convert each table to use the MyISAM storage engine with ALTER TABLE:

ALTER TABLE tbl_name ENGINE = MyISAM;

To use the transactional capabilities offered by InnoDB or BDB, you can convert a table like this:

ALTER TABLE tbl_name ENGINE = InnoDB;
ALTER TABLE tbl_name ENGINE = BDB;

Creating Indexes

MySQL can create several types of indexes:

You can create indexes for a new table when you use CREATE TABLE. Examples of this are shown in Chapter 1. To add indexes to existing tables, use CREATE INDEX or ALTER TABLE. MySQL maps CREATE INDEX statements onto ALTER TABLE operations internally.

ALTER TABLE is the more versatile than CREATE INDEX because it can create any kind of index supported by MySQL. For example:

ALTER TABLE tbl_name ADD INDEX index_name (index_columns);
ALTER TABLE tbl_name ADD UNIQUE index_name (index_columns);
ALTER TABLE tbl_name ADD PRIMARY KEY (index_columns);
ALTER TABLE tbl_name ADD FULLTEXT index_name (index_columns);
ALTER TABLE tbl_name ADD SPATIAL index_name (index_columns);

tbl_name is the name of the table to which the index should be added, and index_columns indicates which column or columns to index. If the index consists of more than one column, separate the names by commas. The index name index_name is optional. If you leave it out, MySQL picks a name based on the name of the first indexed column.

Indexed columns can be NULL unless the index is a PRIMARY KEY or SPATIAL index.

A single ALTER TABLE statement can include multiple table alterations if you separate them by commas. This means that you can create several indexes at the same time, which is faster than adding them one at a time with individual ALTER TABLE statements.

To place the constraint on an index that it contain only unique values, create the index as a PRIMARY KEY or a UNIQUE index. The two types of index are very similar, but have two differences:

CREATE INDEX can add most types of indexes, with the exception of a PRIMARY KEY:

CREATE INDEX index_name ON tbl_name (index_columns);
CREATE UNIQUE INDEX index_name ON tbl_name (index_columns);
CREATE FULLTEXT INDEX index_name ON tbl_name (index_columns);
CREATE SPATIAL INDEX index_name ON tbl_name (index_columns);

tbl_name, index_name, and index_columns have the same meaning as for ALTER TABLE. Unlike ALTER TABLE, the index name is not optional with CREATE INDEX, and you cannot create multiple indexes with a single statement.

To create indexes for a new table when you issue a CREATE TABLE statement, the syntax is similar to that used for ALTER TABLE, but you specify the index-creation clauses in addition to the column definitions:

CREATE TABLE tbl_name
(
  ... column definitions ...
  INDEX index_name (index_columns),
  UNIQUE index_name (index_columns),
  PRIMARY KEY (index_columns),
  FULLTEXT index_name (index_columns),
  SPATIAL index_name (index_columns),
  ...
);

As with ALTER TABLE, index_name is optional. MySQL picks an index name if you leave it out.

As a special case, you can create a single-column PRIMARY KEY or UNIQUE index by adding a PRIMARY KEY or UNIQUE clause to the end of a column definition. For example, the following CREATE TABLE statements are equivalent:

CREATE TABLE mytbl
(
  i INT NOT NULL PRIMARY KEY,
  j CHAR(10) NOT NULL UNIQUE
);

CREATE TABLE mytbl
(
  i INT NOT NULL,
  j CHAR(10) NOT NULL,
  PRIMARY KEY (i),
  UNIQUE (j)
);

The default index type for a MEMORY table is HASH. A hashed index is very fast for exact-value lookups, which is the typical way MEMORY tables are used. However, if you plan to use a MEMORY table for comparisons that can match a range of values (for example, id < 100), hashed indexes do not work well. In this case, you'll be better off creating a BTREE index instead. Do this by adding a USING clause to the index definition:

CREATE TABLE namelist
(
  id  INT NOT NULL,
  name CHAR(100),
  INDEX USING BTREE (id)
) ENGINE = MEMORY;

To index a prefix of a string column, the syntax for naming the column in the index definition is col_name(n) rather than simply col_name. The prefix value, n, can be from 1 to 255. (1 to 1000 for MyISAM and InnoDB as of MySQL 4.1.2.) This indicates that the index should include the first n bytes of column values for binary string types, or the first n characters for non-binary string types. For example, the following statement creates a table with a CHAR column and a BINARY column. It indexes the first 10 characters of the CHAR column and the first 15 bytes of the BINARY column:

CREATE TABLE mytbl
(
  name  CHAR(30) NOT NULL,
  address BINARY(60) NOT NULL,
  INDEX (name(10)),
  INDEX (address(15))
);

Index prefixes are supported for MyISAM, MERGE, MEMORY, BDB, and InnoDB tables.

When you index a prefix of a string column, the prefix length, just like the column length, is specified in the same units as the column data type—that is, bytes for binary strings and characters for non-binary strings. However, the maximum size of index entries are measured internally in bytes. The two measures are the same for single-byte character sets, but not for multi-byte character sets. For non-binary strings that have multi-byte character sets, MySQL stores into index values as many complete characters that fit within the allowed maximum byte length.

In some circumstances, you may find it not only desirable but necessary to index a column prefix rather than the entire column:

Columns in FULLTEXT indexes are indexed in full and do not have prefixes. If you specify a prefix length for a column in a FULLTEXT index, it is ignored.

Dropping Indexes

To drop an index, use either a DROP INDEX or an ALTER TABLE statement. To use DROP INDEX, you must name the index to be dropped. (To drop a PRIMARY KEY with DROP INDEX, use the quoted identifier ´PRIMARY´.) The syntax for DROP INDEX looks like this:

DROP INDEX index_name ON tbl_name;
DROP INDEX ´PRIMARY´ ON tbl_name;

The second statement is unambiguous because a table may have only one PRIMARY KEY and its name is always PRIMARY.

Like the CREATE INDEX statement, DROP INDEX is handled internally as an ALTER TABLE statement. The syntax for ALTER TABLE statements that correspond to the preceding DROP INDEX statements is as follows:

ALTER TABLE tbl_name DROP INDEX index_name;
ALTER TABLE tbl_name DROP PRIMARY KEY;

If you don't know the names of a table's indexes, use SHOW CREATE TABLE or SHOW INDEX to find out.

When you drop columns from a table, indexes can be affected implicitly. If you drop a column that is a part of an index, the column is removed from the index as well. If all columns that make up an index are dropped, the entire index is dropped.

Altering Table Structure

ALTER TABLE is a versatile statement in MySQL, and you can use it for many purposes. We've already seen some of its capabilities in this chapter (for changing storage engines and for creating and dropping indexes). You can also use ALTER TABLE to rename tables, add or drop columns, change column data types, and more. In this section, we'll cover some of its features. The complete syntax for ALTER TABLE is described in Appendix E.

ALTER TABLE is useful when you find that the structure of a table no longer reflects what you want to do with it. You might want to use the table to record additional information, or perhaps it contains information that has become superfluous. Maybe existing columns are too small. Perhaps it turns out that you've defined them larger than you need and you'd like to make them smaller to save space and improve query performance. Here are some situations in which ALTER TABLE is valuable:

The syntax for ALTER TABLE looks like this:

ALTER TABLE tbl_name action [, action] ... ;

Each action specifies a modification that you want to make to the table. Some database systems allow only a single action in an ALTER TABLE statement. MySQL allows multiple actions, separated by commas.

Tip: If you need to remind yourself about a table's current definition before using ALTER TABLE, issue a SHOW CREATE TABLE statement. This statement also can be useful after ALTER TABLE to see how the alteration affected the table definition.

The following examples show some of the capabilities of ALTER TABLE.

Changing a column's data type. To change a data type, you can use either a CHANGE or MODIFY clause. Suppose that the column in a table mytbl is SMALLINT UNSIGNED and you want to change it to MEDIUMINT UNSIGNED. Do so using either of the following commands:

ALTER TABLE mytbl MODIFY i MEDIUMINT UNSIGNED;
ALTER TABLE mytbl CHANGE i i MEDIUMINT UNSIGNED;

Why is the column named twice in the command that uses CHANGE? Because one thing that CHANGE can do that MODIFY cannot is to rename the column in addition to changing the type. If you had wanted to rename i to j at the same time you changed the type, you'd do so like this:

ALTER TABLE mytbl CHANGE i j MEDIUMINT UNSIGNED;

The important thing with CHANGE is that you name the column you want to change and then specify a complete column definition, which includes the column name. That is, you must include the name in the definition, even if it's the same as the old name.

To rename a column, use CHANGE old_name new_name followed by the column's current definition.

You can assign character sets to individual columns, so it's possible to use the CHARACTER SET attribute in a column's definition to change its character set:

ALTER TABLE t MODIFY c CHAR(20) CHARACTER SET ucs2;

An important reason for changing data types is to improve query efficiency for joins that compare columns from two tables. Indexes often can be used for comparisons in joins between similar column types, but comparisons are quicker when both columns are exactly the same type. Suppose that you're running a query like this:

SELECT ... FROM t1, t2 WHERE t1.name = t2.name;

If t1.name is CHAR(10) and t2.name is CHAR(15), the query won't run as quickly as if they were both CHAR(15). You can make them the same by changing t1.name using either of these commands:

ALTER TABLE t1 MODIFY name CHAR(15);
ALTER TABLE t1 CHANGE name name CHAR(15);

Converting a table to use a different storage engine. To convert a table from one storage engine to another, use an ENGINE clause that specifies the new engine name:

ALTER TABLE tbl_name ENGINE = engine_name;

engine_name is a name such as MyISAM, MEMORY, BDB, or InnoDB. Lettercase of the name does not matter.

Changing storage engines can be useful when you upgrade your MySQL installation to a newer version that provides additional table-handling features. For example, if you inherit an old pre-3.23 database, its tables will be in ISAM format. To change them to MyISAM tables, use this statement for each one:

ALTER TABLE tbl_name ENGINE = MyISAM;

Doing this allows you to take advantages of the capabilities that MyISAM offers that ISAM does not. For example, MyISAM tables are binary portable, so you can move them to other machines by copying table files directly, even if the machines have different hardware architectures. In addition, MyISAM tables have better indexing characteristics than ISAM.

Another reason to change a storage engine is to make it transaction-safe. Suppose that you have a MyISAM table and discover that an application that uses it needs to perform transactional operations, including rollback in case failures occur. MyISAM tables do not support transactions, but you can make the table transaction-safe by converting it to an InnoDB or BDB table:

ALTER TABLE tbl_name ENGINE = InnoDB;
ALTER TABLE tbl_name ENGINE = BDB;

When you convert a table to use a different engine, the allowable or sensible conversions may depend on the feature compatibility of the old and new types:

There are circumstances under which you should not use ALTER TABLE to convert a table to use a different storage engine:

Renaming a table. Use a RENAME clause that specifies the new table name:

ALTER TABLE tbl_name RENAME TO new_tbl_name;

Another way to rename tables is with RENAME TABLE. The syntax looks like this:

RENAME TABLE old_name TO new_name;

One thing that RENAME TABLE can do that ALTER TABLE cannot is rename multiple tables in the same statement. For example, you can swap the names of two tables like this:

RENAME TABLE t1 TO tmp, t2 TO t1, tmp TO t2;

If you qualify a table name with a database name, you can move a table from one database to another by renaming it. Either of the following statements move the table t from the sampdb database to the test database:

ALTER TABLE sampdb.t RENAME TO test.t;
RENAME TABLE sampdb.t TO test.t;

You cannot rename a table to a name that already exists.

If you rename a MyISAM table that is part of a MERGE table, you must redefine the MERGE table accordingly.

Getting Information About Databases and Tables

MySQL provides a SHOW statement that has several variant forms that display information about databases and the tables in them. SHOW is helpful for keeping track of the contents of your databases and for reminding yourself about the structure of your tables:

Several forms of the SHOW statement take a LIKE 'pattern' clause allowing a pattern to be given that limits the scope of the output. MySQL interprets 'pattern' as an SQL pattern that may include the '%' and '_' wildcard characters. For example, this statement displays the names of tables in the current database that begin with 'geo':

SHOW TABLES LIKE 'geo%';

To match a literal instance of a wildcard character in a LIKE pattern, precede it with a backslash. Generally, this is done to match a literal '_', which occurs frequently in database, table, and column names.

The mysqlshow command provides some of the same information as the SHOW statement, which allows you to get database and table information at your command prompt:

The mysqldump client program allows you to see the structure of your tables in the form of a CREATE TABLE statement (much like SHOW CREATE TABLE). When using mysqldump to review table structure, be sure to invoke it with the --no-data option so that you don't get swamped with your table's data!

% mysqldump --no-data db_name [tbl_name] ...

If you list only the database name without any table names, mysqldump displays the structure for all tables in the database. Otherwise it shows information for the named tables.

For both mysqlshow and mysqldump, you can specify the usual connection parameter options, such as --host, --user, or --password.

It's sometimes useful to be able to tell from within an application whether a given table exists. You can use SHOW TABLES to find out (but remember that SHOW TABLES does not list TEMPORARY tables):

SHOW TABLES LIKE 'tbl_name';
SHOW TABLES FROM db_name LIKE 'tbl_name';

If the SHOW TABLES statement lists information for the table, it exists. It's also possible to determine table existence, even for TEMPORARY tables, with either of the following statements:

SELECT COUNT(*) FROM tbl_name;
SELECT * FROM tbl_name WHERE 0;

Each statement succeeds if the table exists, and fails if it doesn't. The first statement is most appropriate for MyISAM tables, for which COUNT(*) with no WHERE clause is highly optimized. It's not so good for InnoDB or BDB tables, which require a full scan to count the rows. The second statement is more general because it runs quickly for any storage engine. These statements are most suitable for use within application programming languages such as Perl or PHP because you can test the success or failure of the query and take action accordingly. They're not especially useful in a batch script that you run from mysql because you can't do anything if an error occurs except terminate (or ignore the error, but then there's obviously no point in running the query at all).

To determine the storage engine for individual tables, you can use SHOW TABLE STATUS or SHOW CREATE TABLE. The output from either statement includes a storage engine indicator.

Beginning with version 5.0.2, MySQL implements INFORMATION_SCHEMA, which provides another way to obtain information about databases (that is, database metadata). INFORMATION_SCHEMA is based on the SQL standard. That is, the access mechanism is standard, even though some of the content is MySQL-specific. This makes INFORMATION_ SCHEMA more portable than the various SHOW statements, which are entirely MySQL-specific. INFORMATION_SCHEMA is accessed through SELECT statements and can be used in a flexible manner. SHOW statements always display a fixed set of columns and you cannot capture the output in a table. With INFORMATION_SCHEMA, the SELECT statement can name specific output columns and a WHERE clause to specify exactly what information you require. Also, you can use joins or subqueries, and you can save the result of the retrieval in another table for further processing.

You can think of INFORMATION_SCHEMA as a virtual database where the various tables within it are views for different kinds of database metadata. To see what tables INFORMATION_SCHEMA contains, use SHOW TABLES:

mysql> SHOW TABLES IN INFORMATION_SCHEMA;
+---------------------------------------+
| Tables_in_information_schema          |
+---------------------------------------+
| SCHEMATA                              |
| TABLES                                |
| COLUMNS                               |
| CHARACTER_SETS                        |
| COLLATIONS                            |
| COLLATION_CHARACTER_SET_APPLICABILITY |
| ROUTINES                              |
| STATISTICS                            |
| VIEWS                                 |
| USER_PRIVILEGES                       |
| SCHEMA_PRIVILEGES                     |
| TABLE_PRIVILEGES                      |
| COLUMN_PRIVILEGES                     |
| TABLE_CONSTRAINTS                     |
| KEY_COLUMN_USAGE                      |
+---------------------------------------+

To determine the columns contained in a given table, use SHOW COLUMNS or DESCRIBE:

mysql> DESCRIBE INFORMATION_SCHEMA.CHARACTER_SETS;
+----------------------+-------------+------+-----+---------+-------+
| Field                | Type        | Null | Key | Default | Extra |
+----------------------+-------------+------+-----+---------+-------+
| CHARACTER_SET_NAME   | varchar(64) | NO   |     |         |       |
| DEFAULT_COLLATE_NAME | varchar(64) | NO   |     |         |       |
| DESCRIPTION          | varchar(60) | NO   |     |         |       |
| MAXLEN               | bigint(3)   | NO   |     | 0       |       |
+----------------------+-------------+------+-----+---------+-------+

To display information from a table, use a SELECT statement. The general query to see all the columns in any given INFORMATION_SCHEMA table is as follows:

SELECT * FROM INFORMATION_SCHEMA.tbl_name;

Neither INFORMATION_SCHEMA nor any of its table or column names are case sensitive.

You can provide a WHERE clause to be specific about what you want to see. Here is a simple example that uses INFORMATION_SCHEMA to list information about the sampdb.member table:

mysql> SELECT * FROM INFORMATION_SCHEMA.TABLES
    -> WHERE TABLE_SCHEMA='sampdb' AND TABLE_NAME='member'\G
*************************** 1. row ***************************
   TABLE_CATALOG: NULL
    TABLE_SCHEMA: sampdb
      TABLE_NAME: member
      TABLE_TYPE: BASE TABLE
         ENGINE: MyISAM
        VERSION: 10
     ROW_FORMAT: Dynamic
     TABLE_ROWS: 102
 AVG_ROW_LENGTH: 137
    DATA_LENGTH: 14000
MAX_DATA_LENGTH: 4294967295
   INDEX_LENGTH: 2048
      DATA_FREE: 0
 AUTO_INCREMENT: 103
    CREATE_TIME: 2005-01-20 22:38:03
    UPDATE_TIME: 2005-01-20 22:38:23
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:

The following list describes the tables in INFORMATION_SCHEMA.

Other INFORMATION_SCHEMA tables might be implemented over time. See the MySQL Reference Manual for the current list of tables.

Along with the implementation of INFORMATION_SCHEMA, several SHOW statements have been extended in MySQL 5.0 to allow a WHERE clause. Even though each SHOW statement still displays a fixed set of columns, WHERE provides more flexibility about specifying which rows to return. The WHERE clause should refer to the columns displayed by the SHOW statement. The WHERE capability has been added to the following SHOW statements:

SHOW DATABASES
SHOW TABLES
SHOW OPEN TABLES
SHOW COLUMNS
SHOW KEYS
SHOW VARIABLES
SHOW STATUS
SHOW TABLE STATUS
SHOW PROCEDURE STATUS
SHOW FUNCTION STATUS
SHOW CHARACTER SET
SHOW COLLATION

 

Performing Multiple-Table Retrievals with Joins

It does no good to put records in a database unless you retrieve them eventually and do something with them. That's the purpose of the SELECT statement: to help you get at your data. SELECT probably is used more often than any other in the SQL language, but it can also be the trickiest; the conditions you use for choosing rows can be arbitrarily complex and can involve comparisons between columns in many tables.

The basic syntax of the SELECT statement looks like this:

SELECT selection_list       # What columns to select
FROM table_list             # Which tables to select rows from
WHERE primary_constraint    # What conditions rows must satisfy
GROUP BY grouping_columns   # How to group results
ORDER BY sorting_columns    # How to sort results
HAVING secondary_constraint # Secondary conditions rows must satisfy
LIMIT count;                # Limiting row count on results

Everything in this syntax is optional except the word SELECT and the selection_list part that specifies what you want to produce as output. Some databases require the FROM clause as well. MySQL does not, which allows you to evaluate expressions without referring to any tables:

SELECT SQRT(POW(3,2)+POW(4,2));

In Chapter 1, we devoted quite a bit of attention to single-table SELECT statements, concentrating primarily on the output column list and the WHERE, GROUP BY, ORDER BY, HAVING, and LIMIT clauses. This section covers an aspect of SELECT that is often confusing: writing joins; that is, SELECT statements that retrieve records from multiple tables. We'll discuss the types of join MySQL supports, what they mean, and how to specify them. This should help you employ MySQL more effectively, because in many cases, the real problem of figuring out how to write a query is determining the proper way to join tables together.

One problem with using SELECT is that when you first encounter a new type of problem, it's not always easy to see how to write a SELECT query to solve it. However, after you figure it out, you can use that experience when you run across similar problems in the future. SELECT is probably the statement for which past experience plays the largest role in being able to use it effectively, simply because of the sheer variety of problems to which it applies.

As you gain experience, you'll be able to adapt joins more easily to new problems, and you'll find yourself thinking things like, "Oh, yes, that's one of those LEFT JOIN things," or, "Aha, that's a three-way join restricted by the common pairs of key columns." (I'm a little reluctant to point that out, actually. You may find it encouraging to hear that experience helps you. On the other hand, you may find it alarming to consider that you could wind up thinking in terms like that.)

Many of the examples that demonstrate how to use the forms of join operations that MySQL supports use the following two tables, t1 and t2:

Table t1:    Table t2:
+----+----+  +----+----+
| i1 | c1 |  | i2 | c2 |
+----+----+  +----+----+
| 1  | a  |  | 2  | c  |
| 2  | b  |  | 3  | b  |
| 3  | c  |  | 4  | a  |
+----+----+  +----+----+

The tables deliberately are chosen to be small so that the effect of each type of join can be seen readily.

Other types of multiple-table SELECT statement are subqueries (one SELECT nested within another) and UNION statements. These are covered later in "Performing Multiple-Table Retrievals with Subqueries" and "Performing Multiple-Table Retrievals with UNION."

A related multiple-table feature that MySQL supports is the capability of deleting or updating records in one table based on the contents of another. For example, you might want to remove records in one table that aren't matched by any record in another, or copy values from columns in one table to columns in another. "Multiple-Table Deletes and Updates" discusses these types of operations.

The Trivial Join

The simplest join is the trivial join, in which only one table is named. In this case, rows are selected from the single named table:

mysql> SELECT * FROM t1;
+----+----+
| i1 | c1 |
+----+----+
| 1  | a  |
| 2  | b  |
| 3  | c  |
+----+----+

Some people don't consider this form of SELECT as a join at all, and use the term only for SELECT statements that retrieve records from two or more tables. I suppose it's a matter of perspective.

The Cross Join

If a SELECT statement names multiple tables in the FROM clause with the names separated by commas, MySQL performs a cross join. For example, if you join t1 and t2 as follows, each row in t1 is combined with each row in t2:

mysql> SELECT * FROM t1, t2;
+----+----+----+----+
| i1 | c1 | i2 | c2 |
+----+----+----+----+
| 1  | a  | 2  | c  |
| 2  | b  | 2  | c  |
| 3  | c  | 2  | c  |
| 1  | a  | 3  | b  |
| 2  | b  | 3  | b  |
| 3  | c  | 3  | b  |
| 1  | a  | 4  | a  |
| 2  | b  | 4  | a  |
| 3  | c  | 4  | a  |
+----+----+----+----+

In this statement, SELECT * means "select every column from every table named in the FROM clause." You could also write this as SELECT t1.*, t2.*:

SELECT t1.*, t2.* FROM t1, t2;

If you don't want to select all columns or you want to display them in a different left-to-right order, just name each column that you want to see.

A cross join is so called because each row of each table is crossed with each row in every other table to produce all possible combinations. This is also known as the "cartesian product." Joining tables this way has the potential to produce a very large number of rows because the possible row count is the product of the number of rows in each table. A cross join between three tables that contain 100, 200, and 300 rows, respectively, could return 100 x 200 x 300 = 6 million rows. That's a lot of rows, even though the individual tables are small. In cases like this, normally a WHERE clause is useful for reducing the result set to a more manageable size.

If you add a WHERE clause causing tables to be matched on the values of certain columns, the join is called an "equi-join" because you're selecting only rows with equal values in the specified columns:

mysql> SELECT t1.*, t2.* FROM t1, t2 WHERE t1.i1 = t2.i2;
+----+----+----+----+
| i1 | c1 | i2 | c2 |
+----+----+----+----+
| 2  | b  | 2  | c  |
| 3  | c  | 3  | b  |
+----+----+----+----+

The INNER JOIN, CROSS JOIN, and JOIN join types are similar to the ',' (comma) join operator. For example, these statements all are equivalent:

SELECT t1.*, t2.* FROM t1, t2 WHERE t1.i1 = t2.i2;
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 WHERE t1.i1 = t2.i2;
SELECT t1.*, t2.* FROM t1 CROSS JOIN t2 WHERE t1.i1 = t2.i2;
SELECT t1.*, t2.* FROM t1 JOIN t2 WHERE t1.i1 = t2.i2;

INNER JOIN, CROSS JOIN, and JOIN (but not the comma operator) allow alternative syntaxes for specifying how to match up table columns:

Left and Right Joins

An equi-join shows only rows where a match can be found in both tables. Left and right joins show matches, too, but also show rows in one table that have no match in the other table. Most of the examples in this section use LEFT JOIN, which identifies rows in the left table that are not matched by the right table. RIGHT JOIN is the same except that the roles of the tables are reversed.

A LEFT JOIN works like this: You specify the columns to be used for matching rows in the two tables. When a row from the left table matches a row from the right table, the contents of the rows are selected as an output row. When a row in the left table has no match, it is still selected for output, but joined with a "fake" row from the right table that contains NULL in all the columns.

In other words, a LEFT JOIN forces the result set to contain a row for every row in the left table, whether or not there is a match for it in the right table. The left-table rows with no match can be identified by the fact that all columns from the right table are NULL. These result rows tell you which rows are missing from the right table. That is an interesting and important property, because this kind of problem comes up in many different contexts. Which customers have not been assigned an account representative? For which inventory items have no sales been recorded? Or, closer to home with our sampdb database: Which students have not taken a particular exam? Which students have no records in the absence table (that is, which students have perfect attendance)?

Consider once again our two tables, t1 and t2:

Table t1:    Table t2:
+----+----+  +----+----+
| i1 | c1 |  | i2 | c2 |
+----+----+  +----+----+
| 1  | a  |  | 2  | c  |
| 2  | b  |  | 3  | b  |
| 3  | c  |  | 4  | a  |
+----+----+  +----+----+

If we use a equi-join to match these tables on t1.i1 and t2.i2, we'll get output only for the values 2 and 3, because those are the values that appear in both tables:

mysql> SELECT t1.*, t2.* FROM t1, t2 WHERE t1.i1 = t2.i2;
+----+----+----+----+
| i1 | c1 | i2 | c2 |
+----+----+----+----+
| 2  | b  | 2  | c  |
| 3  | c  | 3  | b  |
+----+----+----+----+

A left join produces output for every row in t1, whether or not t2 matches it. To write a left join, name the tables with LEFT JOIN in between rather than a comma, and specify the matching condition using an ON clause rather than a WHERE clause:

mysql> SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.i1 = t2.i2;
+----+----+------+------+
| i1 | c1 | i2   | c2   |
+----+----+------+------+
| 1  | a  | NULL | NULL |
| 2  | b  |    2 | c    |
| 3  | c  |    3 | b    |
+----+----+------+------+

Now there is an output row even for the t1.i1 value of 1, which has no match in t2. All the columns in this row that correspond to t2 columns have a value of NULL.

One thing to watch out for with LEFT JOIN is that if right-table columns are not defined as NOT NULL, you may get problematic rows in the result. For example, if the right table contains columns with NULL values, you won't be able to distinguish those NULL values from NULL values that identify unmatched rows.

As mentioned earlier, a RIGHT JOIN is like a LEFT JOIN with the roles of the tables reversed. These two statements are equivalent:

SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.i1 = t2.i2;
SELECT t1.*, t2.* FROM t2 RIGHT JOIN t1 ON t1.i1 = t2.i2;

The following discussion in phrased in terms of LEFT JOIN only, but you can adjust it for RIGHT JOIN by reversing table roles.

LEFT JOIN is especially useful when you want to find only those left table rows that are unmatched by the right table. Do this by adding a WHERE clause that selects only the rows that have NULL values in a right table column—in other words, the rows in one table that are missing from the other:

mysql> SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.i1 = t2.i2
    -> WHERE t2.i2 IS NULL;
+----+----+------+------+
| i1 | c1 | i2   | c2   |
+----+----+------+------+
| 1  | a  | NULL | NULL |
+----+----+------+------+

Normally, when you write a query like this, your real interest is in the unmatched values in the left table. The NULL columns from the right table are of no interest for display purposes, so you wouldn't bother naming them in the output column list:

mysql> SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.i1 = t2.i2
    -> WHERE t2.i2 IS NULL;
+----+----+
| i1 | c1 |
+----+----+
| 1  | a  |
+----+----+

Like INNER JOIN, a LEFT JOIN can be written using an ON clause or a USING() clause to specify the matching conditions. As with INNER JOIN, ON can be used whether or not the joined columns from each table have the same name, but USING() requires that they have the same names.

LEFT JOIN has a few synonyms and variants. LEFT OUTER JOIN is a synonym for LEFT JOIN. MySQL also supports an ODBC-style notation for LEFT JOIN that uses curly braces (the OJ means "outer join"):

{ OJ tbl_name1 LEFT OUTER JOIN tbl_name2 ON join_expr }

NATURAL LEFT JOIN is similar to LEFT JOIN; it performs a LEFT JOIN, matching all columns that have the same name in the left and right tables. (Thus, no ON or USING clause is given.)

As already mentioned, LEFT JOIN is useful for answering "Which values are missing?" questions. Let's apply this principle to the tables in the sampdb database and consider a more complex example than those shown earlier using t1 and t2.

For the grade-keeping project, first mentioned in Chapter 1, we have a student table listing students, a grade_event table listing the grade events that have occurred, and a score table listing scores for each student for each grade event. However, if a student was ill on the day of some quiz or test, the score table wouldn't contain any score for the student for that event. A makeup quiz or test should be given in such cases, but how do we find these missing records?

The problem is to determine which students have no score for a given grade event, and to do this for each grade event. Another way to say this is that we want to find out which combinations of student and grade event are not present in the score table. This "which values are not present" wording is a tip-off that we want a LEFT JOIN. The join isn't as simple as in the previous examples, though: We aren't just looking for values that are not present in a single column, we're looking for a two-column combination. The combinations we want are all the student/event combinations. These are produced by crossing the student table with the grade_event table:

FROM student, grade_event

Then we take the result of that join and perform a LEFT JOIN with the score table to find the matches for student ID/event ID pairs:

FROM student, grade_event
   LEFT JOIN score ON student.student_id = score.student.id
           AND grade_event.event_id = score.event_id

Note that the ON clause allows the rows in the score table to be joined according to matches in different tables named earlier in the join. That's the key for solving this problem. The LEFT JOIN forces a row to be generated for each row produced by the cross join of the student and grade_event tables, even when there is no corresponding score table record. The result set rows for these missing score records can be identified by the fact that the columns from the score table will all be NULL. We can identify these records by adding a condition in the WHERE clause. Any column from the score table will do, but because we're looking for missing scores, it's probably conceptually clearest to test the score column:

WHERE score.score IS NULL

We can also sort the results using an ORDER BY clause. The two most logical orderings are by event per student and by student per event. I'll choose the first:

ORDER BY student.student_id, grade_event.event_id

Now all we need to do is name the columns we want to see in the output, and we're done. Here is the final statement:

SELECT
  student.name, student.student_id,
  grade_event.date, grade_event.event_id, grade_event.category
FROM
  student, grade_event
  LEFT JOIN score ON student.student_id = score.student_id
          AND grade_event.event_id = score.event_id
WHERE
  score.score IS NULL
ORDER BY
  student.student_id, grade_event.event_id;

Running the query produces these results:

+-----------+------------+------------+----------+----------+
| name      | student_id | date       | event_id | category |
+-----------+------------+------------+----------+----------+
| Megan     |          1 | 2004-09-16 |        4 | Q        |
| Joseph    |          2 | 2004-09-03 |        1 | Q        |
| Katie     |          4 | 2004-09-23 |        5 | Q        |
| Devri     |         13 | 2004-09-03 |        1 | Q        |
| Devri     |         13 | 2004-10-01 |        6 | T        |
| Will      |         17 | 2004-09-16 |        4 | Q        |
| Avery     |         20 | 2004-09-06 |        2 | Q        |
| Gregory   |         23 | 2004-10-01 |        6 | T        |
| Sarah     |         24 | 2004-09-23 |        5 | Q        |
| Carter    |         27 | 2004-09-16 |        4 | Q        |
| Carter    |         27 | 2004-09-23 |        5 | Q        |
| Gabrielle |         29 | 2004-09-16 |        4 | Q        |
| Grace     |         30 | 2004-09-23 |        5 | Q        |
+-----------+------------+------------+----------+----------+

Here's a subtle point. The output displays the student IDs and the event IDs. The student_id column appears in both the student and score tables, so at first you might think that the output column list could name either student.student_id or score.student_id. That's not the case, because the entire basis for being able to find the records we're interested in is that all the score table columns are returned by the LEFT JOIN as NULL. Selecting score.student_id would produce only a column of NULL values in the output. The same principle applies to deciding which event_id column to display. It appears in both the grade_event and score tables, but the query selects grade_event.event_id because the score.event_id values will always be NULL.

Performing Multiple-Table Retrievals with Subqueries

Subquery support is a capability that allows one SELECT statement to be written within parentheses and nested inside another. Here's an example that looks up the IDs for grade event records that correspond to tests ('T') and uses them to select scores for those tests:

SELECT * FROM score
WHERE event_id IN (SELECT event_id FROM grade_event WHERE category = 'T');

Before version 4.1, MySQL could not do subqueries, which was one of the knocks against it. The situation has changed and you can use subqueries freely now, although it is not unusual to see the claim "MySQL doesn't support subqueries." I guess that isn't surprising; some people still think MySQL doesn't support transactions, either.

Subqueries can return different amounts of information:

Subquery results can be tested in different ways:

A scalar subquery is the most restrictive because it produces only a single value. But as a consequence, scalar subqueries can be used in the widest variety of contexts. They are applicable essentially anywhere that you can use a scalar operand, such as a term of an expression, as a function argument, or in the output column list. Column, row, and table subqueries that return more information cannot be used in contexts that require a single value.

Subqueries can be correlated or uncorrelated. This is a function of whether a subquery refers to and is dependent on values in the outer query.

You can use subqueries with statements other than SELECT. However, for statements that modify tables (INSERT, REPLACE, DELETE, UPDATE) there is currently a restriction that the subquery cannot refer to the table being modified.

In some cases, subqueries can be rewritten as joins. You might find subquery rewriting techniques useful if you're writing queries that need to run on an older MySQL server, or if you want to see if the MySQL optimizer does a better job with a join than a subquery.

The following sections discuss the kinds of operations you can use to test subquery results, how to write correlated subqueries, and how to rewrite subqueries as joins.

Subqueries with Relative Comparison Operators

The =, <>, >, >=, <, and <= operators perform relative-value comparisons. When used with a scalar subquery, they find all rows in the outer query that stand in particular relationship to the value returned by the subquery. For example, to identify the scores for the quiz that took place on '2004-09-23', use a scalar subquery to determine the quiz event ID and then match score records against it in the outer SELECT:

SELECT * FROM score
WHERE event_id =
(SELECT event_id FROM grade_event
  WHERE date = '2004-09-23' AND category = 'Q');

With this form of statement, where the subquery is preceded by a value and a relative comparison operator, it is necessary that the subquery produce a single value. That is, it must be a scalar subquery; if it produces multiple values, the statement will fail. In some cases, it may be appropriate to satisfy the single-value requirement by limiting the subquery result with LIMIT 1.

Use of scalar subqueries with relative comparison operators is handy for solving problems where you'd be tempted to use an aggregate function in a WHERE clause. For example, to determine which of the presidents in the president table was born first, you might try this statement:

SELECT * FROM president WHERE birth = MIN(birth);

That doesn't work because you can't use aggregates in WHERE clauses. (The WHERE clause determines which records to select, but the value of MIN() isn't known until after the records have already been selected.) However, you can use a subquery to produce the minimum birth date like this:

SELECT * FROM president
WHERE birth = (SELECT MIN(birth) FROM president);

Other aggregate functions can be used to solve similar problems. The following statement uses a subquery to select the above-average scores from a given grade event:

SELECT * FROM score WHERE event_id = 5
AND score > (SELECT AVG(score) FROM score WHERE event_id = 5);

If a subquery returns a single row, you can use a row constructor to compare a set of values (that is, a tuple) to the subquery result. This statement returns records for presidents who were born in the same city and state as John Adams:

mysql> SELECT last_name, first_name, city, state FROM president
    -> WHERE (city, state) =
    -> (SELECT city, state FROM president
    -> WHERE last_name = 'Adams' AND first_name = 'John');
+-----------+-------------+-----------+-------+
| last_name | first_name  | city      | state |
+-----------+-------------+-----------+-------+
| Adams     | John        | Braintree | MA    |
| Adams     | John Quincy | Braintree | MA    |
+-----------+-------------+-----------+-------+

You can also use ROW(city,state) notation, which is equivalent to (city,state). Both act as row constructors that represent tuples.

IN and NOT IN Subqueries

The IN and NOT IN operators can be used when a subquery returns multiple rows to be evaluated in comparison to the outer query. They test whether a comparison value is present in a set of values. IN is true for rows in the outer query that match any row returned by the subquery. NOT IN is true for rows in the outer query that match no rows returned by the subquery. The following statements use IN and NOT IN to find those students who have absences listed in the absence table, and those who have perfect attendance (no absences):

mysql> SELECT * FROM student
    -> WHERE student_id IN (SELECT student_id FROM absence);
+-------+-----+------------+
| name  | sex | student_id |
+-------+-----+------------+
| Kyle  | M   |          3 |
| Abby  | F   |          5 |
| Peter | M   |         10 |
| Will  | M   |         17 |
| Avery | F   |         20 |
+-------+-----+------------+
mysql> SELECT * FROM student
    -> WHERE student_id NOT IN (SELECT student_id FROM absence);
+-----------+-----+------------+
| name      | sex | student_id |
+-----------+-----+------------+
| Megan     | F   |          1 |
| Joseph    | M   |          2 |
| Katie     | F   |          4 |
| Nathan    | M   |          6 |
| Liesl     | F   |          7 |
...

IN and NOT IN also work for subqueries that return multiple columns. In other words, you can use them with table subqueries. In this case, use a row constructor to specify the comparison values to test against each column:

mysql> SELECT last_name, first_name, city, state FROM president
    -> WHERE (city, state) IN
    -> (SELECT city, state FROM president
    -> WHERE last_name = 'Roosevelt');
+-----------+-------------+-----------+-------+
| last_name | first_name  | city      | state |
+-----------+-------------+-----------+-------+
| Roosevelt | Theodore    | New York  | NY    |
| Roosevelt | Franklin D. | Hyde Park | NY    |
+-----------+-------------+-----------+-------+

IN and NOT IN actually are synonyms for = ANY and <> ALL, which are covered in the next section.

ALL, ANY, and SOME Subqueries

The ALL and ANY operators are used in conjunction with a relative comparison operator to test the result of a column subquery. They test whether the comparison value stands in particular relationship to all or some of the values returned by the subquery. For example, <= ALL is true if the comparison value is less than or equal to every value that the subquery returns, whereas <= ANY is true if the comparison value is less than or equal to any value that the subquery returns. SOME is a synonym for ANY.

This statement determines which president was born first by selecting the record with a birth date less than or equal to all the birth dates in the president table (only the earliest date satisfies this condition):

mysql> SELECT last_name, first_name, birth FROM president
    -> WHERE birth <= ALL (SELECT birth FROM president);
+------------+------------+------------+
| last_name  | first_name | birth      |
+------------+------------+------------+
| Washington | George     | 1732-02-22 |
+------------+------------+------------+

On the other hand, the following statement returns all rows because every date is less than or equal to at least one other date (itself):

mysql> SELECT last_name, first_name, birth FROM president
    -> WHERE birth <= ANY (SELECT birth FROM president);
+------------+---------------+------------+
| last_name  | first_name    | birth      |
+------------+---------------+------------+
| Washington | George        | 1732-02-22 |
| Adams      | John          | 1735-10-30 |
| Jefferson  | Thomas        | 1743-04-13 |
| Madison    | James         | 1751-03-16 |
| Monroe     | James         | 1758-04-28 |
...

When ALL, ANY, or SOME are used with the = comparison operator, the subquery can be a table subquery. In this case, you test return rows using a row constructor to provide the comparison values.

mysql> SELECT last_name, first_name, city, state FROM president
    -> WHERE (city, state) = ANY
    -> (SELECT city, state FROM president
    -> WHERE last_name = 'Roosevelt');
+-----------+-------------+-----------+-------+
| last_name | first_name  | city      | state |
+-----------+-------------+-----------+-------+
| Roosevelt | Theodore    | New York  | NY    |
| Roosevelt | Franklin D. | Hyde Park | NY    |
+-----------+-------------+-----------+-------+

As mentioned in the previous section, IN and NOT IN are shorthand for = ANY and <> ALL. That is, IN means "equal to any of the values returned by the subquery" and NOT IN means "unequal to all values returned by the subquery."

EXISTS and NOT EXISTS Subqueries

The EXISTS and NOT EXISTS operators merely test whether a subquery returns any rows. If it does, EXISTS is true and NOT EXISTS is false. The following statements show some trivial examples of these subqueries. The first returns 0 if the absence table is empty, the second returns 1:

SELECT EXISTS (SELECT * FROM absence);
SELECT NOT EXISTS (SELECT * FROM absence);

EXISTS and NOT EXISTS actually are much more commonly used in correlated subqueries. The next section shows some examples.

With EXISTS and NOT EXISTS, the subquery uses * as the output column list. There's no need to name columns explicitly, because the subquery is assessed as true or false based on whether it returns any rows, not based on the particular values that the rows might contain. You can actually write pretty much anything for the subquery column selection list, but if you want to make it explicit that you're returning a true value when the subquery succeeds, you might write it with SELECT 1 rather than with SELECT *.

Correlated Subqueries

Subqueries can be uncorrelated or correlated:

Correlated subqueries commonly are used for EXISTS and NOT EXISTS subqueries, which are useful for finding records in one table that match or don't match records in another. Correlated subqueries work by passing values from the outer query to the subquery to see whether they match the conditions specified in the subquery. For this reason, it's necessary to qualify column names with table names if they are ambiguous (appear in more than one table).

The following EXISTS subquery identifies matches between the tables—that is, values that are present in both. The statement selects students who have at least one absence listed in the absence table:

SELECT student_id, name FROM student WHERE EXISTS
(SELECT * FROM absence WHERE absence.student_id = student.student_id);

NOT EXISTS identifies non-matches—values in one table that are not present in the other. This statement selects students who have no absences:

SELECT student_id, name FROM student WHERE NOT EXISTS
(SELECT * FROM absence WHERE absence.student_id = student.student_id);

Subqueries in the FROM Clause

Subqueries can be used in the FROM clause to generate values. In this case, the result of the subquery acts like a table. It can participate in joins, its values can be tested in the WHERE clause, and so forth. When using a subquery in a FROM clause, you must provide a table alias to give the subquery result a name.

mysql> SELECT * FROM (SELECT 1, 2, 3) AS t;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+

Rewriting Subqueries as Joins

For versions of MySQL prior to 4.1, subqueries are not available. However, it's often possible to rephrase a query that uses a subquery in terms of a join. In fact, even for MySQL 4.1 or higher, it's not a bad idea to examine queries that you might be inclined to write in terms of subqueries. A join is sometimes more efficient than a subquery, so if a SELECT written as a subquery takes a long time to execute, try writing it as a join to see if it performs better. This section shows how to do that.

Rewriting Subqueries That Select Matching Values

Here's an example statement containing a subquery; it selects scores from the score table only for tests (that is, it ignores quiz scores):

SELECT * FROM score
WHERE event_id IN (SELECT event_id FROM grade_event WHERE category = 'T');

The same statement can be written without a subquery by converting it to a simple join:

SELECT score.* FROM score, grade_event
WHERE score.event_id = grade_event.event_id AND grade_event.category = 'T';

As another example, the following query selects scores for female students:

SELECT * from score
WHERE student_id IN (SELECT student_id FROM student WHERE sex = 'F');

This can be converted to a join as follows:

SELECT score.* FROM score, student
WHERE score.student_id = student.student_id AND student.sex = 'F';

There is a pattern here. The subquery statements follow this form:

SELECT * FROM table1
WHERE column1 IN (SELECT column2a FROM table2 WHERE column2b = value);

Such queries can be converted to a join using this form:

SELECT table1.* FROM table1, table2
WHERE table1.column1 = table2.column2a AND table2.column2b = value;

Note: In some cases, the subquery and the join might return different results. This occurs when table2 contains multiple instances of column2a. The subquery form produces only one instance of each column2a value, but the join would produce them all and its output would include duplicate rows. To suppress these duplicates, begin the join with SELECT DISTINCT rather than SELECT.

Rewriting Subqueries That Select Non-Matching (Missing) Values

Another common type of subquery statement searches for values in one table that are not present in another table. As we've seen before, the "which values are not present" type of problem is a clue that a LEFT JOIN may be helpful. Here's the statement with a subquery seen earlier that tests for students who are not listed in the absence table (it finds those students with perfect attendance):

SELECT * FROM student
WHERE student_id NOT IN (SELECT student_id FROM absence);

This query can be rewritten using a LEFT JOIN as follows:

SELECT student.*
FROM student LEFT JOIN absence ON student.student_id = absence.student_id
WHERE absence.student_id IS NULL;

In general terms, the subquery statement form is as follows:

SELECT * FROM table1
WHERE column1 NOT IN (SELECT column2 FROM table2);

A query having that form can be rewritten like this:

SELECT table1.*
FROM table1 LEFT JOIN table2 ON table1.column1 = table2.column2
WHERE table2.column2 IS NULL;

This assumes that table2.column2 is defined as NOT NULL.

The subquery has the advantage of being more intuitive than the LEFT JOIN. "Not in" is a concept that most people understand without difficulty, because it occurs outside the context of database programming. The same cannot be said for the concept of "left join," for which there is no such basis for natural understanding.

Performing Multiple-Table Retrievals with UNION

If you want to create a result set that combines the results from several queries, you can do so by using a UNION statement. For the examples in this section, assume that you have three tables, t1, t2, and t3 that look like this:

mysql> SELECT * FROM t1;
+------+-------+
| i    | c     |
+------+-------+
|    1 | red   |
|    2 | blue  |
|    3 | green |
+------+-------+
mysql> SELECT * FROM t2;
+------+------+
| i    | c    |
+------+------+
|  -1 | tan   |
|   1 | red   |
+------+------+
mysql> SELECT * FROM t3;
+------------+------+
| d          | i  |
+------------+------+
| 1904-01-01 | 100 |
| 2004-01-01 | 200 |
| 2004-01-01 | 200 |
+------------+------+

Tables t1 and t2 have integer and character columns, and t3 has date and integer columns. To write a UNION statement that combines multiple retrievals, just write several SELECT statements and put the keyword UNION between them. For example, to select the integer column from each table, do this:

mysql> SELECT i FROM t1 UNION SELECT i FROM t2 UNION SELECT i FROM t3;
+------+
| i    |
+------+
|    1 |
|    2 |
|    3 |
|   -1 |
|  100 |
|  200 |
+------+

UNION has the following properties:

You can "simulate" UNION by selecting rows from each table into a temporary table and then selecting the contents of that table. If you make the table a TEMPORARY table, it will be dropped automatically when your session with the server terminates. For quicker performance, use a MEMORY table:

CREATE TEMPORARY TABLE tmp ENGINE = MEMORY SELECT ... FROM t1 WHERE ... ;
INSERT INTO tmp SELECT ... FROM t2 WHERE ... ;
INSERT INTO tmp SELECT ... FROM t3 WHERE ... ;
...
SELECT * FROM tmp ORDER BY ... ;

If you want to run a UNION-type query on MyISAM tables that have the same structure, you can set up a MERGE table and query that. One reason this is useful is that it is simpler to write a query on a MERGE table than the corresponding UNION statement. A query on the MERGE table is similar to a UNION that selects corresponding columns from the individual tables that make up the MERGE table. That is, SELECT on a MERGE table is like UNION ALL (duplicates are not removed), and SELECT DISTINCT is like UNION or UNION DISTINCT (duplicates are removed).

Multiple-Table Deletes and Updates

Sometimes it's useful to delete records based on whether they match or don't match records in another table. Similarly, it's often useful to update records in one table using the contents of records in another table. This section describes how to perform multiple-table DELETE and UPDATE operations. These types of statements draw heavily on the concepts used for joins, so be sure you're familiar with the material discussed earlier in "Performing Multiple-Table Retrievals with Joins."

To perform a single-table DELETE or UPDATE, you refer only to the columns of one table and thus need not qualify the column names with the table name. For example, to delete all records in a table t that have id values greater than 100, you'd write a statement like this:

DELETE FROM t WHERE id > 100;

But what if you want to delete records based not on properties inherent in the records themselves, but rather on their relationship to records in another table? Suppose that you want to delete from t those records with id values that are found in another table t2?

To write a multiple-table DELETE, name all the tables in a FROM clause and specify the conditions used to match up records in the tables in the WHERE clause. The following statement deletes records from table t1 where there is a matching id value in table t2:

DELETE t1 FROM t1, t2 WHERE t1.id = t2.id;

Notice that the FROM clause names all the tables involved in the operation, just as when writing a join. In addition, if a column name appears in more than one of the tables, it becomes ambiguous and must be qualified with a table name. This too is similar to writing a join.

The syntax also allows for deleting records from multiple tables at once. To delete rows from both tables where there are matching id values, name them both after the DELETE keyword:

DELETE t1, t2 FROM t1, t2 WHERE t1.id = t2.id;

What if you want to delete non-matching records? A multiple-table DELETE can use any kind of join that you can write in a SELECT, so employ the same strategy that you'd use when writing a SELECT that identifies the non-matching records. That is, use a LEFT JOIN or RIGHT JOIN. For example, to identify records in t1 that have no match in t2, you'd write a SELECT like this:

SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL;

The analogous DELETE statement to find and remove those records from t1 uses a LEFT JOIN as well:

DELETE t1 FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL;

MySQL supports a second multiple-table DELETE syntax. With this syntax, use a FROM clause to indicate which tables records are to be deleted from and a USING clause to list the tables that determine which records to delete. The preceding multiple-table DELETE statements can be rewritten using this syntax as follows:

DELETE FROM t1 USING t1, t2 WHERE t1.id = t2.id;
DELETE FROM t1, t2 USING t1, t2 WHERE t1.id = t2.id;
DELETE FROM t1 USING t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL;

The principles involved in writing multiple-table UPDATE statements are quite similar to those used for DELETE: Name all the tables that participate in the operation and qualify column references as necessary. Suppose that the quiz you gave on September 23, 2004 contained a question that everyone got wrong, and then you discover that the reason for this is that your answer key was incorrect. As a result, you must add a point to everyone's score. Without multiple-table UPDATE capability, you might accomplish this using two statements. First, look up the event ID corresponding to the quiz for the given date:

SELECT @id := event_id FROM grade_event
WHERE date = '2004-09-23' AND category = 'Q';

Then use the ID value to identify the relevant score records:

UPDATE score SET score = score + 1 WHERE event_id = @id;

With a multiple-table UPDATE, you can do the same thing with a single statement:

UPDATE score, grade_event SET score.score = score.score + 1
WHERE score.event_id = grade_event.event_id
AND grade_event.date = '2004-09-23' AND grade_event.category = 'Q';

You can not only identify records to update based on the contents of another table, you can copy column values from one table to another. The following statement copies t1.a to t2.a for records that have a matching id column value:

UPDATE t1, t2 SET t2.a = t1.a WHERE t2.id = t1.id;

If you want to perform multiple-table deletes or updates for InnoDB tables, you need not use the syntax just described. Instead set up a foreign key relationship between tables that includes an ON DELETE CASCADE or ON UPDATE CASCADE constraint. See "Foreign Keys and Referential Integrity" for details.

Performing Transactions

A transaction is a set of SQL statements that execute as a unit. Either all the statements execute successfully, or none of them have any effect. This is achieved through the use of commit and rollback capabilities. If all of the statements in the transaction succeed, you commit it to record their effect permanently in the database. If an error occurs during the transaction, you roll it back to cancel it. Any statements executed up to that point within the transaction are undone, leaving the database in the state it was in prior to the point at which the transaction began.

Commit and rollback provide the means for ensuring that halfway-done operations don't make their way into your database and leave it in a partially updated (inconsistent) state. The canonical example of this involves a financial transfer where money from one account is placed into another account. Suppose that Bill writes a check to Bob for $100.00 and Bob cashes the check. Bill's account should be decremented by $100.00 and Bob's account incremented by the same amount:

UPDATE account SET balance = balance - 100 WHERE name = 'Bill';
UPDATE account SET balance = balance + 100 WHERE name = 'Bob';

If a crash occurs between the two statements, the operation is incomplete. Depending on which statement executes first, Bill is $100 short without Bob having been credited, or Bob is given $100 without Bill having been debited. Neither outcome is correct. If transactional capabilities are not available to you, you have to figure out the state of ongoing operations at crash time by examining your logs manually in order to determine how to undo them or complete them. The rollback capabilities of transaction support allow you to handle this situation properly by undoing the effect of the statements that executed before the error occurred. (You may still have to determine which transactions weren't entered and re-issue them, but at least you don't have to worry about half-transactions making your database inconsistent.)

Another use for transactions is to make sure that the records involved in an operation are not modified by other clients while you're working with them. MySQL automatically performs locking for single SQL statements to keep clients from interfering with each other, but this is not always sufficient to guarantee that a database operation achieves its intended result, because some operations are performed over the course of several statements. In this case, different clients might interfere with each other. A transaction groups statements into a single execution unit to prevent concurrency problems that could otherwise occur in a multiple-client environment.

Transactional systems typically are characterized as providing ACID properties. ACID is an acronym for Atomic, Consistent, Isolated, and Durable, referring to four properties that transactions should have:

Transactional processing provides stronger guarantees about the outcome of database operations, but also requires more overhead in CPU cycles, memory, and disk space. MySQL offers some storage engines that are transaction-safe (such as InnoDB and BDB), and some that are not transaction-safe (such as MyISAM and MEMORY). Transactional properties are essential for some applications and not for others, and you can choose which ones make the most sense for your applications. Financial operations typically need transactions, and the guarantees of data integrity outweigh the cost of additional overhead. On the other hand, for an application that logs web page accesses to a database table, a loss of a few records if the server host crashes might be tolerable. In this case, you can use a non-transactional storage engine to avoid the overhead required for transactional processing.

Using Transactions to Ensure Safe Statement Execution

To use transactions, you must use a transactional storage engine. This means using either InnoDB or BDB tables. Engines such as MyISAM and MEMORY will not work. If you're not sure whether your MySQL server supports the InnoDB or BDB storage engines, see "Checking Which Storage Engines Are Available" earlier in the chapter.

By default, MySQL runs in autocommit mode, which means that changes made by individual statements are committed to the database immediately to make them permanent. In effect, each statement is its own transaction implicitly. To perform transactions explicitly, disable autocommit mode and then tell MySQL when to commit or roll back changes.

One way to perform a transaction is to issue a START TRANSACTION statement to suspend autocommit mode, execute the statements that make up the transaction, and end the transaction with a COMMIT statement to make the changes permanent. If an error occurs during the transaction, cancel it by issuing a ROLLBACK statement instead to undo the changes. START TRANSACTION suspends the current autocommit mode, so after the transaction has been committed or rolled back, the mode reverts to its state prior to the START TRANSACTION. (If autocommit was enabled beforehand, ending the transaction puts you back in autocommit mode. If it was disabled, ending the current transaction causes you to begin the next one.)

The following example illustrates this approach. First, create a table to use:

mysql> CREATE TABLE t (name CHAR(20), UNIQUE (name)) ENGINE = InnoDB;

The statement creates an InnoDB table, but you can use BDB if you like. Next, initiate a transaction with START TRANSACTION, add a couple of rows to the table, commit the transaction, and then see what the table looks like:

mysql> START TRANSACTION;
mysql> INSERT INTO t SET name = 'William';
mysql> INSERT INTO t SET name = 'Wallace';
mysql> COMMIT;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

You can see that the rows have been recorded in the table. If you had started up a second instance of mysql and selected the contents of t after the inserts but before the commit, the rows would not show up. They would not become visible to the second mysql process until the COMMIT statement had been issued by the first one.

If an error occurs during a transaction, you can cancel it with ROLLBACK. Using the t table again, you can see this by issuing the following statements:

mysql> START TRANSACTION;
mysql> INSERT INTO t SET name = 'Gromit';
mysql> INSERT INTO t SET name = 'Wallace';
ERROR 1062 (23000): Duplicate entry 'Wallace' for key 1
mysql> ROLLBACK;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

The second INSERT attempts to place a row into the table that duplicates an existing name value. The statement fails because name has a UNIQUE index. After issuing the ROLLBACK, the table has only the two rows that it contained prior to the failed transaction. In particular, the INSERT that was performed just prior to the point of the error has been undone and its effect is not recorded in the table.

Issuing a START TRANSACTION statement while a transaction is in process commits the current transaction implicitly before beginning a new one.

Note: For older versions of MySQL that do not recognize START TRANSACTION, use BEGIN instead.

Another way to perform transactions is to manipulate the autocommit mode directly using SET statements:

SET autocommit = 0;
SET autocommit = 1;

Setting the autocommit variable to zero disables autocommit mode. The effect of any statements that follow becomes part of the current transaction, which you end by issuing a COMMIT or ROLLBACK statement to commit or cancel it. With this method, autocommit mode remains off until you turn it back on, so ending one transaction also begins the next one. You can also commit a transaction by re-enabling autocommit mode.

To see how this approach works, begin with the same table as for the previous examples:

mysql> DROP TABLE t;
mysql> CREATE TABLE t (name CHAR(20), UNIQUE (name)) ENGINE = InnoDB;

Then disable autocommit mode, insert some records, and commit the transaction:

mysql> SET autocommit = 0;
mysql> INSERT INTO t SET name = 'William';
mysql> INSERT INTO t SET name = 'Wallace';
mysql> COMMIT;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

At this point, the two records have been committed to the table, but autocommit mode remains disabled. If you issue further statements, they become part of a new transaction, which may be committed or rolled back independently of the first transaction. To verify that autocommit is still off and that ROLLBACK will cancel uncommitted statements, issue the following statements:

mysql> INSERT INTO t SET name = 'Gromit';
mysql> INSERT INTO t SET name = 'Wallace';
ERROR 1062 (23000): Duplicate entry 'Wallace' for key 1
mysql> ROLLBACK;
mysql> SELECT * FROM t;
+---------+
| name  |
+---------+
| Wallace |
| William |
+---------+

To re-enable autocommit mode, use this statement:

mysql> SET autocommit = 1;

As just described, a transaction ends when you issue a COMMIT or ROLLBACK statement, or when you re-enable autocommit while it is disabled. Transactions also end under the following circumstances:

If your client program automatically reconnects when the connection to the server is lost, the connection will be reset to its default state of having autocommit enabled.

Transactions are useful in all kinds of situations. Suppose that you're working with the score table that is part of the grade-keeping project and you discover that the grades for two students have gotten mixed up and need to be switched. The incorrectly entered grades are as follows:

mysql> SELECT * FROM score WHERE event_id = 5 AND student_id IN (8,9);
+------------+----------+-------+
| student_id | event_id | score |
+------------+----------+-------+
|          8 |        5 |    18 |
|          9 |        5 |    13 |
+------------+----------+-------+

To fix this, student 8 should be given a score of 13 and student 9 a score of 18. That can be done easily with two statements:

UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;

However, it's necessary to ensure that both statements succeed as a unit. This is a problem to which transactional methods may be applied. To use START TRANSACTION, do this:

mysql> START TRANSACTION;
mysql> UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
mysql> UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
mysql> COMMIT;

To accomplish the same thing by manipulating the autocommit mode explicitly instead, do this:

mysql> SET autocommit = 0;
mysql> UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
mysql> UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
mysql> COMMIT;
mysql> SET autocommit = 1;

Either way, the result is that the scores are swapped properly:

mysql> SELECT * FROM score WHERE event_id = 5 AND student_id IN (8,9);
+------------+----------+-------+
| student_id | event_id | score |
+------------+----------+-------+
|          8 |        5 |    13 |
|          9 |        5 |    18 |
+------------+----------+-------+

Using Transaction Savepoints

As of MySQL 4.1.1, it is possible to perform a partial rollback of a transaction. To do this, issue a SAVEPOINT statement to set a marker in the transaction. To roll back to just that point in the transaction later, use a ROLLBACK statement that names the savepoint. The following statements illustrate how this works:

mysql> CREATE TABLE t (i INT) ENGINE = INNODB;
mysql> START TRANSACTION;
mysql> INSERT INTO t VALUES(1);
mysql> SAVEPOINT my_savepoint;
mysql> INSERT INTO t VALUES(2);
mysql> ROLLBACK TO SAVEPOINT my_savepoint;
mysql> INSERT INTO t VALUES(3);
mysql> COMMIT;
mysql> SELECT * FROM t;
+------+
| i    |
+------+
|  1   |
|  3   |
+------+

After executing these statements, the first and third records have been inserted, but the second one has been canceled by the partial rollback to the my_savepoint savepoint.

Transaction Isolation

Because MySQL is a multiple-user database system, different clients can attempt to use any given table at the same time. Storage engines such as MyISAM use table locking to keep clients from modifying a table at the same time, but this does not provide good concurrency performance when there are many updates. The InnoDB storage takes a different approach. It uses row-level locking for finer-grained control over table access by clients. One client can modify a row at the same time that another client reads or modifies a different row in the same table. If both clients want to modify a row at the same time, whichever of them acquires a lock on the row gets to modify it first. This provides better concurrency than table locking. However, there is the question about whether one client's transaction should be able to see the changes made by another client's transaction.

InnoDB implements transaction isolation levels to give clients control over what kind of changes made by other transactions they want to see. Different isolation levels allow or prevent the various problems that can occur when different transactions run simultaneously:

To deal with these problems, InnoDB provides four transaction isolation levels. These levels determine which modifications made by one transaction can be seen by other transactions that execute at the same time:

Table 2.3 shows for each isolation level whether they allow dirty reads, nonrepeatable reads, or phantom rows. The table is InnoDB-specific in that REPEATABLE READ does not allow phantom rows to occur. Some database systems do allow phantoms at the REPEATABLE READ isolation level.

Table 2.3 Problems Allowed by Isolation Levels

Isolation Level

Dirty Reads

Nonrepeatable Reads

Phantom Rows

READ UNCOMMITTED

Yes

Yes

Yes

READ COMMITTED

No

Yes

Yes

REPEATABLE READ

No

No

No

SERIALIZABLE

No

No

No


The default InnoDB isolation level is REPEATABLE READ. This can be changed at server startup with the --transaction-isolation option, or at runtime with the SET TRANSACTION statement. The statement has three forms:

SET GLOBAL TRANSACTION ISOLATION LEVEL level
SET SESSION TRANSACTION ISOLATION LEVEL level
SET TRANSACTION ISOLATION LEVEL level

A client that has the SUPER privilege can use SET TRANSACTION to change the global isolation level, which then applies to any clients that connect thereafter. In addition, any client can change its own transaction isolation level, either for all subsequent transactions within its session with the server or for just its next transaction. No special privileges are required for the client-specific levels.

Non-Transactional Approaches to Transactional Problems

In a non-transactional environment, some transactional issues can be dealt with and some cannot. The following discussion covers what can and cannot be achieved without using transactions. You can use this information to determine whether an application can employ the techniques here and avoid the overhead of transaction-safe tables.

First, let's consider how concurrency problems can occur when multiple clients attempt to make changes to a database using operations that each require several statements. Suppose that you're in the garment sales business and your cash register software automatically updates your inventory levels whenever one of your salesmen processes a sale. The sequence of events shown here outlines the operations that take place when multiple sales occur. For the example, assume that the initial shirt inventory level is 47.

  1. Salesman A sells three shirts and registers the sale. The register software begins to update the database by selecting the current shirt count (47):

  2. SELECT quantity FROM inventory WHERE item = 'shirt';
  3. In the meantime, Salesman B has sold two shirts and registered the sale. The software at the second register also begins to update the database:

  4. SELECT quantity FROM inventory WHERE item = 'shirt';
  5. The first register computes the new inventory level to be 47–3 = 44 and updates the shirt count accordingly:

  6. UPDATE inventory SET quantity = 44 WHERE item = 'shirt';
  7. The second register computes the new inventory level to be 47–2 = 45 and updates the count:

  8. UPDATE inventory SET quantity = 45 WHERE item = 'shirt';

At the end of this sequence of events, you've sold five shirts. That's good. However, the inventory level says 45. That's bad, because it should be 42. The problem is that if you look up the inventory level in one statement and update the value in another statement, you have a multiple-statement operation. The action taken in the second statement is dependent on the value retrieved in the first. If separate multiple-statement operations occur during overlapping time frames, the statements from each operation intertwine and interfere with each other. To solve this problem, it's necessary that the statements for a given operation execute without interference from other operations. A transactional system ensures this by executing each salesman's statements as a unit and isolating them from each other. As a result, Salesman B's statements won't execute until those for Salesman A have completed.

To deal with the concurrency issues inherent in the situation just described, you can take a couple of approaches:

The non-transactional approaches just described can be applied successfully to many types of problems, but they have certain limitations:

If any of these issues are significant for your applications, you should use transaction-safe tables instead, because transactional capabilities help you deal with each issue. A transaction handler executes a set of statements as a unit and manages concurrency issues by preventing clients from getting in the way of each other. It also allows rollback in the case of failure to keep half-executed operations from damaging your database, and it determines which locks are necessary and acquires them automatically.

Foreign Keys and Referential Integrity

A foreign key relationship allows you to declare that an index in one table is related to an index in another. It also allows you to place constraints on what may be done to the tables in the relationship. The database enforces the rules of this relationship to maintain referential integrity. For example, the score table in the sampdb sample database contains a student_id column, which we use to relate score records to students in the student table. When we created these tables in Chapter 1, we set up some explicit relationships between them. One of these was that we declared score.student_id to be a foreign key for the student.student_id column. That prevents a record from being entered into the score table unless its student_id value exists in the student table. In other words, the foreign key prevents entry of scores for non-existent students.

Foreign keys are not useful just for record entry, but for deletes and updates as well. For example, we could set up a constraint such that if a student is deleted from the student table, all corresponding records for the student in the score table are deleted automatically as well. This is called "cascaded delete," because the effect of the delete cascades from one table to another. Cascaded update is possible as well. For example, with cascaded update, changing a student's student_id value in the student table also changes the value in the student's corresponding score table records.

Foreign keys help maintain the consistency of your data, and they provide a certain measure of convenience. Without foreign keys, you are responsible for keeping track of inter-table dependencies and maintaining their consistency from within your applications. In some cases, doing this might not be much more work than issuing a few extra DELETE statements to make sure that when you delete a record from one table, you also delete the corresponding records in any related tables. But it is extra work, and if the database engine will perform consistency checks for you, why not let it? Automatic checking capability becomes especially useful if your tables have particularly complex relationships. You likely will not want to be responsible for implementing these dependencies in your applications.

In MySQL, the InnoDB storage engine provides foreign key support. This section describes how to set up InnoDB tables to define foreign keys, and how foreign keys affect the way you use tables. First, it's necessary to define some terms:

InnoDB enforces these rules to guarantee that the foreign key relationship stays intact with no mismatches. This is called "referential integrity."

The syntax for defining a foreign key in a child table is as follows, with optional parts shown in square brackets:

[CONSTRAINT constraint_name]
FOREIGN KEY [index_name] (index_columns)
 REFERENCES tbl_name (index_columns)
 [ON DELETE action]
 [ON UPDATE action]
 [MATCH FULL | MATCH PARTIAL | MATCH SIMPLE]

Note that although all parts of this syntax are parsed, InnoDB does not implement the semantics for all the clauses: The MATCH clause is not supported and is ignored if you specify it. Also, some action values are recognized but have no effect1. InnoDB pays attention to the following parts of the definition:

To set up a foreign key relationship, follow these guidelines:

In Chapter 1, we created tables for the grade-keeping project that have simple foreign key relationships. Now let's work through an example that is more complex. Begin by creating tables named parent and child, such that the child table contains a foreign key that references the par_id column in the parent table:

CREATE TABLE parent
(
  par_id   INT NOT NULL,
  PRIMARY KEY (par_id)
) ENGINE = INNODB;

CREATE TABLE child
(
  par_id   INT NOT NULL,
  child_id  INT NOT NULL,
  PRIMARY KEY (par_id, child_id),
  FOREIGN KEY (par_id) REFERENCES parent (par_id)
    ON DELETE CASCADE
    ON UPDATE CASCADE
) ENGINE = INNODB;

The foreign key in this case uses ON DELETE CASCADE to specify that when a record is deleted from the parent table, MySQL also should remove child records with a matching par_id value automatically. ON UPDATE CASCADE indicates that if a parent record par_id value is changed, MySQL also should change any matching par_id values in the child table to the new value.

Now insert a few records into the parent table, and then add some records to the child table that have related key values:

mysql> INSERT INTO parent (par_id) VALUES(1),(2),(3);
mysql> INSERT INTO child (par_id,child_id) VALUES(1,1),(1,2);
mysql> INSERT INTO child (par_id,child_id) VALUES(2,1),(2,2),(2,3);
mysql> INSERT INTO child (par_id,child_id) VALUES(3,1);

These statements result in the following table contents, where each par_id value in the child table matches a par_id value in the parent table:

mysql> SELECT * FROM parent;
+--------+
| par_id |
+--------+
|      1 |
|      2 |
|      3 |
+--------+
mysql> SELECT * FROM child;
+--------+----------+
| par_id | child_id |
+--------+----------+
|      1 |        1 |
|      1 |        2 |
|      2 |        1 |
|      2 |        2 |
|      2 |        3 |
|      3 |        1 |
+--------+----------+

To verify that InnoDB enforces the key relationship for insertion, try adding a record to the child table that has a par_id value not found in the parent table:

mysql> INSERT INTO child (par_id,child_id) VALUES(4,1);
ERROR 1216 (23000): Cannot add or update a child row:
a foreign key constraint fails

To test cascaded delete, see what happens when you delete a parent record:

mysql> DELETE FROM parent WHERE par_id = 1;

MySQL deletes the record from the parent table:

mysql> SELECT * FROM parent;
+--------+
| par_id |
+--------+
|      2 |
|      3 |
+--------+

In addition, it cascades the effect of the DELETE statement to the child table:

mysql> SELECT * FROM child;
+--------+----------+
| par_id | child_id |
+--------+----------+
|      2 |        1 |
|      2 |        2 |
|      2 |        3 |
|      3 |        1 |
+--------+----------+

To test cascaded update, see what happens when you update a parent record:

mysql> UPDATE parent SET par_id = 100 WHERE par_id =2;
mysql> SELECT * FROM parent;
+--------+
| par_id |
+--------+
|      3 |
|    100 |
+--------+
mysql> SELECT * FROM child;
+--------+----------+
| par_id | child_id |
+--------+----------+
|      3 |        1 |
|    100 |        1 |
|    100 |        2 |
|    100 |        3 |
+--------+----------+

The preceding example shows how to arrange for deletes or updates of a parent record to cause cascaded deletes or updates of any corresponding child records. The ON DELETE and ON UPDATE clauses allow for other actions. For example, one possibility is to let the child records remain in the table but have their foreign key columns set to NULL. To do this, it's necessary to make several changes to the definition of the child table:

To see the effect of these changes, re-create the parent table using the original definition and load the same initial records into it. Then create the child table using the new definition shown here:

CREATE TABLE child
(
  par_id   INT NULL,
  child_id  INT NOT NULL,
  UNIQUE (par_id, child_id),
  FOREIGN KEY (par_id) REFERENCES parent (par_id)
    ON DELETE SET NULL
    ON UPDATE SET NULL
) ENGINE = INNODB;

With respect to inserting new records, the child table behaves the same. That is, it allows insertion of records with par_id values found in the parent table, but disallows entry of values that aren't listed there2:

mysql> INSERT INTO child (par_id,child_id) VALUES(1,1),(1,2);
mysql> INSERT INTO child (par_id,child_id) VALUES(2,1),(2,2),(2,3);
mysql> INSERT INTO child (par_id,child_id) VALUES(3,1);
mysql> INSERT INTO child (par_id,child_id) VALUES(4,1);
ERROR 1216 (23000): Cannot add or update a child row:
a foreign key constraint fails

A difference in behavior occurs when you delete a parent record. Try removing a parent record and then check the contents of the child table to see what happens:

mysql> DELETE FROM parent WHERE par_id = 1;
mysql> SELECT * FROM child;
+--------+----------+
| par_id | child_id |
+--------+----------+
|   NULL |        1 |
|   NULL |        2 |
|      2 |        1 |
|      2 |        2 |
|      2 |        3 |
|      3 |        1 |
+--------+----------+

In this case, the child records that had 1 in the par_id column are not deleted. Instead, the par_id column is set to NULL, as specified by the ON DELETE SET NULL constraint.

Updating a parent record has a similar effect:

mysql> UPDATE parent SET par_id = 100 WHERE par_id = 2;
mysql> SELECT * FROM child;
+--------+----------+
| par_id | child_id |
+--------+----------+
|   NULL |        1 |
|   NULL |        1 |
|   NULL |        2 |
|   NULL |        2 |
|   NULL |        3 |
|      3 |        1 |
+--------+----------+

To see what foreign key relationships an InnoDB table has, use the SHOW CREATE TABLE or SHOW TABLE STATUS statement.

If an error occurs when you attempt to create a table that has a foreign key, use the SHOW INNODB STATUS statement to get the full error message.

Living Without Foreign Keys

If your MySQL server doesn't have InnoDB support, or you are using another storage engine because you need features that InnoDB does not support (such as FULLTEXT indexes or spatial data types), you cannot take advantage of foreign keys. What should you do to maintain the integrity of relationships between your tables?

The constraints that foreign keys enforce often are not difficult to implement through application logic. Sometimes, it's simply a matter of how you approach the data entry process. Consider the student and score tables from the grade-keeping project. These are related by a foreign key relationship through the student_id values in each table. Suppose that we had created these as MyISAM tables rather than as InnoDB tables. MyISAM does not support foreign keys, so in this case the relationship between the tables would be implicit rather than explicit. When you administer a test or quiz and have a new set of scores to add to the database, you'd have to make sure that you don't add score records with student_id values that are not listed in the student table.

In some respects, this is simply a matter of taking the proper approach to data entry. To avoid inserting scores for non-existent students, the way you'd enter a set of scores probably would be to use an application that begins with a list of students from the student table. For each one, it would take the score and use the student's ID number to generate a new score table record. With this procedure, you would never enter a record for a student that doesn't exist. Nevertheless, it would still be possible to enter a bad record, for example, if you issued an INSERT statement manually. (With InnoDB tables and foreign keys, no such possibility exists.)

What about the case where you delete a student record? Suppose that you want to delete student number 13. This also implies you want to delete any score records for that student. With a foreign key relationship in place that specifies cascading delete, you'd simply delete the student table record with the following statement and let MySQL take care of removing the corresponding score table records automatically:

DELETE FROM student WHERE student_id = 13;

Without foreign key support, you must explicitly delete records for all relevant tables to achieve the same effect as cascading on DELETE:

DELETE FROM student WHERE student_id = 13;
DELETE FROM score WHERE student_id = 13;

Another way to do this is to use a multiple-table delete that achieves the same effect as a cascaded delete with a single query. But watch out for a subtle trap. The following statement appears to do the trick, but it's actually not quite correct:

DELETE student, score FROM student, score
WHERE student.student_id = 13 AND student.student_id = score.student_id;

The problem with this statement is that it will fail in the case where the student doesn't have any scores. The WHERE clause will find no matches and thus will not delete anything from the student table. In this case, a LEFT JOIN is more appropriate, because it will identify which student table record to delete even in the absence of any matching score table records:

DELETE student, score FROM student LEFT JOIN score USING (student_id)
WHERE student.student_id = 13;

Using FULLTEXT Searches

MySQL includes the capability for performing full text searches. The full-text search engine allows you to look for words or phrases without using pattern-matching operations. There are three kinds of full-text search:

Full-text search capability is enabled for a given table by creating a special kind of index and has the following characteristics:

The following examples show how to use full-text searching by creating FULLTEXT indexes and then performing queries on them using the MATCH operator. A script to create the table and some sample data to load into it are available in the fulltext directory of the sampdb distribution.

A FULLTEXT index is created much the same way as other indexes. That is, you can define it with CREATE TABLE when creating the table initially, or add it afterward with ALTER TABLE or CREATE INDEX. Because FULLTEXT indexes require you to use MyISAM tables, you can take advantage of one of the properties of the MyISAM storage engine if you're creating a new table to use for FULLTEXT searches: Table loading proceeds more quickly if you populate the table first and then add the indexes afterward, rather than loading data into an already indexed table. Suppose that you have a data file named apothegm.txt containing famous sayings and the people to whom they're attributed:

Aeschylus             Time as he grows old teaches many lessons
Alexander Graham Bell Mr. Watson, come here. I want you!
Benjamin Franklin     It is hard for an empty bag to stand upright
Benjamin Franklin     Little strokes fell great oaks
Benjamin Franklin     Remember that time is money
Miguel de Cervantes   Bell, book, and candle
Proverbs 15:1         A soft answer turneth away wrath
Theodore Roosevelt    Speak softly and carry a big stick
William Shakespeare   But, soft! what light through yonder window breaks?
Robert Burton         I light my candle from their torches.

If you want to search by phrase and attribution separately or together, you need to index each column separately, and also create an index that includes both columns. You can create, populate, and index a table named apothegm as follows:

CREATE TABLE apothegm (attribution VARCHAR(40), phrase TEXT) ENGINE = MyISAM;
LOAD DATA LOCAL INFILE 'apothegm.txt' INTO TABLE apothegm;
ALTER TABLE apothegm
  ADD FULLTEXT (phrase),
  ADD FULLTEXT (attribution),
  ADD FULLTEXT (phrase, attribution);

Natural Language FULLTEXT Searches

After setting up the table, perform natural language full-text searches on it using MATCH to name the column or columns to search and AGAINST() to specify the search string. For example:

mysql> SELECT * FROM apothegm WHERE MATCH(attribution) AGAINST('roosevelt');
+--------------------+------------------------------------+
| attribution        | phrase                             |
+--------------------+------------------------------------+
| Theodore Roosevelt | Speak softly and carry a big stick |
+--------------------+------------------------------------+
mysql> SELECT * FROM apothegm WHERE MATCH(phrase) AGAINST('time');
+-------------------+-------------------------------------------+
| attribution       | phrase                                    |
+-------------------+-------------------------------------------+
| Benjamin Franklin | Remember that time is money               |
| Aeschylus         | Time as he grows old teaches many lessons |
+-------------------+-------------------------------------------+
mysql> SELECT * FROM apothegm WHERE MATCH(attribution, phrase)
    -> AGAINST('bell');
+-----------------------+------------------------------------+
| attribution           | phrase                             |
+-----------------------+------------------------------------+
| Alexander Graham Bell | Mr. Watson, come here. I want you! |
| Miguel de Cervantes   | Bell, book, and candle             |
+-----------------------+------------------------------------+

In the last example, note how the query finds records that contain the search word in different columns, which demonstrates the FULLTEXT capability of searching multiple columns at once. Also note that the order of the columns as named in the query is attribution, phrase. That differs from the order in which they were named when the index was created (phrase, attribution), which illustrates that order does not matter. What matters is that there must be some FULLTEXT index that consists of exactly the columns named.

If you just want to see how many records a search matches, use COUNT(*):

mysql> SELECT COUNT(*) FROM apothegm WHERE MATCH(phrase) AGAINST('time');
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+

Output rows for natural language FULLTEXT searches are ordered by decreasing relevance when you use a MATCH expression in the WHERE clause. Relevance values are non-negative floating point values, with zero indicating "no relevance." To see these values, use a MATCH expression in the output column list:

mysql> SELECT phrase, MATCH(phrase) AGAINST('time') AS relevance
    -> FROM apothegm;
+-----------------------------------------------------+-----------------+
| phrase                                              | relevance       |
+-----------------------------------------------------+-----------------+
| Time as he grows old teaches many lessons           | 1.3253291845322 |
| Mr. Watson, come here. I want you!                  |               0 |
| It is hard for an empty bag to stand upright        |               0 |
| Little strokes fell great oaks                      |               0 |
| Remember that time is money                         | 1.3400621414185 |
| Bell, book, and candle                              |               0 |
| A soft answer turneth away wrath                    |               0 |
| Speak softly and carry a big stick                  |               0 |
| But, soft! what light through yonder window breaks? |               0 |
| I light my candle from their torches.               |               0 |
+-----------------------------------------------------+-----------------+

A natural language search finds records that contain any of the search words, so a query such as the following returns records with either "hard" or "soft":

mysql> SELECT * FROM apothegm WHERE MATCH(phrase)
    -> AGAINST('hard soft');
+---------------------+-----------------------------------------------------+
| attribution         | phrase                                              |
+---------------------+-----------------------------------------------------+
| Benjamin Franklin   | It is hard for an empty bag to stand upright        |
| Proverbs 15:1       | A soft answer turneth away wrath                    |
| William Shakespeare | But, soft! what light through yonder window breaks? |
+---------------------+-----------------------------------------------------+

Boolean Mode FULLTEXT Searches

Greater control over multiple-word matching can be obtained by using boolean mode FULLTEXT searches. This type of search is performed by adding IN BOOLEAN MODE after the search string in the AGAINST() function. Boolean searches have the following characteristics:

Query Expansion FULLTEXT Searches

A full-text search with query expansion performs a two-phase search. The initial search is like a regular natural language search. Then the most highly relevant records from this search are used for the second phase. The words in these records are used along with the original search terms to perform a second search. Because the set of search terms is larger, the result generally includes records that are not found in the first phase but are related to them.

To perform this kind of search, add WITH QUERY EXPANSION following the search terms. The following example provides an illustration. The first query shows a natural language search. The second query shows a query expansion search. Its result includes an extra record that contains none of the original search terms. This record is found because it contains the word "candle" that is present in one of the records found by the natural language search.

mysql> SELECT * FROM apothegm
    -> WHERE MATCH(attribution, phrase)
    -> AGAINST('bell book');
+-----------------------+------------------------------------+
| attribution           | phrase                             |
+-----------------------+------------------------------------+
| Miguel de Cervantes   | Bell, book, and candle             |
| Alexander Graham Bell | Mr. Watson, come here. I want you! |
+-----------------------+------------------------------------+
mysql> SELECT * FROM apothegm
    -> WHERE MATCH(attribution, phrase)
    -> AGAINST('bell book' WITH QUERY EXPANSION);
+-----------------------+---------------------------------------+
| attribution           | phrase                                |
+-----------------------+---------------------------------------+
| Miguel de Cervantes   | Bell, book, and candle                |
| Alexander Graham Bell | Mr. Watson, come here. I want you!    |
| Robert Burton         | I light my candle from their torches. |
+-----------------------+---------------------------------------+

Configuring the FULLTEXT Search Engine

Several full-text parameters are configurable and can be modified by setting system variables. The parameters that determine the shortest and longest words to index in FULLTEXT indexes are ft_min_word_len and ft_max_word_len. Words with lengths outside the range defined by these two variables are ignored when FULLTEXT indexes are built. The default minimum value is 4. The default maximum value depends on your server version. Currently in MySQL 4.1, the maximum is 84.

Suppose that you want to change the minimum word length from 4 to 3. Do so like this:

  1. Start the server with the ft_min_word_len variable set to 3. To ensure that this happens whenever the server starts, it's best to place the setting in an option file such as /etc/my.cnf:

  2. [mysqld]
    set-variable = ft_min_word_len=3
  3. For any existing tables that already have FULLTEXT indexes, you must rebuild those indexes. You can drop and add the indexes, but it's easier and sufficient to perform a quick repair operation:

  4. REPAIR TABLE tbl_name QUICK;
  5. Any new FULLTEXT indexes that you create after changing the parameter will use the new value automatically.

For more information on setting system variables, see Appendix D. For details on using option files, see Appendix F.

Note: If you use myisamchk to rebuild indexes for a table that contains any FULLTEXT indexes, see the FULLTEXT-related notes in the myisamchk description in Appendix F.

New Features in MySQL 5.0

This section discusses some of the important new features that are being developed in MySQL 5.0:

These are all recent additions, so this section is relatively brief to provide just an overview. Because the capabilities described here are new, you may encounter limitations. However, development is ongoing, so as these features mature and restrictions are eased, their usefulness will increase.

Using Views

A view is a virtual table. That is, it acts like a table but actually contains no data. Instead, it is defined in terms of tables (or other views) and provides alternative ways to look at table data. Often this can simplify applications. Views were introduced in MySQL 5.0.1.

A simple view can be nothing more than a way to select a subset of a table's columns. Suppose that you often want to select only the last_name, first_name, city, and state columns from the president table, but you don't want to write out all the columns like this:

SELECT last_name, first_name, city, state FROM president;

Nor do you want to use SELECT *. That's easier to write, but * retrieves columns that you don't want. The solution is to define a view that retrieves only the desired columns:

CREATE VIEW vpres AS
SELECT last_name, first_name, city, state FROM president;

Now the view acts as a "window" into just those columns that you want to see. This means that you can use SELECT * with the view and get back only the columns named in the view definition:

mysql> SELECT * FROM vpres;
+------------+---------------+---------------------+-------+
| last_name  | first_name    | city                | state |
+------------+---------------+---------------------+-------+
| Washington | George        | Wakefield           | VA  |
| Adams      | John          | Braintree           | MA  |
| Jefferson  | Thomas        | Albemarle County    | VA  |
| Madison    | James         | Port Conway         | VA  |
| Monroe     | James         | Westmoreland County | VA  |
...

If you include a WHERE clause, MySQL adds it to the view definition when executing the statement to further restrict the result:

mysql> SELECT * FROM vpres WHERE last_name = 'Adams';
+-----------+-------------+-----------+-------+
| last_name | first_name  | city      | state |
+-----------+-------------+-----------+-------+
| Adams     | John        | Braintree | MA    |
| Adams     | John Quincy | Braintree | MA    |
+-----------+-------------+-----------+-------+

The same is true if you add ORDER BY, LIMIT, and so forth.

When you use a view, you can refer only to those columns named in the view definition. That is, you cannot refer to a column that is not part of the view:

mysql> SELECT * FROM vpres WHERE suffix <> '';
ERROR 1054 (42S22): Unknown column 'suffix' in 'where clause'

The column names for a view by default are those named in the output column list of its SELECT statement. To provide column names explicitly, add a list of names in parentheses following the view name in the view definition:

mysql> CREATE VIEW vpres2 (ln, fn) AS
    -> SELECT last_name, first_name FROM president;

Now when you refer to the view, you must use the given column names rather than the names in the SELECT:

mysql> SELECT last_name, first_name FROM vpres2;
ERROR 1054 (42S22) at line 1: Unknown column 'last_name' in 'field list'
mysql> SELECT ln, fn FROM vpres2;
+------------+---------------+
| ln         | fn            |
+------------+---------------+
| Washington | George        |
| Adams      | John          |
| Jefferson  | Thomas        |
| Madison    | James         |
| Monroe     | James         |
...

A view can be used to perform calculations automatically. In Chapter 1, we developed a statement that determines the age of presidents at death. The same calculation can be incorporated into a view definition:

mysql> CREATE VIEW pres_age AS
    ->  SELECT last_name, first_name, birth, death,
    ->  (YEAR(death) - YEAR(birth))
    ->   - IF(RIGHT(death,5) < RIGHT(birth,5),1,0)
    ->  AS age
    ->  FROM president;

This view includes an age column that is defined as a calculation, and selecting that column from the view retrieves the results of the calculation:

mysql> SELECT * FROM pres_age;
+------------+---------------+------------+------------+------+
| last_name  | first_name    | birth      | death      | age  |
+------------+---------------+------------+------------+------+
| Washington | George        | 1732-02-22 | 1799-12-14 |  67  |
| Adams      | John          | 1735-10-30 | 1826-07-04 |  90  |
| Jefferson  | Thomas        | 1743-04-13 | 1826-07-04 |  83  |
| Madison    | James         | 1751-03-16 | 1836-06-28 |  85  |
| Monroe     | James         | 1758-04-28 | 1831-07-04 |  73  |
...

By including the age calculation in the view definition, it's no longer necessary to write out the formula to see the age values. The view hides the details.

A view can refer to multiple tables, which makes it easier to run queries that involve joins. The following view looks up scores, joining them with student and grade event information:

mysql> CREATE VIEW vstudent AS
    -> SELECT student.student_id, name, date, score, category
    -> FROM grade_event, score, student
    -> WHERE
    -> grade_event.event_id = score.event_id
    -> AND score.student_id = student.student_id;

When you select from the view, MySQL executes the join and returns information from multiple tables:

mysql> SELECT * FROM vstudent;
+------------+-----------+------------+-------+----------+
| student_id | name      | date       | score | category |
+------------+-----------+------------+-------+----------+
|          1 | Megan     | 2004-09-03 |  20   | Q        |
|          3 | Kyle      | 2004-09-03 |  20   | Q        |
|          4 | Katie     | 2004-09-03 |  18   | Q        |
|          5 | Abby      | 2004-09-03 |  13   | Q        |
|          6 | Nathan    | 2004-09-03 |  18   | Q        |
|          7 | Liesl     | 2004-09-03 |  14   | Q        |
|          8 | Ian       | 2004-09-03 |  14   | Q        |
...

The view makes it trivial to retrieve the scores for a particular student by name:

mysql> SELECT * FROM vstudent WHERE name = 'emily';
+------------+-------+------------+-------+----------+
| student_id | name  | date       | score | category |
+------------+-------+------------+-------+----------+
|         31 | Emily | 2004-09-03 |    11 | Q        |
|         31 | Emily | 2004-09-06 |    19 | Q        |
|         31 | Emily | 2004-09-09 |    81 | T        |
|         31 | Emily | 2004-09-16 |    19 | Q        |
|         31 | Emily | 2004-09-23 |     9 | Q        |
|         31 | Emily | 2004-10-01 |    76 | T        |
+------------+-------+------------+-------+----------+

Some views are updatable, which means that you can insert, update, and delete rows in the underlying table by means of operations on the view. Here is a simple example:

mysql> CREATE TABLE t (i INT);
mysql> INSERT INTO t (i) VALUES(1),(2),(3);
mysql> CREATE VIEW v AS SELECT i FROM t;
mysql> SELECT i FROM v;
+------+
| i    |
+------+
|  1   |
|  2   |
|  3   |
+------+
mysql> INSERT INTO v (i) VALUES(4);
mysql> DELETE FROM v WHERE i < 3;
mysql> SELECT i FROM v;
+------+
| i    |
+------+
|  3   |
|  4   |
+------+
mysql> UPDATE v SET i = i + 1;
mysql> SELECT i FROM v;
+------+
| i    |
+------+
|  4   |
|  5   |
+------+

For a view to be updatable, it must map directly onto a single table, it must select only columns that are simple references to table columns (not arbitrary expressions), and any operation on a view row must correspond to an operation on a single row in the underlying table. For example, if a view involves a summary calculated using an aggregate function, each view row can be based on multiple underlying table rows. In this case, the view is not updatable because there is no way to tell which underlying table row should be updated.

Using Stored Procedures

Beginning with MySQL 5.0.0, it is possible to define stored routines. These are functions and procedures that are stored on the server side and that can be invoked later. Stored functions return a result and can be included in expressions just like built-in functions. Stored procedures do not return a result directly. However, they support parameter types that can have their values set in the body of the procedure such that the values can be accessed by the caller after the procedure finishes.

Generally, you use a function to calculate a value to be returned to the caller for use in expressions. Use a procedure if you need only to invoke a routine to produce an effect or action without returning a value. These are guidelines, not hard and fast rules. For example, if you need to return more than one value, you cannot use a function. But you can define a procedure that has OUT parameters that can be used in expressions after the procedure returns.

Stored routines provide the following benefits and capabilities:

To create a stored function or procedure, use a CREATE FUNCTION or CREATE PROCEDURE statement. The following example creates a function. In the preceding section on views, we defined a view that included a column that performed an age calculation for the rows in the president table. If you want to use such a calculation more generally than in the context of a particular view, define the calculation as a function so that you can invoke it in arbitrary expressions.

mysql> delimiter $
mysql> CREATE FUNCTION age (date1 DATE, date2 DATE)
    -> RETURNS INT
    -> BEGIN
    ->  DECLARE age INT;
    ->  SET age = (YEAR(date2) - YEAR(date1))
    ->       - IF(RIGHT(date2,5) < RIGHT(date1,5),1,0);
    ->  RETURN age;
    -> END$
mysql> delimiter ;

The function begins with a RETURNS clause to indicate the data type of its return value. This is followed by the function body. Because the body consists of multiple statements, they're enclosed within a BEGIN/END block. The body declares a local variable, assigns the result of the calculation to it, and returns it as the function value. A function body must include at least one RETURN statement to return a value to the caller. (This function could have been written more concisely to use return the calculation result directly from the RETURN statement, but I wanted to illustrate variable declaration and use.)

The age() function takes two DATE arguments and can be invoked like any built-in function:

mysql> SELECT first_name, last_name, birth, death, age (birth, death)
    -> FROM president;
+---------------+------------+------------+------------+-------------------+
| first_name    | last_name  | birth      | death      | age(birth, death) |
+---------------+------------+------------+------------+-------------------+
| George        | Washington | 1732-02-22 | 1799-12-14 |                67 |
| John          | Adams      | 1735-10-30 | 1826-07-04 |                90 |
| Thomas        | Jefferson  | 1743-04-13 | 1826-07-04 |                83 |
| James         | Madison    | 1751-03-16 | 1836-06-28 |                85 |
| James         | Monroe     | 1758-04-28 | 1831-07-04 |                73 |
...

Actually, there is one difference between how you refer to stored functions and built-in functions. For a stored function, the opening parenthesis need not immediately follow the function name. In fact, if you give a stored function the same name as a built-in function, you must have a space between the name and the parenthesis (both when you define the function and when you invoke it).

In the preceding example, notice the delimiter commands surrounding the function definition statement. If a routine body consists of a single statement, you can write it at the end of the CREATE FUNCTION or CREATE PROCEDURE statement with no special treatment. However, if a routine body consists of multiple statements, each must be terminated by semicolon (';'). That also is the default statement terminator for the mysql program, which introduces a conflict. To deal with this, use the delimiter command to redefine mysql's statement delimiter to a character or string that does not appear in the routine definition. That causes mysql not to interpret semicolons as terminators and to pass the entire definition to the server as a single statement. You can redefine the terminator to semicolon again after defining the routine.

A stored procedure is similar to a stored function, but it doesn't return a value. Therefore, it does not include a RETURNS clause or any RETURN statements. Here is a simple stored procedure that takes one argument, a year, and displays information for presidents born in that year:

mysql> DROP PROCEDURE IF EXISTS born_in_year;
mysql> CREATE PROCEDURE born_in_year (year_of_birth INT)
    -> SELECT first_name, last_name, birth, death
    -> FROM president
    -> WHERE YEAR(birth) = year_of_birth;

In this case, the body of the routine consists of a single statement, so no delimiter commands are necessary. The procedure issues a SELECT statement. The results of this statement are not returned as the procedure value, but instead are passed as a result set to the client. To see this, invoke the procedure, which is done by using CALL:

mysql> CALL born_in_year(1908);
+------------+-----------+------------+------------+
| first_name | last_name | birth      | death      |
+------------+-----------+------------+------------+
| Lyndon B.  | Johnson   | 1908-08-27 | 1973-01-22 |
+------------+-----------+------------+------------+
mysql> CALL born_in_year(1913);
+------------+-----------+------------+------------+
| first_name | last_name | birth      | death      |
+------------+-----------+------------+------------+
| Richard M  | Nixon     | 1913-01-09 | 1994-04-22 |
| Gerald R   | Ford      | 1913-07-14 | NULL       |
+------------+-----------+------------+------------+

The example illustrates one thing that stored procedures can do that stored functions in MySQL currently cannot: Procedures can access tables. This prohibition on referring to tables in stored functions presumably will be lifted in the future. As a workaround for this limitation on functions, you can define a procedure that performs the required table operation, and include in the definition an OUT or INOUT parameter to send back as the procedure "value" when the procedure returns. This technique also can be useful if you need to return more than one value, because functions return only a single value.

By default, a procedure parameter is an IN parameter. The caller passes a value, and it can be modified within the procedure, but any changes are not visible to the caller after the procedure returns. An OUT parameter is the opposite. The procedure assigns a value to the parameter, which can be accessed by the caller after the procedure returns. An INOUT parameter allows the caller to pass in a value, and to get a value back.

The following example is similar to the preceding one, but instead of displaying columns from president table rows, it simply counts the number of rows and assigns the count to an OUT parameter. The caller gets the count from that parameter after invoking the procedure.

mysql> delimiter $
mysql> CREATE PROCEDURE count_born_in_year
    ->  (year_of_birth INT, OUT how_many INT)
    -> BEGIN
    ->  DECLARE c CURSOR FOR
    ->   SELECT COUNT(*) FROM president WHERE YEAR(birth) = year_of_birth;
    ->  OPEN c;
    ->  FETCH c INTO how_many;
    ->  CLOSE c;
    -> END$
mysql> delimiter ;
mysql> CALL count_born_in_year(1908, @count);
mysql> SELECT @count;
+--------+
| @count |
+--------+
| 1      |
+--------+
mysql> CALL count_born_in_year(1913, @count);
mysql> SELECT @count;
+--------+
| @count |
+--------+
| 2      |
+--------+

The procedure does not simply invoke the SELECT statement and assign the value of the COUNT(*) column to a variable. If it did that, the result of the statement would be displayed to the client directly. To suppress this display, the procedure sets up a cursor and uses it to execute the SELECT statement. This way the result of the SELECT goes to the procedure itself so that it can process the result directly.

To use a cursor, declare it to associate it with the statement to be executed. Then open the cursor, fetch the rows of the statement, and close the cursor. In many applications, the FETCH statement will appear inside some kind of a loop construct such as a REPEAT statement. However, in the count_born_in_year() procedure, the SELECT returns only a single row and there is no need for a loop.

To invoke the procedure, pass a year value as the first parameter, and provide a variable into which the count should be placed as the second parameter. After the procedure returns, the variable value will contain the count.

Using Triggers

Trigger support is available as of MySQL 5.0.2. A trigger can be defined to activate for INSERT, DELETE, or UPDATE statements for a table, and can be set to activate either before or after each row processed by the statement. The trigger definition includes a statement that executes when the trigger activates. Currently, there are no trigger-specific privileges, so you must have the SUPER privilege to define a trigger.

Triggers in MySQL currently have the same limitation as stored functions that they cannot refer to tables in general. They can refer only to the values in the current row being modified in the table that is associated with the trigger. This lessens the usefulness of triggers for certain applications. For example, you cannot perform change logging by recording deletes, inserts, and updates for a table to maintain a change history. Nevertheless, even in their current early implementation, triggers offer some important features:

The following example shows a trigger for INSERT statements for a table t. It is a BEFORE trigger so that it can examine data values before they are inserted into the table. The trigger performs two actions:

mysql> CREATE TABLE t (i INT, dt DATETIME);
mysql> delimiter $
mysql> CREATE TRIGGER t_ins BEFORE INSERT ON t
    ->  FOR EACH ROW BEGIN
    ->   SET NEW.dt = CURRENT_TIMESTAMP;
    ->   IF NEW.i < 0 THEN SET NEW.i = 0; END IF;
    ->  END$
mysql> delimiter ;
mysql> INSERT INTO t (i) VALUES(-2),(0),(2);
mysql> SELECT * FROM t;
+------+---------------------+
| i    | dt                  |
+------+---------------------+
|  0   | 2005-01-23 12:14:11 |
|  0   | 2005-01-23 12:14:11 |
|  2   | 2005-01-23 12:14:11 |
+------+---------------------+

In the same way as for stored routines, the body of a trigger can contain multiple statements if you enclose them within a BEGIN/END block. In this case, you'll need to change the default delimiter so that you can use ';' as the delimiter within the trigger definition. The preceding example uses this technique.

The syntax NEW.col_name can be used in the trigger body to refer to columns in the new row to be inserted or updated in an INSERT or UPDATE trigger. Similarly, OLD.col_name can be used to refer to columns in the old row to be deleted or updated in a DELETE or UPDATE trigger. OLD and NEW are not case sensitive.

Triggers can refer to user variables, so you can set a variable within a trigger to pass a result outside of the trigger. This gives you a way to check what effect the trigger had.

800 East 96th Street, Indianapolis, Indiana 46240