Home > Articles > Data > MySQL

  • Print
  • + Share This
This chapter is from the book

This chapter is from the book

Performing Transactions

A transaction is a set of SQL statements that execute as a unit. Either all the statements execute successfully, or none of them have any effect. This is achieved through the use of commit and rollback capabilities. If all of the statements in the transaction succeed, you commit it to record their effect permanently in the database. If an error occurs during the transaction, you roll it back to cancel it. Any statements executed up to that point within the transaction are undone, leaving the database in the state it was in prior to the point at which the transaction began.

Commit and rollback provide the means for ensuring that halfway-done operations don't make their way into your database and leave it in a partially updated (inconsistent) state. The canonical example of this involves a financial transfer where money from one account is placed into another account. Suppose that Bill writes a check to Bob for $100.00 and Bob cashes the check. Bill's account should be decremented by $100.00 and Bob's account incremented by the same amount:

UPDATE account SET balance = balance - 100 WHERE name = 'Bill';
UPDATE account SET balance = balance + 100 WHERE name = 'Bob';

If a crash occurs between the two statements, the operation is incomplete. Depending on which statement executes first, Bill is $100 short without Bob having been credited, or Bob is given $100 without Bill having been debited. Neither outcome is correct. If transactional capabilities are not available to you, you have to figure out the state of ongoing operations at crash time by examining your logs manually in order to determine how to undo them or complete them. The rollback capabilities of transaction support allow you to handle this situation properly by undoing the effect of the statements that executed before the error occurred. (You may still have to determine which transactions weren't entered and re-issue them, but at least you don't have to worry about half-transactions making your database inconsistent.)

Another use for transactions is to make sure that the records involved in an operation are not modified by other clients while you're working with them. MySQL automatically performs locking for single SQL statements to keep clients from interfering with each other, but this is not always sufficient to guarantee that a database operation achieves its intended result, because some operations are performed over the course of several statements. In this case, different clients might interfere with each other. A transaction groups statements into a single execution unit to prevent concurrency problems that could otherwise occur in a multiple-client environment.

Transactional systems typically are characterized as providing ACID properties. ACID is an acronym for Atomic, Consistent, Isolated, and Durable, referring to four properties that transactions should have:

  • Atomicity: The statements a transaction consists of form a logical unit. You can't have just some of them execute.

  • Consistency: The database is consistent before and after the transaction executes. In other words, the transaction doesn't make a mess of your database.

  • Isolation: One transaction has no effect on another.

  • Durability: When a transaction executes successfully to completion, its effects are recorded permanently in the database.

Transactional processing provides stronger guarantees about the outcome of database operations, but also requires more overhead in CPU cycles, memory, and disk space. MySQL offers some storage engines that are transaction-safe (such as InnoDB and BDB), and some that are not transaction-safe (such as MyISAM and MEMORY). Transactional properties are essential for some applications and not for others, and you can choose which ones make the most sense for your applications. Financial operations typically need transactions, and the guarantees of data integrity outweigh the cost of additional overhead. On the other hand, for an application that logs web page accesses to a database table, a loss of a few records if the server host crashes might be tolerable. In this case, you can use a non-transactional storage engine to avoid the overhead required for transactional processing.

Using Transactions to Ensure Safe Statement Execution

To use transactions, you must use a transactional storage engine. This means using either InnoDB or BDB tables. Engines such as MyISAM and MEMORY will not work. If you're not sure whether your MySQL server supports the InnoDB or BDB storage engines, see "Checking Which Storage Engines Are Available" earlier in the chapter.

By default, MySQL runs in autocommit mode, which means that changes made by individual statements are committed to the database immediately to make them permanent. In effect, each statement is its own transaction implicitly. To perform transactions explicitly, disable autocommit mode and then tell MySQL when to commit or roll back changes.

One way to perform a transaction is to issue a START TRANSACTION statement to suspend autocommit mode, execute the statements that make up the transaction, and end the transaction with a COMMIT statement to make the changes permanent. If an error occurs during the transaction, cancel it by issuing a ROLLBACK statement instead to undo the changes. START TRANSACTION suspends the current autocommit mode, so after the transaction has been committed or rolled back, the mode reverts to its state prior to the START TRANSACTION. (If autocommit was enabled beforehand, ending the transaction puts you back in autocommit mode. If it was disabled, ending the current transaction causes you to begin the next one.)

The following example illustrates this approach. First, create a table to use:

mysql> CREATE TABLE t (name CHAR(20), UNIQUE (name)) ENGINE = InnoDB;

The statement creates an InnoDB table, but you can use BDB if you like. Next, initiate a transaction with START TRANSACTION, add a couple of rows to the table, commit the transaction, and then see what the table looks like:

mysql> START TRANSACTION;
mysql> INSERT INTO t SET name = 'William';
mysql> INSERT INTO t SET name = 'Wallace';
mysql> COMMIT;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

You can see that the rows have been recorded in the table. If you had started up a second instance of mysql and selected the contents of t after the inserts but before the commit, the rows would not show up. They would not become visible to the second mysql process until the COMMIT statement had been issued by the first one.

If an error occurs during a transaction, you can cancel it with ROLLBACK. Using the t table again, you can see this by issuing the following statements:

mysql> START TRANSACTION;
mysql> INSERT INTO t SET name = 'Gromit';
mysql> INSERT INTO t SET name = 'Wallace';
ERROR 1062 (23000): Duplicate entry 'Wallace' for key 1
mysql> ROLLBACK;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

The second INSERT attempts to place a row into the table that duplicates an existing name value. The statement fails because name has a UNIQUE index. After issuing the ROLLBACK, the table has only the two rows that it contained prior to the failed transaction. In particular, the INSERT that was performed just prior to the point of the error has been undone and its effect is not recorded in the table.

Issuing a START TRANSACTION statement while a transaction is in process commits the current transaction implicitly before beginning a new one.

Note: For older versions of MySQL that do not recognize START TRANSACTION, use BEGIN instead.

Another way to perform transactions is to manipulate the autocommit mode directly using SET statements:

SET autocommit = 0;
SET autocommit = 1;

Setting the autocommit variable to zero disables autocommit mode. The effect of any statements that follow becomes part of the current transaction, which you end by issuing a COMMIT or ROLLBACK statement to commit or cancel it. With this method, autocommit mode remains off until you turn it back on, so ending one transaction also begins the next one. You can also commit a transaction by re-enabling autocommit mode.

To see how this approach works, begin with the same table as for the previous examples:

mysql> DROP TABLE t;
mysql> CREATE TABLE t (name CHAR(20), UNIQUE (name)) ENGINE = InnoDB;

Then disable autocommit mode, insert some records, and commit the transaction:

mysql> SET autocommit = 0;
mysql> INSERT INTO t SET name = 'William';
mysql> INSERT INTO t SET name = 'Wallace';
mysql> COMMIT;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

At this point, the two records have been committed to the table, but autocommit mode remains disabled. If you issue further statements, they become part of a new transaction, which may be committed or rolled back independently of the first transaction. To verify that autocommit is still off and that ROLLBACK will cancel uncommitted statements, issue the following statements:

mysql> INSERT INTO t SET name = 'Gromit';
mysql> INSERT INTO t SET name = 'Wallace';
ERROR 1062 (23000): Duplicate entry 'Wallace' for key 1
mysql> ROLLBACK;
mysql> SELECT * FROM t;
+---------+
| name  |
+---------+
| Wallace |
| William |
+---------+

To re-enable autocommit mode, use this statement:

mysql> SET autocommit = 1;

As just described, a transaction ends when you issue a COMMIT or ROLLBACK statement, or when you re-enable autocommit while it is disabled. Transactions also end under the following circumstances:

  • In addition to the SET autocommit, START TRANSACTION, BEGIN, COMMIT, and ROLLBACK statements that affect transactions explicitly, certain other statements do so implicitly because they cannot be part of a transaction. If you issue any of the following statements while a transaction is in progress, the server commits the transaction first before executing the statement. The following statements cause a commit:

  • ALTER TABLE
    CREATE INDEX
    DROP DATABASE
    DROP INDEX
    DROP TABLE
    LOAD MASTER DATA
    LOCK TABLES
    RENAME TABLE
    TRUNCATE TABLE
    UNLOCK TABLES (if tables currently are locked)
  • If a client currently is performing a transaction and the client connection ends or is broken before a commit occurs, the server rolls back the transaction automatically.

If your client program automatically reconnects when the connection to the server is lost, the connection will be reset to its default state of having autocommit enabled.

Transactions are useful in all kinds of situations. Suppose that you're working with the score table that is part of the grade-keeping project and you discover that the grades for two students have gotten mixed up and need to be switched. The incorrectly entered grades are as follows:

mysql> SELECT * FROM score WHERE event_id = 5 AND student_id IN (8,9);
+------------+----------+-------+
| student_id | event_id | score |
+------------+----------+-------+
|          8 |        5 |    18 |
|          9 |        5 |    13 |
+------------+----------+-------+

To fix this, student 8 should be given a score of 13 and student 9 a score of 18. That can be done easily with two statements:

UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;

However, it's necessary to ensure that both statements succeed as a unit. This is a problem to which transactional methods may be applied. To use START TRANSACTION, do this:

mysql> START TRANSACTION;
mysql> UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
mysql> UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
mysql> COMMIT;

To accomplish the same thing by manipulating the autocommit mode explicitly instead, do this:

mysql> SET autocommit = 0;
mysql> UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
mysql> UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
mysql> COMMIT;
mysql> SET autocommit = 1;

Either way, the result is that the scores are swapped properly:

mysql> SELECT * FROM score WHERE event_id = 5 AND student_id IN (8,9);
+------------+----------+-------+
| student_id | event_id | score |
+------------+----------+-------+
|          8 |        5 |    13 |
|          9 |        5 |    18 |
+------------+----------+-------+

Using Transaction Savepoints

As of MySQL 4.1.1, it is possible to perform a partial rollback of a transaction. To do this, issue a SAVEPOINT statement to set a marker in the transaction. To roll back to just that point in the transaction later, use a ROLLBACK statement that names the savepoint. The following statements illustrate how this works:

mysql> CREATE TABLE t (i INT) ENGINE = INNODB;
mysql> START TRANSACTION;
mysql> INSERT INTO t VALUES(1);
mysql> SAVEPOINT my_savepoint;
mysql> INSERT INTO t VALUES(2);
mysql> ROLLBACK TO SAVEPOINT my_savepoint;
mysql> INSERT INTO t VALUES(3);
mysql> COMMIT;
mysql> SELECT * FROM t;
+------+
| i    |
+------+
|  1   |
|  3   |
+------+

After executing these statements, the first and third records have been inserted, but the second one has been canceled by the partial rollback to the my_savepoint savepoint.

Transaction Isolation

Because MySQL is a multiple-user database system, different clients can attempt to use any given table at the same time. Storage engines such as MyISAM use table locking to keep clients from modifying a table at the same time, but this does not provide good concurrency performance when there are many updates. The InnoDB storage takes a different approach. It uses row-level locking for finer-grained control over table access by clients. One client can modify a row at the same time that another client reads or modifies a different row in the same table. If both clients want to modify a row at the same time, whichever of them acquires a lock on the row gets to modify it first. This provides better concurrency than table locking. However, there is the question about whether one client's transaction should be able to see the changes made by another client's transaction.

InnoDB implements transaction isolation levels to give clients control over what kind of changes made by other transactions they want to see. Different isolation levels allow or prevent the various problems that can occur when different transactions run simultaneously:

  • Dirty reads. A dirty read occurs when a change made by one transaction can be seen by other transactions before the transaction has been committed. Another transaction thus might think the row has been changed, even though that will not really be true if the transaction that changed the row later is rolled back.

  • Nonrepeatable reads. A nonrepeatable read refers to the failure by a transaction to get the same result for a given SELECT statement each time it executes it. This might happen if one transaction performs a SELECT twice but another transaction changes some of the rows in between the two executions.

  • Phantom rows. A phantom is a row that becomes visible to a transaction when it was not previously. Suppose that a transaction performs a SELECT and then another transaction inserts a row. If the first transaction runs the same SELECT again and sees the new row, that is a phantom.

To deal with these problems, InnoDB provides four transaction isolation levels. These levels determine which modifications made by one transaction can be seen by other transactions that execute at the same time:

  • READ UNCOMMITTED

  • A transaction can see row modifications made by other transactions even before they have been committed.

  • READ COMMITTED

  • A transaction can see row modifications made by other transactions only if they were committed before the transaction began.

  • REPEATABLE READ

  • If a transaction performs a given SELECT twice, the result is repeatable. That is, it gets the same result each time, even if other transactions have changed or inserted rows in the meantime.

  • SERIALIZABLE

  • This isolation level is similar to REPEATABLE READ but isolates transactions more completely: Rows examined by one transaction cannot be modified by other transactions until the first transaction completes. This allows one transaction to read rows and at the same time prevent them from being modified by other transactions until it is done with them.

Table 2.3 shows for each isolation level whether they allow dirty reads, nonrepeatable reads, or phantom rows. The table is InnoDB-specific in that REPEATABLE READ does not allow phantom rows to occur. Some database systems do allow phantoms at the REPEATABLE READ isolation level.

Table 2.3 Problems Allowed by Isolation Levels

Isolation Level

Dirty Reads

Nonrepeatable Reads

Phantom Rows

READ UNCOMMITTED

Yes

Yes

Yes

READ COMMITTED

No

Yes

Yes

REPEATABLE READ

No

No

No

SERIALIZABLE

No

No

No


The default InnoDB isolation level is REPEATABLE READ. This can be changed at server startup with the --transaction-isolation option, or at runtime with the SET TRANSACTION statement. The statement has three forms:

SET GLOBAL TRANSACTION ISOLATION LEVEL level
SET SESSION TRANSACTION ISOLATION LEVEL level
SET TRANSACTION ISOLATION LEVEL level

A client that has the SUPER privilege can use SET TRANSACTION to change the global isolation level, which then applies to any clients that connect thereafter. In addition, any client can change its own transaction isolation level, either for all subsequent transactions within its session with the server or for just its next transaction. No special privileges are required for the client-specific levels.

Non-Transactional Approaches to Transactional Problems

In a non-transactional environment, some transactional issues can be dealt with and some cannot. The following discussion covers what can and cannot be achieved without using transactions. You can use this information to determine whether an application can employ the techniques here and avoid the overhead of transaction-safe tables.

First, let's consider how concurrency problems can occur when multiple clients attempt to make changes to a database using operations that each require several statements. Suppose that you're in the garment sales business and your cash register software automatically updates your inventory levels whenever one of your salesmen processes a sale. The sequence of events shown here outlines the operations that take place when multiple sales occur. For the example, assume that the initial shirt inventory level is 47.

  1. Salesman A sells three shirts and registers the sale. The register software begins to update the database by selecting the current shirt count (47):

  2. SELECT quantity FROM inventory WHERE item = 'shirt';
  3. In the meantime, Salesman B has sold two shirts and registered the sale. The software at the second register also begins to update the database:

  4. SELECT quantity FROM inventory WHERE item = 'shirt';
  5. The first register computes the new inventory level to be 47–3 = 44 and updates the shirt count accordingly:

  6. UPDATE inventory SET quantity = 44 WHERE item = 'shirt';
  7. The second register computes the new inventory level to be 47–2 = 45 and updates the count:

  8. UPDATE inventory SET quantity = 45 WHERE item = 'shirt';

At the end of this sequence of events, you've sold five shirts. That's good. However, the inventory level says 45. That's bad, because it should be 42. The problem is that if you look up the inventory level in one statement and update the value in another statement, you have a multiple-statement operation. The action taken in the second statement is dependent on the value retrieved in the first. If separate multiple-statement operations occur during overlapping time frames, the statements from each operation intertwine and interfere with each other. To solve this problem, it's necessary that the statements for a given operation execute without interference from other operations. A transactional system ensures this by executing each salesman's statements as a unit and isolating them from each other. As a result, Salesman B's statements won't execute until those for Salesman A have completed.

To deal with the concurrency issues inherent in the situation just described, you can take a couple of approaches:

  • Lock the tables explicitly. You can group statements and execute them as a unit by surrounding them with LOCK TABLES and UNLOCK TABLES statements: Lock all the tables that you need to use, issue your statements, and release the locks. This prevents anyone else from changing the tables while you have them locked. Using table locking, the inventory update scenario might be handled like this:

    1. Salesman A sells three shirts and registers the sale. The register software begins the inventory process by acquiring a table lock and retrieving the current shirt count (47):

    2. LOCK TABLES inventory WRITE;
      SELECT quantity FROM inventory WHERE item = 'shirt';

      A WRITE lock is necessary here because the ultimate goal of the operation is to modify the inventory table, which involves writing to it.

    3. In the meantime, Salesman B has sold two shirts and registered the sale. The software at the second register also begins to update the database by trying to acquire a lock:

    4. LOCK TABLES inventory WRITE;

      In this case, this statement blocks because Salesman A already holds a lock on the table.

    5. The first register computes the new inventory level to be 47–3 = 44, updates the shirt count, and releases the lock:

    6. UPDATE inventory SET quantity = 44 WHERE item = 'shirt';
      UNLOCK TABLES;
    7. When the first register releases the lock, the second register's lock request succeeds, and it can proceed to retrieve the current shirt count (44):

    8. SELECT quantity FROM inventory WHERE item = 'shirt';
    9. The second register computes the new inventory level to be 44–2 = 42, updates the shirt count, and releases the lock:

    10. UPDATE inventory SET quantity = 42 WHERE item = 'shirt';
      UNLOCK TABLES;

    Now the statements from the two operations don't get mixed up and the inventory level is set properly.

    If you're using multiple tables, you must lock all of them before you execute the grouped statements. If you only read from a particular table, however, you need only a read lock on it, not a write lock. (This lets other clients read the tables while you're using them, but prevents clients from writing to them.) Suppose that you have a set of queries in which you want to make some changes to the inventory table, and you also need to read some data from a customer table. In this case, you need a write lock on the inventory table and a read lock on the customer table:

    LOCK TABLES inventory WRITE, customer READ;
    ... use the tables here ...
    UNLOCK TABLES;
  • Use relative updates, not absolute updates. For the inventory updating method that uses explicit table locking, the operation involves looking up the current inventory level with one statement, computing the new value based on the number of shirts sold, and then updating the level to the new value with another statement. Another way to keep operations performed by multiple clients from interfering with each other is to reduce each operation to a single statement. This eliminates inter-statement dependencies that arise in multiple-statement operations. Not every operation can be handled by a single statement, but for the inventory update scenario, this strategy works well. It's possible to perform each inventory update in one step simply by modifying the shirt count relative to its current value:

    1. Salesman A sells three shirts and the register software decrements the shirt count by three:

    2. UPDATE inventory SET quantity = quantity - 3 WHERE item = 'shirt';
    3. Salesman B sells two shirts and the register software decrements the shirt count by two:

    4. UPDATE inventory SET quantity = quantity - 2 WHERE item = 'shirt';

    With this method, each modification to the database no longer requires multiple statements. This eliminates concurrency issues, so there is no need to use explicit table locks. If an operation you want to perform is similar to this, there may be no need for transactions at all.

The non-transactional approaches just described can be applied successfully to many types of problems, but they have certain limitations:

  • Not every operation can be written in terms of relative updates. Sometimes you must use multiple statements, in which case concurrency issues must be considered and dealt with.

  • You may be able to keep clients from interfering with each other by locking tables for the duration of a multiple-statement operation, but what happens if an error occurs in the middle of the operation? In this case, you'd want the effects of the earlier statements to be undone so that the database isn't left in a half-modified and inconsistent state. Unfortunately, although table locking can help you address concurrency issues, non-transactional tables provide no assistance in recovering from errors.

  • The locking strategy requires you to lock and unlock your tables yourself. If you revise an operation in such a way that changes the set of tables used, you must remember to modify the LOCK TABLES statement accordingly.

If any of these issues are significant for your applications, you should use transaction-safe tables instead, because transactional capabilities help you deal with each issue. A transaction handler executes a set of statements as a unit and manages concurrency issues by preventing clients from getting in the way of each other. It also allows rollback in the case of failure to keep half-executed operations from damaging your database, and it determines which locks are necessary and acquires them automatically.

  • + Share This
  • 🔖 Save To Your Account