- Using Indexing
- The MySQL Query Optimizer
- Data Type Choices and Query Efficiency
- Loading Data Efficiently
- Scheduling and Locking Issues
- Optimization for Administrators
Optimization for Administrators
The previous sections describe optimizations that can be performed by unprivileged MySQL users. Administrators who have control of the MySQL server or the machine on which it runs can perform additional optimizations. For example, some server parameters pertain to query processing and may be tuned, and certain hardware configuration factors have a direct effect on query processing speed. In many cases, these optimizations improve the performance of the server as a whole, and thus have a beneficial effect for all MySQL users.
In general, you should keep the following principles in mind when performing administrative optimizations:
Accessing data in memory is faster than accessing data from disk.
Keeping data in memory as long as possible reduces disk activity.
Retaining information from an index is more important than retaining contents of data records.
Specific ways you can apply these principles are discussed next.
Increase the size of the server's caches. The server has many parameters (system variables) that you can change to affect its operation. Several of these directly affect the speed of query processing. The most important parameters you can change are the sizes of the table cache and the caches used by the storage engines to buffer information for indexing operations. If you have memory available, allocating it to the server's caches allows information to be held in memory longer and reduces disk activity. This is good, because it's much faster to access information from memory than to read it from disk.
When the server opens table files, it tries to keep them open so as to minimize the number of file-opening operations. To do this, it maintains information about open files in the table cache. The table_cache system variable controls the size of this cache. If the server accesses lots of tables, the table cache fills up and the server closes tables that haven't been used for a while to make room for opening new tables. You can assess how effective the table cache is by checking the Opened_tables status indicator:
SHOW STATUS LIKE 'Opened_tables';
Opened_tables indicates the number of times a table had to be opened because it wasn't already open. (This value is also displayed as the Opens value in the output of the mysqladmin status command.) If the number remains stable or increases slowly, it's probably set to about the right value. If the number grows quickly, it means the cache is too small and that tables often have to be closed to make room to open other tables. If you have file descriptors available, increasing the table cache size will reduce the number of table opening operations.
The key buffer is used by the MyISAM storage engine to hold index blocks for index-related operations. Its size is controlled by the key_buffer_size system variable. Larger values allow MySQL to hold more index blocks in memory at once, which increases the likelihood of finding key values in memory without having to read a new block from disk. The default size of the key buffer is 8MB. If you have lots of memory, that's a very conservative value and you should be able to increase it substantially and see a considerable improvement in performance for index-based retrievals and for index creation and modification operations.
In MySQL 4.1 and up, you can create additional key caches for MyISAM tables and assign specific tables to them. This can help query processing for those tables, as explained in "Using MyISAM Key Caches."
The InnoDB and BDB engines have their own caches for buffering data and index values. The sizes are controlled by the innodb_buffer_pool_size and bdb_cache_size variables. The InnoDB engines also maintains a log buffer. The innodb_log_buffer_size variable controls its size.
Another special cache is the query cache, described later in "Using the Query Cache."
Instructions for setting system variables may be found in Chapter 11, "General MySQL Administration." When you change parameter values, follow these guidelines:
Change one parameter at a time. Otherwise, you're varying multiple independent variables and it becomes more difficult to assess the effect of each change.
Increase system variable values incrementally. If you increase a variable by a huge amount on the theory that more is always better, you may run your system out of resources, causing it to thrash or slow to a crawl because you've set the value too high.
Rather than experimenting with parameter tuning on your production MySQL server, it might be prudent to set up a separate test server.
To get an idea of the kinds of parameter variables that are likely to be appropriate for your system, take a look at the my-small.cnf, my-medium.cnf, my-large.cnf, and my-huge.cnf option files included with MySQL distributions. (On Unix, you can find them under the support-files directory in source distributions and under the share directory in binary distributions. On Windows, they are located in the base installation directory, and the filename suffix might be .ini.) These files will give you some idea of which parameters are best to change for servers that receive different levels of use, and also some representative values to use for those parameters.
Other strategies you can adopt to help the server operate more efficiently include the following:
Disable storage engines that you don't need. The server won't allocate any memory for disabled engines, allowing you to devote it to other uses. Most storage engines can be excluded from the server binary at configuration time if you build MySQL from source. For those engines that are included in the server, many can be disabled at runtime with the appropriate startup options. See "Selecting Storage Engines," in Chapter 11, "General MySQL Administration" for details.
Keep grant table permissions simple. Although the server caches grant table contents in memory, if you have any rows in the tables_priv or columns_priv tables, the server must check table- and column-level privileges for every query. If those tables are empty, the server can optimize its privilege checking to skip those levels.
If you build MySQL from source, configure it to use static libraries rather than shared libraries. Dynamic binaries that use shared libraries save on disk space, but static binaries are faster. However, some systems require dynamic linking if you use the user-defined function (UDF) mechanism. On such systems, static binaries will not work.
Using MyISAM Key Caches
When MySQL executes a statement that uses indexes from MyISAM tables, it uses a key cache to hold index values. The cache allows disk I/O to be reduced: If key values needed from a table are found in the cache, they need not be read from disk again. Unfortunately, the key cache is a finite resource and it is shared among all MyISAM tables by default. If key values are not found in the cache and the cache is full, contention results: Some values currently in the cache must be discarded to make room for new values. The next time the discarded values are needed, they must be read from disk again.
If you have an especially heavily used MyISAM table, it would be nice to ensure that its keys remain in memory, but contention in the cache works against this. Contention can arise either when keys need to be read from the same table, or from other tables. You might avoid same-table contention by making the key cache large enough to hold all of a given table's indexes completely, but keys from other tables still could contend for space in the cache.
MySQL 4.1 and up offers a solution to this problem because it supports setting up multiple key caches and allows a table's indexes to be assigned to and preloaded into a given cache. This can be useful if you have a table that sees especially heavy use and you have sufficient memory to load its indexes into the cache. This capability enables you to avoid both same-table and other-table contention: Create a cache that is large enough to hold a table's indexes completely and devote the cache exclusively to the use of that table. No disk I/O is necessary after the keys have been loaded into the cache. Also, key values will never need to be discarded from the cache and key lookups for the table can be done in memory.
The following example shows how to set up a key cache for the member table in the sampdb database, using a cache with a name of member_cache and a size of 1MB. You must have the SUPER privilege to carry out these instructions.
Set up a separate key cache large enough to hold the indexes from the table:
Assign the table to the key cache:
Preload the table's indexes into its key cache:
mysql> SET GLOBAL member_cache.key_buffer_size = 1024*1024;
mysql> CACHE INDEX member IN member_cache; +---------------+--------------------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------------+--------------------+----------+----------+ | sampdb.member | assign_to_keycache | status | OK | +---------------+--------------------+----------+----------+
mysql> LOAD INDEX INTO CACHE member; +---------------+--------------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------------+--------------+----------+----------+ | sampdb.member | preload_keys | status | OK | +---------------+--------------+----------+----------+
If you want to load other tables into the same cache or create other key caches for other tables, that can be done as well. For more information about key caches, consult Chapter 11.
Using the Query Cache
The MySQL server can use a query cache to speed up processing of SELECT statements that are executed repeatedly. The resulting performance improvement often is dramatic. The query cache works as follows:
The first time a given SELECT statement is executed, the server remembers the text of the query and the results that it returns.
The next time the server sees that statement, it doesn't bother to execute it again. Instead, the server pulls the result directly from the query cache and returns it to the client.
Query caching is based on the literal text of query strings as they are received by the server. Queries are considered the same if the text of the queries is exactly the same. Queries are considered different if they differ in lettercase or come from clients that are using different character sets or communication protocols. They also are considered different if they are otherwise identical but do not actually refer to the same tables (for example, if they refer to identically named tables in different databases).
When a table is updated, any cached queries that refer to it become invalid and are discarded. This prevents the server from returning out-of-date results.
Support for the query cache is built in by default. If you don't want to use the cache, and want to avoid incurring even the minimal overhead that it involves, you can build the server without it by running the configure script with the --without-query-cache option.
To determine whether a server supports the query cache, check the value of the have_query_cache system variable:
mysql> SHOW VARIABLES LIKE 'have_query_cache'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | have_query_cache | YES | +------------------+-------+
For servers that do include query cache support, cache operation is based on the values of three system variables:
query_cache_type determines the operating mode of the query cache. The following table shows the possible mode values:
query_cache_size determines the amount of memory to allocate for the cache, in bytes.
query_cache_limit sets the maximum result set size that will be cached; query results larger than this value are never cached.
Don't cache query results or retrieve cached results.
Cache queries except those that begin with SELECT SQL_NO_CACHE.
Cache on demand only those queries that begin with SELECT SQL_CACHE.
For example, to enable the query cache and allocate 16MB of memory for it, use the following settings in an option file:
[mysqld] query_cache_type=1 query_cache_size=16M
The amount of memory indicated by query_cache_size is allocated even if query_cache_type is zero. To avoid wasting memory, don't set the size greater than zero unless you plan to enable the cache. Also, a size of zero effectively disables the cache even if query_cache_type is non-zero.
Individual clients begin with query caching behavior in the state indicated by the server's default caching mode. A client can change the default caching mode for its queries by using this statement:
SET query_cache_type = val;
val can be 0, 1, or 2, which have the same meanings as when setting the query_cache_type variable at server startup. In a SET statement, the symbolic values OFF, ON, and DEMAND are synonyms for 0, 1, and 2.
A client also can control caching of individual queries by adding a modifier following the SELECT keyword. SELECT SQL_CACHE causes the query result to be cached if the cache mode is ON or DEMAND. SELECT SQL_NO_CACHE causes the result not to be cached.
Suppression of caching can be useful for queries that retrieve information from a constantly changing table. In that case, the cache is unlikely to be of much use. Suppose that you're logging Web server requests to a table in MySQL, and also that you periodically run a set of summary queries on the table. For a reasonably busy Web server, new rows will be inserted into the table frequently and thus any query results cached for the table become invalidated quickly. The implication is that although you might issue the summary queries repeatedly, it's unlikely that the query cache will be of any value for them. Under such circumstances, it makes sense to issue the queries using the SQL_NO_CACHE modifier to tell the server not to bother caching their results.
The earlier part of this chapter discusses techniques that help improve your server's performance regardless of your hardware configuration. You can of course get better hardware to make your server run faster. But not all hardware-related changes are equally valuable. When assessing what kinds of hardware improvements you might make, the most important principles are the same as those that apply to server parameter tuning: Put as much information in fast storage as possible, and keep it there as long as possible.
Several aspects of your hardware configuration can be modified to improve server performance:
Install more memory into your machine. This enables you to configure larger values for the server's cache and buffer sizes, which allows it to keep data in memory longer and with less need to fetch information from disk.
Reconfigure your system to remove all disk swap devices if you have enough RAM to do all swapping into a memory filesystem. Otherwise, some systems will continue to swap to disk even if you have sufficient RAM for swapping.
Add faster disks to improve I/O latency. Seek time is typically the primary determinant of performance here. It's slow to move the heads laterally; after the heads have been positioned, reading blocks off the track is fast by comparison. However, if you have a choice between adding more memory and getting faster disks, add more memory. Memory is always faster than your disks, and adding memory allows you to use larger caches, which reduces disk activity.
Take advantage of parallelism by redistributing disk activity across physical devices. If you can split reading or writing across multiple physical devices, it will be quicker than reading and writing everything from the same device. For example, if you store databases on one device and logs on another, writing to both devices at once it will be faster than if databases and logs share the same device. Note that using different partitions on the same physical device doesn't count as parallelism. That won't help because they'll still contend for the same physical resource (disk heads). The procedure for moving logs and databases is described in Chapter 10, "The MySQL Data Directory."
Before you relocate data to a different device, make sure that you understand your system's load characteristics. If there's some other major activity already taking place on a particular physical device, putting a database there may actually make performance worse. For example, you may not realize any overall benefit if you process a lot of Web traffic and move a database onto the device where your Web server document tree is located.
Use of RAID devices can give you some advantages of parallelism as well.
Use multi-processor hardware. For a multi-threaded application like the MySQL server, multi-processor hardware can execute multiple threads at the same time.