STEP 5. Monitoring and Tuning a Database
A well-tuned database system has the following characteristics:
The system is mostly CPU-bound.
Disk I/O is well-balanced.
The buffer cache is working effectively.
The database is configured to run efficiently.
As we have already noted, the applications must also be coded efficiently if the database system is to run efficiently.
Having investigated memory, disks, networks and CPUs, you need to undertake the final step: monitor and tune the database, a process explored in detail in the following chapters of Configuring and Tuning Databases on the Solaris Platform for Oracle, Sybase, Informix, and DB2. Monitoring and tuning the buffer cache is a vital element of database tuning. A brief review of the main issues is presented in the next section, but before proceeding to the database tuning chapters, you would do well to review Chapter 7, which is dedicated to the buffer cache. Finally, although application efficiency is a crucial component in any well-tuned system, we will not be considering it since it is beyond the scope of this document. Refer to Techniques for Optimizing Applications: High Performance Computing, by Rajat Garg and Ilya Sharapov, Sun Microsystems Press, 2001 for a discussion on application optimization.
The Buffer Cache
One of the main metrics in monitoring database performance for OLTP workloads is the buffer cache hit rate. The buffer cache stores in memory as many database blocks read from disk as possible. The expectation is that the same database blocks will often be used by different transactions; the buffer cache hit rate shows how often a requested block was retrieved from the cache rather than from the disk. Since memory access is so much faster than disk access, a high cache hit rate is important for good performance with OLTP workloads.
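The calculation behind the hit rate is straightforward. As a brief illustration (the counter names here are hypothetical, since each database product exposes its own statistics for logical and physical reads), it can be sketched as:

```python
def cache_hit_rate(logical_reads, physical_reads):
    """Percentage of block read requests satisfied from the buffer cache.

    logical_reads: total block read requests seen by the database
    physical_reads: requests that had to be satisfied from disk
    """
    if logical_reads == 0:
        return 100.0
    return 100.0 * (logical_reads - physical_reads) / logical_reads

# e.g. 100,000 block requests, 8,000 of which went to disk
print(cache_hit_rate(100_000, 8_000))  # 92.0
```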
For DSS workloads the buffer cache is less important. Some databases (for example, Oracle and Informix) bypass the buffer cache entirely when carrying out table or index scans. DB2 for Solaris does use the buffer cache, but there may be little opportunity for reuse of buffers since the volume of data brought into the cache is typically much greater than the cache size, so buffers tend not to stay in the cache for long. Similar behavior applies to batch jobs processing large volumes of data.
A Closer Look at the Cache Hit Rate
The cache hit rate can be misleading. For example, how much better is a 95% cache hit rate than a 90% cache hit rate? Does it represent a 5% improvement?
The answer is no! A 90% cache hit rate means 90% of all read I/Os are satisfied from the cache; only 10% of the reads result in a physical disk access. A 95% cache hit rate means only 5% of reads go to disk. So, improving the cache hit rate from 90% to 95% means physical disk reads are halved, not improved by only 5%! Depending on the number of physical reads, halving them may make a significant difference to the load on the disks.
Given this potential confusion, it is often more useful to think in terms of the miss rate (100 minus cache hit rate).
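The point is easier to see with the miss rate made explicit. A small worked example (the total read count is an illustrative assumption):

```python
def physical_reads(total_reads, hit_rate_pct):
    """Disk reads implied by a given cache hit rate.

    The miss rate (100 minus the hit rate) is the fraction of
    reads that must go to disk.
    """
    miss_rate = 100.0 - hit_rate_pct
    return total_reads * miss_rate / 100.0

total = 1_000_000  # hypothetical number of read requests
r90 = physical_reads(total, 90.0)  # 100,000 disk reads
r95 = physical_reads(total, 95.0)  # 50,000 disk reads
print(r90 / r95)  # 2.0 -- disk reads halved, not a 5% change
```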
You can usually improve the cache hit rate by increasing the size of the database buffer cache, although a point is eventually reached when any improvement is not worth the extra memory. Cache sizing and effectiveness are discussed in more detail in Chapter 7 of Configuring and Tuning Databases on the Solaris Platform.
An Appropriate Cache Hit Rate
A cache hit rate of 80% might be quite suitable in one situation, while a much higher hit rate of 95% might be inadequate in another. How can you tell when you need to increase the size of the buffer cache to try to improve the cache hit rate?
You should consider three factors when determining an appropriate cache hit rate:
The amount of memory available. Avoid paging at all costs. If increasing the size of the buffer cache causes application paging, then it is better to leave the cache as it is. If free memory is available, then increasing the cache is an option.
The load on the database disks. Determine the utilization of the database disks by examining iostat, sar, or statit statistics for the relevant disks. Check, too, how many of the I/Os are due to reads. If the database disks are showing high utilization or high service times (as defined in "STEP 2. Monitoring Disks" on page 9) and a significant number of the I/Os are reads, then increasing the cache hit rate could take the pressure off the disks.
Application response times. Retrieving disk blocks from the cache is faster than retrieving them from disk and also consumes less CPU. So improving the cache hit rate might also reduce response times. Many factors contribute to response times, though, so any improvement may be less than you were hoping for.
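The three factors above might be combined along the following lines. This is only a sketch: the thresholds and input names are illustrative assumptions, not recommendations from this document, and real decisions should be based on the monitoring steps described earlier.

```python
def should_grow_cache(free_memory_mb, proposed_growth_mb,
                      disk_utilization_pct, read_fraction):
    """Illustrative combination of the three factors above.

    Returns True only if the larger cache fits within free memory
    (so it cannot cause paging) and the database disks are both
    heavily utilized and doing a significant share of reads.
    """
    if proposed_growth_mb > free_memory_mb:
        return False  # growing the cache would risk paging
    disks_under_pressure = disk_utilization_pct > 60  # assumed threshold
    reads_dominate = read_fraction > 0.5              # assumed threshold
    return disks_under_pressure and reads_dominate

print(should_grow_cache(2048, 512, 75, 0.7))  # True: memory free, busy read-heavy disks
print(should_grow_cache(256, 512, 75, 0.7))   # False: not enough free memory
```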
Aftereffects of Database Monitoring and Tuning
Given that the database may be the major application on the system, investigating the behavior of the database should shed further light on the data already collected. Monitoring and tuning the database is an iterative process. Tuning the database may change the behavior of the system, in which case it will be important to briefly revisit the previous steps covered in this chapter.