The primary conclusion from our experiments is that IOPS headroom is critical. The IOPS capability of a system varies based on many factors, including the number of I/O channels, number of attached disk devices (or LUNs), RAID configuration, presence or absence of an I/O cache, memory caching capabilities, and so on. Of course, the I/O requirements are workload specific, so it's impossible to establish a general set of rules or guidelines about I/O rates that help recovery tuning. Also, the nature of the workload is a huge factor. A workload that is write intensive (inserts, updates, etc.) has dramatically different I/O requirements and tuning dynamics than a read-intensive workload. It is important to understand the workload whenever a tuning effort is undertaken, whether the goal is optimal performance or fast recovery.
Our findings are summarized as follows:
The performance impact of tuning for recovery is substantially reduced in later releases of ORACLE, specifically ORACLE 9.2.
In the ORACLE 8 tests, we used hard ("brute-force") checkpointing and were able to reduce recovery time significantly, from about 27 minutes to about 4 minutes, but that came at an application performance regression of 28%.
Some quick tests done with incremental checkpointing in ORACLE 8.1.7 yielded better recovery times, down to 72 seconds, but caused a 26% application performance reduction.
We purposely used a system that was I/O-constrained, which predictably resulted in a negative performance impact.
Initial tests with ORACLE 9 on the smaller system (system A) demonstrated excellent recovery times (under 2 minutes) with a minimal performance impact (less than 5%). We addressed the I/O configuration that constrained the ORACLE 8 tests by adding more disks.
The larger system (System B) with ORACLE 9 demonstrated effective use of fast_start_mttr_target with parallel_execution_message_size and recovery_parallelism. Without tuning fast start, going from serial recovery (no parallelism) to 24 recovery processes and increasing the message buffer size from the default of 2 kilobytes to 4 kilobytes yielded a 40% improvement in recovery time (from about 44 minutes to 26 minutes). Further gains were measured by increasing the message buffer size to 16 kilobytes, where recovery dropped to 22 minutes.
Using fast_start_mttr_target with message buffer tuning and multiple recovery processes resulted in achieving recovery times within the MTTR setting.
The experiments we performed employed an artificial OLTP workload on a small sample of system and database sizes. The OLTP workload used is much more I/O intensive than typical transactional workloads. Thus, using this workload for the purpose of these measurements represents something close to a worst-case scenario. It is useful for illustrating the recovery capabilities of Solaris OS and ORACLE, and deriving data points on tuning and configuration variances. As is always the case, your results may vary.
Our recommendations are as follows, organized by component.
Pay attention to POST diagnostic settings. Set them to minimum values unless you are installing new hardware or performing fault isolation. Make sure auto-reboot is set in the appropriate firmware variable.
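On SPARC systems, the firmware variables can be inspected and set from a running Solaris system with the eeprom(1M) command. A minimal sketch follows; the exact variable names and accepted values vary by platform firmware, so verify them against your system's documentation:

```shell
# Inspect the current POST diagnostic level and auto-boot setting
eeprom diag-level
eeprom 'auto-boot?'

# Minimize POST diagnostics and enable automatic reboot
# (run as root; variable names are platform dependent)
eeprom diag-level=min
eeprom 'auto-boot?=true'
```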
Always use logging file systems. Even if the ORACLE data files exist on raw devices, enable logging for root and other file systems required for system operation. Define a dedicated raw partition for kernel core dumps, using the dumpadm(1M) command.
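As a sketch, UFS logging is enabled through the mount options field in /etc/vfstab, and dumpadm(1M) redirects crash dumps to a dedicated slice. The device names below are illustrative:

```shell
# /etc/vfstab entry with UFS logging enabled for root
# (device names are examples only)
# /dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /  ufs  1  no  logging

# Point kernel crash dumps at a dedicated raw partition
dumpadm -d /dev/dsk/c0t1d0s1

# Display the resulting dump configuration
dumpadm
```

Using a dedicated dump slice avoids contending with swap or data file systems when a panic dump is written, which shortens the reboot portion of recovery.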
Define a sufficient number of state database locations, and have them spread evenly across available I/O channels and disks. Also, using fewer, larger volumes can help, because the number of objects has an effect on recovery time. The reality is that there is not much that can be done to tune this area for fast recovery restart.
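With Solstice DiskSuite, state database replicas are managed with the metadb(1M) command. A hedged sketch, with illustrative device names, of spreading replicas across three controllers:

```shell
# Add two replicas per slice, on slices attached to different
# controllers (device names are examples only)
metadb -a -f -c 2 c1t0d0s7 c2t0d0s7 c3t0d0s7

# List replica locations and status
metadb -i
```

Spreading replicas across controllers preserves a replica quorum if one controller or disk fails.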
If possible, upgrade to at least ORACLE 8.1.7 to take advantage of incremental checkpointing and fast_start_io_target.
Tune recovery_parallelism to at least the number of available processors on the system.
Increase parallel_execution_message_size to at least 8 kilobytes; 16 or 32 kilobytes is better still.
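In init.ora terms, these two recommendations might look like the following fragment. The values are illustrative; size recovery_parallelism to your processor count and the message buffer to available shared memory:

```
# init.ora fragment (illustrative values)
recovery_parallelism            = 24      # >= number of available CPUs
parallel_execution_message_size = 16384   # 16 KB message buffers
```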
Enable MTTR statistic logging to the alert log. Also pay attention to database writer (dbwr) tuning. The dbwr processes write data from modified db_block_buffers to the underlying data files on disk, so dbwr performance directly impacts soft checkpointing. Configure a sufficient number of dbwr processes so checkpoint rates can be sustained, depending of course on the I/O subsystem configuration.
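A corresponding init.ora fragment might resemble the following; the values shown are assumptions to be sized against your recovery target and I/O configuration:

```
# init.ora fragment (illustrative values)
fast_start_mttr_target   = 120    # target crash recovery time, in seconds
log_checkpoints_to_alert = true   # record checkpoint activity in the alert log
db_writer_processes      = 4      # scale with I/O channels and CPU count
```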
It's all about IOPS headroom. Pay close attention to disk utilization (iostat(1M)), and watch for hot spindles and I/O bottlenecks. Under-configured IOPS headroom degrades recovery performance.
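A typical way to watch for hot spindles on Solaris is extended iostat output at a short interval, for example:

```shell
# Sample extended disk statistics every 5 seconds; watch the %b
# (percent busy) and asvc_t (active service time) columns for
# disks that stay saturated during checkpointing or recovery
iostat -xn 5
```

Disks that remain near 100% busy with rising service times during checkpoint activity are the first candidates for redistribution across more spindles or channels.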