I/O Infrastructure Performance Improvement Methodology

This section describes a methodology for improving performance when there is a serious I/O bottleneck. Effective utilization of the I/O infrastructure capacity is the key to improving performance in this environment. In the "Resolving CPU and I/O Bottlenecks Through Modeling and Capacity Planning" section, a capacity planning model was used to verify the positive impact of a balanced I/O load distribution on performance. In the simulation, new controllers and disks were added to the model and the I/O load was forced to distribute across the new devices. The simulations confirmed the benefits of a better I/O distribution; those benefits should be attributed to the improved distribution rather than to the addition of the new hardware itself.
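
As a rough illustration of why a balanced I/O load matters (this is not the TeamQuest model), the sketch below uses the textbook single-server M/M/1 approximation, in which response time grows as S / (1 - U) for service time S and utilization U. The service time and the per-device utilization figures are made-up assumptions, and the function names are hypothetical.

```python
def response_time(service_time_ms: float, utilization: float) -> float:
    """M/M/1 approximation: R = S / (1 - U), valid only for 0 <= U < 1."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


def mean_response_time(service_time_ms: float, utilizations: list) -> float:
    """Average response time per I/O across devices.

    Assumes every device has the same service time, so each device's share
    of the total I/O load is proportional to its utilization.
    """
    total = sum(utilizations)
    return sum(
        (u / total) * response_time(service_time_ms, u) for u in utilizations
    )


SERVICE_TIME_MS = 8.0        # hypothetical per-I/O service time
unbalanced = [0.90, 0.30]    # one hot device, one nearly idle device
balanced = [0.60, 0.60]      # same total load, spread evenly

print("unbalanced:", round(mean_response_time(SERVICE_TIME_MS, unbalanced), 2), "ms")
print("balanced:  ", round(mean_response_time(SERVICE_TIME_MS, balanced), 2), "ms")
```

Under these assumptions the same total load is served with a much lower average response time once it is spread evenly, without any hardware being added.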

Large database servers are dynamic, evolving environments. Be aware that most tuning efforts to optimize I/O performance may become obsolete when the application's I/O infrastructure utilization pattern changes. Therefore, performance tuning of such environments should be considered a continuous process rather than a one-time effort and action plan. The suggested action plan is presented as a process for I/O optimization, to be used every time a change in the application environment affects its I/O utilization pattern.

Infrastructure Optimization Plan

TABLE 1 contains the details of the infrastructure optimization process.

TABLE 1 Infrastructure Optimization Process Details

Action

Expected Result

Risk and Mitigation

Effort

1. Evaluate and implement all possible database server optimizations for improved I/O distribution.

See Oracle and Application Optimization Suggestions

See Oracle and Application Optimization Suggestions

See Oracle and Application Optimization Suggestions

2. Identify the high-utilization devices (hot spots) on the I/O infrastructure. Using standard Solaris OE tools or TeamQuest Viewer, identify those devices (LUNs) with a utilization greater than 80%.

Medium to high impact.

A 5 to 15% performance improvement can be expected for each device whose excessive utilization is resolved. There should be four to six devices currently eligible for optimization in the environment.

Low risk.

Application support and database administration staff are very familiar with operations involving relocation of logical volumes.

Low for evaluation and medium for implementation.

Two to four hours for device identification and mapping of database structures. Implementation time depends on the ability to execute the operation in a test and build environment or during maintenance windows on the production environment.

3. Identify the logical volumes associated with the high-utilization devices.

 

 

 

4. Identify the database structures associated with those logical volumes.

 

 

 

5. Verify that the contents of each high-utilization logical volume can be distributed across less utilized and/or spare logical volumes. Execute the I/O distribution and verify the new utilization numbers for the affected device.

Depends on the number of high-utilization devices that can be optimized through relocation of the database files.

 

 

6. If utilization rates for all devices are under 80%, document the new volume configuration to be reproduced in the next database build and re-run the TeamQuest capacity planning model, observing the model stretch factor and load growth projections for the new configuration.

 

 

 

7. If device utilization is still high, verify whether those devices share the same controller with other high-utilization devices.

Low impact.

Zero to 10% performance improvement, depending on impact of controller distribution on the I/O devices queue.

Low risk.

Same as the above (execution involves the same identification and logical volume relocation process).

Low effort for evaluation and medium for implementation.

8. Distribute the high-utilization devices evenly across the available controllers, for example by relocating logical volume contents.

Controllers are not perceived today as bottlenecks on I/O performance. Note, however, that this step is recommended because the controllers may become a limiting resource as data starts moving faster due to improved I/O distribution.

 

Same as above (execution involves the same identification and logical volume relocation process).

9. If utilization rates for all devices are under 80%, document the new mapping of volume contents to controllers, to be reproduced in the next database build.

Re-run the TeamQuest capacity planning model, observing the model stretch factor and load growth projections for the new configuration.

 

 

 

10. If logical volume contents cannot be relocated for better I/O distribution, or if the controller where the high-utilization device is located is overloaded, consider striping the logical volume across more than one LUN.

Medium impact.

For each striped logical volume replacing a current highly utilized volume/device, a 5 to 15% performance improvement can be expected.

Medium risk.

Logical volumes and file systems will be removed and then re-created as striped logical volumes.

To minimize impact on the production environment, the operation can possibly be executed first on the build environment and then on a database about to be promoted to production.

Medium effort.

About 4 to 8 hours per logical volume for research on striping configuration and deployment.

11. Evaluate the impact of volume striping on Operations. Operations permitting, create "high performance" volumes by striping a logical volume across two LUNs. Two non-striped logical volumes can be converted into two striped logical volumes on the respective two LUNs. The addition of new devices to the environment may facilitate execution and make it viable.

 

 

 

12. Relocate the identified high-utilization devices/database structures to the newly created striped volumes. Evaluate utilization of the new devices.

 

 

 

13. If utilization rates for all devices are under 80%, document new volume organization to be reproduced on the next database build. Re-run the TeamQuest capacity planning model, observing the model stretch factor and load growth projections for the new configuration.

 

 

 

14. Depending on the impact of the striping on performance, consider further logical volume striping over a higher number of LUNs.

 

 

 

15. If a given controller remains with more than three high-utilization devices after the operations listed above, consider adding a new HBA to the I/O infrastructure for better I/O distribution.

Low impact.

Zero to 10% performance improvement, depending on impact of controller distribution on the I/O devices queue.

Low risk.

Support staff is very familiar with HBA-related configurations.

Medium effort.

About 4 hours.

Hardware upgrade on the production environment, to be executed during a maintenance window.

16. Verify device utilization after the new controller is added. If improvement is verified, re-run the TeamQuest capacity planning model, observing the model stretch factor and load growth projections for the new configuration.

 

 

 

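The hot-spot checks in steps 2, 7, and 15 of TABLE 1 can be sketched as follows. This is only an illustration: the utilization samples are invented, the helper names are hypothetical, and the controller is inferred from Solaris-style cNtNdN device names, which is an assumption about how the devices are named in the environment. In practice the percent-busy figures would come from the Solaris tools or TeamQuest Viewer mentioned in step 2.

```python
import re
from collections import defaultdict

HOT_THRESHOLD = 80.0         # percent busy; devices above this are hot spots (step 2)
MAX_HOT_PER_CONTROLLER = 3   # more than this suggests adding an HBA (step 15)


def hot_devices(samples: dict) -> dict:
    """Return only the devices whose utilization exceeds the threshold."""
    return {dev: busy for dev, busy in samples.items() if busy > HOT_THRESHOLD}


def controller_of(device: str) -> str:
    """Infer the controller from a cNtNdN-style name (assumed naming scheme)."""
    match = re.match(r"c(\d+)t\d+d\d+", device)
    return f"c{match.group(1)}" if match else "unknown"


def hot_by_controller(samples: dict) -> dict:
    """Group hot devices by controller, to see which ones share a controller (step 7)."""
    grouped = defaultdict(list)
    for dev in sorted(hot_devices(samples)):
        grouped[controller_of(dev)].append(dev)
    return dict(grouped)


def overloaded_controllers(samples: dict) -> list:
    """Controllers that still carry more than three hot devices (step 15)."""
    return [
        ctrl
        for ctrl, devs in hot_by_controller(samples).items()
        if len(devs) > MAX_HOT_PER_CONTROLLER
    ]


# Hypothetical percent-busy samples per device.
samples = {
    "c1t0d0": 92.0, "c1t1d0": 85.0, "c1t2d0": 88.0, "c1t3d0": 81.0,
    "c2t0d0": 35.0, "c2t1d0": 83.0, "c2t2d0": 80.0,
}

print(hot_by_controller(samples))
print(overloaded_controllers(samples))
```

With these sample figures, controller c1 carries four hot devices and would be the candidate for redistribution or a new HBA, while c2t2d0 at exactly 80% is not flagged because the threshold is "greater than 80%".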

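Steps 10 through 14 of TABLE 1 rely on striping a logical volume across LUNs. The sketch below shows a generic round-robin (RAID-0 style) mapping from a logical byte offset to a LUN and an offset on that LUN. It is not the layout of any particular volume manager; the stripe unit size and the function name are assumptions chosen for illustration.

```python
from collections import Counter


def locate(offset: int, stripe_unit: int, num_luns: int) -> tuple:
    """Map a logical volume offset to (LUN index, offset on that LUN).

    Stripe units are assigned to LUNs in round-robin order.
    """
    stripe_index = offset // stripe_unit          # which stripe unit the offset falls in
    lun = stripe_index % num_luns                 # round-robin choice of LUN
    lun_offset = (stripe_index // num_luns) * stripe_unit + offset % stripe_unit
    return lun, lun_offset


STRIPE_UNIT = 64 * 1024   # hypothetical 64 KB stripe unit
NUM_LUNS = 2              # striping across two LUNs, as in step 11

# Eight consecutive stripe-unit-sized requests against a "hot" region.
requests = [i * STRIPE_UNIT for i in range(8)]
per_lun = Counter(locate(off, STRIPE_UNIT, NUM_LUNS)[0] for off in requests)
print(dict(per_lun))
```

In this sketch, requests that would all have landed on one device in a non-striped volume are split evenly between the two LUNs, which is the effect the striping steps aim for.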
Oracle and Application Optimization Suggestions

Oracle-related optimization is beyond the scope of this analysis. However, TABLE 2 lists some ideas for performance enhancement, based on the I/O distribution improvement principle.

TABLE 2 Performance Enhancement Ideas

1. Implement the planned Oracle database segmentation changes for improved I/O distribution.

High performance improvement.

A well-balanced I/O infrastructure utilization has the potential to improve performance two to three times, according to simulations on the capacity planning model.

Low risk

Low risk.

2. Explore the possibility of implementing I/O load balancing at the application level. For example, if a few queries that are hot spots from the database standpoint use a specific index file, that index file could potentially be duplicated on a read-only database and I/O load balancing implemented at the application/query level.

Viability to be determined by the application development and database management teams.

High performance improvement, for the same reasons listed above. The potential for improvement is even higher because I/O distribution is now under direct application control.

Medium risk, due to application code change.

High, due to application code change.

3. Review application architecture

High performance improvement

Low

High, due to application re-architecting
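
Idea 2 in TABLE 2, balancing hot read-only queries between the primary database and a read-only copy, could look roughly like the sketch below. The class, the database names, and the query names are all hypothetical, and the sketch only decides which database a query should go to; how connections are opened and how the read-only copy is kept current are left to the application development and database management teams.

```python
from itertools import cycle


class QueryRouter:
    """Route designated hot read-only queries round-robin across databases."""

    def __init__(self, primary: str, read_only_copies: list, hot_queries: set):
        self.primary = primary
        # Hot reads rotate over the primary and every read-only copy.
        self._readers = cycle([primary, *read_only_copies])
        self.hot_queries = set(hot_queries)

    def target_for(self, query_name: str) -> str:
        """Return the database that should serve this query."""
        if query_name in self.hot_queries:
            return next(self._readers)
        # Everything else, including all writes, stays on the primary.
        return self.primary


router = QueryRouter(
    primary="primary_db",
    read_only_copies=["readonly_db"],
    hot_queries={"order_lookup"},
)

targets = [router.target_for("order_lookup") for _ in range(4)]
print(targets)
print(router.target_for("update_order"))
```

Only queries explicitly listed as hot are redirected, which keeps the application code change narrow; that matters here because the table rates this idea as medium risk and high effort precisely because of the code change.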

