3.3 I/O Assessment and Analysis Tools

The best way to assess I/O behavior and performance is to use system tools on the hosts that run the applications. An examination using system tools provides a top-down view of the I/O subsystem from the host system's perspective. A higher-level view of I/O behavior can sometimes be extracted from an application, such as a relational database management system (RDBMS), but not all applications can report this data. Further, there is no consistent way to extract the data from the applications that do report I/O statistics.

Because of this inconsistency, and because system tools tend to be more uniform in their availability and in the data they measure, it is best to start with the system tools themselves. They provide a distilled version of the application I/O behavior at the device level. Any application-level device abstractions are lost, but the raw I/O behaviors still show through in the analysis.

It is also possible to analyze the I/O system from the storage device point of view, in a bottom-up fashion. This method avoids the problems of application-level analysis because useful statistics are commonly available on almost all intelligent storage devices. Information gathering takes place with device-specific methods, however, because standards for the contents of the data set and for the data extraction method are not yet complete.1 New storage device management standards will make data gathering from storage devices more complete and consistent, so that all devices can provide the same basic utilization and performance data. Implementation is in various stages, depending on the hardware and software vendors, the products in use, and the chosen device management method.

In general, put off device analysis until the host system analysis is complete. The storage device analysis has greater depth and narrower scope, and it requires more effort to perform. Delaying it allows a more focused approach to the storage devices, whose wealth of storage-specific I/O data can easily swamp the investigator.

UNIX hosts that have the sar utility can be examined quickly with a few simple scripts written in Perl or a shell language. sar is particularly useful because it is available on almost all UNIX operating system variants, and its data set and output are quite consistent from one UNIX to another. The data available from the Windows NT perfmon utility can also be processed fairly easily from its logged format.
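
As a minimal sketch of that perfmon processing, assume the log has been exported to comma-separated format and that the counter of interest landed in the second column; both the file name and the column position here are hypothetical, so check the header row of your own log and adjust:

# Pull one counter column out of a perfmon csv log, skipping the header
awk -F'","' 'NR > 1 { gsub(/"/, "", $2); print $2 }' perfmon_disk.csv \
    > /usr/tmp/nt_iops.csv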

A quick look at the sar man page on your UNIX host system provides details on the timing and amount of data gathered; on most UNIX host systems, the past week's worth of system data is available. A simple spreadsheet analysis of the data can provide maximum system bandwidth and IOPS, and it can also show patterns of usage throughout a day or across several days. Once the script is run on each host system, the collected data can be examined and, if necessary, combined with data collected from other host systems to provide a complete snapshot of the workload.
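
Before collecting a full week of data, it can be worth sampling a single 20-minute window by hand to confirm the archive location and output format on your system. A minimal check, assuming the usual /var/adm/sa archive location (sa15 is just an example name; sar typically keeps one archive per day of the month):

# Show per-device disk activity from 9:00 to 9:20 in the day-15 archive
sar -d -f /var/adm/sa/sa15 -s 09:00:00 -e 09:20:30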

The get_io.sh script in Example 3.1 performs two functions:

  1. It gathers bandwidth and IOPS data from a host system.

  2. It outputs data files from the sar input data for analysis in a spreadsheet.

The data set gathered by the script is analyzed by loading the comma-separated-value output files of each data type (bandwidth or IOPS) for each day assessed into a spreadsheet. The data can then be graphed against time to visualize the I/O behavior of the host system under evaluation in terms of bandwidth, IOPS, and I/O size. The visualization reveals several I/O parameters significant to the SAN design, such as maximum bandwidth utilization, maximum IOPS utilization, workload windows, workload consistency, and characteristic I/O sizes. Additional mathematical analysis may help if the visualization provides poor insight into the I/O behavior of the host system, but usually it is not required.
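
If a spreadsheet is not immediately at hand, the same maximum and average can be pulled from a day's output file with a short awk program; this sketch assumes the file names produced by the script in Example 3.1 (bw_1.csv is the first day's bandwidth data):

# Report the maximum and average bandwidth for day 1
awk '$1 > max { max = $1 }
     { sum += $1; n += 1 }
     END { if (n > 0) print "Max:", max, "MBps  Avg:", sum / n, "MBps" }' \
    /usr/tmp/bw_1.csv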

The fairly simple script in Example 3.1 takes data collected by the sar utility and creates twenty-minute aggregated data points of bandwidth and IOPS, from the host system perspective, across all I/O channels combined. See Figure 3.3 (top) for an example of the output of the get_io.sh script. The two sets of output files from the script can also be combined to find the typical I/O size of the application being examined over the same intervals.

EXAMPLE 3.1. The get_io.sh shell script

#!/bin/sh
# get_io.sh
# Gather aggregate bandwidth and IOPS data from a host's sar data files

# Gather bandwidth data from sar archives
day=1
for sarfile in `ls /var/adm/sa/sa[0-2]*`
do
    shour=0
    ehour=0
    while [ $shour -le 23 ]
    do
        ehour=`expr $shour + 1`
        interval=0
        # Divide each hour into 3 parts because the data is in 20-minute
        # intervals
        while [ $interval -le 2 ]
        do
            case "$interval" in
                0)
                blocks=0
                sum=0
                # Extract the data from a sar archive file and
                # sum the blks/s column
                for blocks in `sar -d -f $sarfile -s $shour:00:00 \
                        -e $shour:20:30 |
                    egrep -v "IRIX|sun4|HP-UX|AIX|,|^[0-2]" |
                    awk '{print $5}'`
                do
                    sum=`expr $sum + $blocks`
                done
                # Clean up any old temp file, then compute the bandwidth
                # in MBps (sar reports 512-byte blocks per second)
                rm -f /usr/tmp/bcfile
                echo $sum " / 2 / 1024" >> /usr/tmp/bcfile
                echo quit >> /usr/tmp/bcfile
                bw=`bc -l /usr/tmp/bcfile`
                # Store the bandwidth result in a csv file
                echo $bw >> /usr/tmp/bw_$day.csv
                # Report the bandwidth result
                echo "Bandwidth is" $bw "MBps"
                ;;

                1)
                blocks=0
                sum=0
                for blocks in `sar -d -f $sarfile -s $shour:20:00 \
                        -e $shour:40:30 |
                    egrep -v "IRIX|sun4|HP-UX|AIX|,|^[0-2]" |
                    awk '{print $5}'`
                do
                    sum=`expr $sum + $blocks`
                done
                rm -f /usr/tmp/bcfile
                echo $sum " / 2 / 1024" >> /usr/tmp/bcfile
                echo quit >> /usr/tmp/bcfile
                bw=`bc -l /usr/tmp/bcfile`
                echo $bw >> /usr/tmp/bw_$day.csv
                echo "Bandwidth is" $bw "MBps"
                ;;

                2)
                # The archive ends at midnight, so skip the final window
                if [ $shour -eq 23 ]
                then
                    break
                fi
                blocks=0
                sum=0
                for blocks in `sar -d -f $sarfile -s $shour:40:00 \
                        -e $ehour:00:30 |
                    egrep -v "IRIX|sun4|HP-UX|AIX|,|^[0-2]" |
                    awk '{print $5}'`
                do
                    sum=`expr $sum + $blocks`
                done
                rm -f /usr/tmp/bcfile
                echo $sum " / 2 / 1024" >> /usr/tmp/bcfile
                echo quit >> /usr/tmp/bcfile
                bw=`bc -l /usr/tmp/bcfile`
                echo $bw >> /usr/tmp/bw_$day.csv
                echo "Bandwidth is" $bw "MBps"
                ;;
            esac
            interval=`expr $interval + 1`
        done
        shour=`expr $shour + 1`
    done
    day=`expr $day + 1`
done

# Gather IOPS data from sar archives
day=1
rm -f /usr/tmp/bcfile
for sarfile in `ls /var/adm/sa/sa[0-2]*`
do
    shour=0
    ehour=0
    while [ $shour -le 23 ]
    do
        ehour=`expr $shour + 1`
        interval=0
        while [ $interval -le 2 ]
        do
            case "$interval" in
                0)
                ios=0
                # Extract the data from a sar archive file and build a
                # bc input file that sums the r+w/s column
                for ios in `sar -d -f $sarfile -s $shour:00:00 \
                        -e $shour:20:30 |
                    egrep -v "IRIX|sun4|HP-UX|AIX|,|^[0-2]" |
                    awk '{print $4}'`
                do
                    echo $ios "+ \\" >> /usr/tmp/bcfile
                done
                echo 0 >> /usr/tmp/bcfile
                echo quit >> /usr/tmp/bcfile
                # Compute the IOPS
                iops=`bc -l /usr/tmp/bcfile`
                # Store the result in a csv file
                echo $iops >> /usr/tmp/ios_$day.csv
                # Report the result
                echo "IOPS are" $iops
                # Clean up the temp file
                rm -f /usr/tmp/bcfile
                ;;

                1)
                ios=0
                for ios in `sar -d -f $sarfile -s $shour:20:00 \
                        -e $shour:40:30 |
                    egrep -v "IRIX|sun4|HP-UX|AIX|,|^[0-2]" |
                    awk '{print $4}'`
                do
                    echo $ios "+ \\" >> /usr/tmp/bcfile
                done
                echo 0 >> /usr/tmp/bcfile
                echo quit >> /usr/tmp/bcfile
                iops=`bc -l /usr/tmp/bcfile`
                echo $iops >> /usr/tmp/ios_$day.csv
                echo "IOPS are" $iops
                rm -f /usr/tmp/bcfile
                ;;

                2)
                # The archive ends at midnight, so skip the final window
                if [ $shour -eq 23 ]
                then
                    break
                fi
                ios=0
                for ios in `sar -d -f $sarfile -s $shour:40:00 \
                        -e $ehour:00:30 |
                    egrep -v "IRIX|sun4|HP-UX|AIX|,|^[0-2]" |
                    awk '{print $4}'`
                do
                    echo $ios "+ \\" >> /usr/tmp/bcfile
                done
                echo 0 >> /usr/tmp/bcfile
                echo quit >> /usr/tmp/bcfile
                iops=`bc -l /usr/tmp/bcfile`
                echo $iops >> /usr/tmp/ios_$day.csv
                echo "IOPS are" $iops
                rm -f /usr/tmp/bcfile
                ;;
            esac
            interval=`expr $interval + 1`
        done
        shour=`expr $shour + 1`
    done
    day=`expr $day + 1`
done
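
Assuming sar archives exist under /var/adm/sa and /usr/tmp is writable, running the script is a single command; one bandwidth file and one IOPS file should appear for each day of data found:

# Run the collector, then confirm the per-day output files
sh get_io.sh
ls -l /usr/tmp/bw_*.csv /usr/tmp/ios_*.csv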

The get_iosize.pl script in Example 3.2 takes pairs of bandwidth and IOPS output files from the script in Example 3.1 and uses the simple equation

I/O size = Bandwidth (KB/s) / IOPS 

to generate the typical I/O size over the same intervals.
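
For example, an interval showing 4 MBps of bandwidth (4,096 KB/s) at 512 IOPS has a characteristic I/O size of 4,096 / 512 = 8 KB.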

The output of this script adds a bit more detail to the analysis of the application and host system. See Figure 3.3 (bottom) for an example of the output from the get_iosize.pl script. The graphic analysis of the data shows patterns and anomalies. The more regular the patterns look in terms of IOPS, bandwidth, and I/O size, the more likely it is that conclusions drawn from them will be useful. Less consistent graphs indicate more variable system usage, which makes the sizing task more difficult; pattern uncertainties can lead to overconfiguration and wasted resources in the SAN design.

EXAMPLE 3.2. The get_iosize.pl Perl script

#!/usr/local/bin/perl
#
# get_iosize.pl
# Find the characteristic I/O size from the output of get_io.sh script
$i=1;
while ( $i <= 7 ) {
    # Open the result file for output from this script
    open (OUTFH, ">>/usr/tmp/iosize_$i") || die "Can't open file, $!\n";
    # Open and read the bandwidth and IOPS output csv file pair
    open (BWFH, "/usr/tmp/bw_$i.csv") || die "Can't open file, $!\n";
    @bwinfo=<BWFH>;
    close (BWFH);
    open (IOPSFH, "/usr/tmp/ios_$i.csv") || die "Can't open file, $!\n";
    @iopinfo=<IOPSFH>;
    close (IOPSFH);
    # Make sure the number of data collection intervals
    # in each file matches or quit
    if ( $#bwinfo != $#iopinfo) {
        printf "The files for day $i don't match. Exiting\n";
        exit;
    }
    $j=0;
    # Convert the bandwidth from MBps to KB/s, then divide by the
    # number of IOPS to get the I/O size in KB
    while ( $j <= $#bwinfo) {
        if ( $iopinfo[$j] != 0 ) {
            $iosize = $bwinfo[$j] * 1024 / $iopinfo[$j];
        } else {
            $iosize = 0;
        }
        # Report the I/O size result and record it in an output file.
        printf "Typical IO size is $iosize\n";
        printf OUTFH "$iosize\n";
        $j++;
    }
    close (OUTFH);
    $i++;
}
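
Once get_io.sh has populated /usr/tmp with a full week of bw_N.csv and ios_N.csv files, running the Perl script produces a matching iosize_N file for each day, which can be graphed alongside the bandwidth and IOPS data:

# Compute the per-interval I/O sizes for the collected week
perl get_iosize.pl
ls /usr/tmp/iosize_*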

