This chapter is from the book

5.4 Causes of Read/Write File System Latency

There are several causes of latency in the file system read/write data path. The simplest is waiting for physical I/O at the backend of the file system. File systems, however, rarely pass logical requests straight through to the backend, so latency can be incurred in several other ways. For example, one logical I/O can be fractured into two physical I/Os, incurring the latency of two disk operations. Figure 5.3 shows the layers that could contribute latency.

Figure 5.3 Layers for Observing File System I/O

Common sources of latency in the file system stack include:

  • Disk I/O wait (or network/filer latency for NFS)
  • Block or metadata cache misses
  • I/O breakup (logical I/Os being fractured into multiple physical I/Os)
  • Locking in the file system
  • Metadata updates

5.4.1 Disk I/O Wait

Disk I/O wait is the most commonly assumed type of latency problem. If the underlying storage is in the synchronous path of a file system operation, then it affects file-system-level latency. Each logical operation can result in zero (a hit in the block cache), one, or even multiple physical operations.

The iowait.d script uses the file name and device arguments of the I/O provider to show how a process's time divides between running on CPU and waiting for physical I/O, and then breaks the I/O wait down by the file that initiated each I/O. In the example below, the script is run against process ID 639. See Chapter 4 for further information on the I/O provider and Section 10.6.1 for information on its arguments.

# ./iowait.d 639
^C
Time breakdown (milliseconds):
 <on cpu>                                                       2478
 <I/O wait>                                                     6326

I/O wait breakdown (milliseconds):
 file1                                                           236
 file2                                                           241
 file4                                                           244
 file3                                                           264
 file5                                                           277
 file7                                                           330
 ...
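The per-file breakdown that iowait.d prints is an aggregation: the wait time of each physical I/O is summed under the file that initiated it. The Python sketch below models only that accounting step, not the DTrace script itself; the event tuples, the function name iowait_by_file, and the sample timings are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

# Hypothetical event record: (file name, I/O start time in ns, I/O completion time in ns).
IoEvent = Tuple[str, int, int]


def iowait_by_file(events: Iterable[IoEvent]) -> List[Tuple[str, int]]:
    """Sum physical I/O wait per file, in whole milliseconds, smallest total first."""
    totals_ns: DefaultDict[str, int] = defaultdict(int)
    for path, start_ns, end_ns in events:
        totals_ns[path] += end_ns - start_ns
    # Ascending order, like the way DTrace prints an aggregation.
    return sorted(((path, ns // 1_000_000) for path, ns in totals_ns.items()),
                  key=lambda item: item[1])


# Hypothetical sample: file1 is hit twice, file2 once.
events = [
    ("file1", 0, 120_000_000),
    ("file2", 0, 241_000_000),
    ("file1", 200_000_000, 316_000_000),
]
for path, ms in iowait_by_file(events):
    print(f" {path:<12}{ms:>8}")
```

Summing per file rather than per I/O is what lets a few slow files stand out even when each individual I/O looks unremarkable.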

5.4.2 Block or Metadata Cache Misses

Have you ever heard the saying "the best I/O is the one you avoid"? The file system tries to cache as much as possible in RAM to avoid going to disk for repeated accesses. As discussed in Section 5.6, there are multiple caches in the file system: the most obvious is the data block cache, and others include the metadata, inode, and file name caches.
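To illustrate why a cache hit costs no physical I/O while a miss does, here is a toy least-recently-used (LRU) block cache in Python. It is a sketch under simplifying assumptions (fixed 8-Kbyte blocks, a single cache, LRU eviction); the class and method names are hypothetical, and it does not model the actual Solaris page cache or its replacement policy.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU block cache: hits are served from RAM, misses cost a physical read."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.physical_reads = 0

    def _read_from_disk(self, block_no: int) -> bytes:
        # Stand-in for a disk I/O; this is where the latency would be paid.
        self.physical_reads += 1
        return b"\0" * 8192

    def read(self, block_no: int) -> bytes:
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        data = self._read_from_disk(block_no)
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


cache = BlockCache(capacity_blocks=2)
for block in [1, 2, 1, 1, 3, 2]:
    cache.read(block)
print(cache.hits, cache.physical_reads)  # 2 4
```

Six logical reads cost only four physical reads here; a larger cache or a more repetitive access pattern would avoid more of them.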

5.4.3 I/O Breakup

I/O breakup occurs when a single logical I/O is fractured into multiple physical I/Os. Because each physical I/O carries its own latency, the latencies compound, which makes this a common file-system-level performance issue.

The following DTrace script output shows the VOP-level (vnode operation) and physical I/Os for a file system during a single read(). The single 1-Mbyte POSIX-level read() is broken into many physical I/Os: one 4-Kbyte, one 8-Kbyte, and one 48-Kbyte read, with the rest 56 Kbytes each. This is likely due to the file system maximum cluster size (maxcontig).

# ./fsrw.d
Event           Device RW     Size Offset Path
sc-read              .  R  1048576      0 /var/sadm/install/contents
 fop_read            .  R  1048576      0 /var/sadm/install/contents
   disk_ra       cmdk0  R     4096     72 /var/sadm/install/contents
   disk_ra       cmdk0  R     8192     96 <none>
   disk_ra       cmdk0  R    57344     96 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    152 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    208 /var/sadm/install/contents
   disk_ra       cmdk0  R    49152    264 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    312 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    368 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    424 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    480 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    536 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    592 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    648 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    704 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    760 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    816 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    872 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    928 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344    984 /var/sadm/install/contents
   disk_ra       cmdk0  R    57344   1040 /var/sadm/install/contents
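The breakup can be approximated with a simple splitting function. The sketch assumes, hypothetically, that the file system issues physical reads of at most 56 Kbytes (matching the largest transfers above) and that each one ends on a 56-Kbyte-aligned boundary; real UFS clustering and read-ahead logic are more involved. The function name split_logical_io is illustrative.

```python
from typing import List, Tuple


def split_logical_io(offset: int, length: int, max_io: int) -> List[Tuple[int, int]]:
    """Break one logical I/O into (offset, size) physical I/Os of at most max_io bytes.

    Each chunk ends on a max_io-aligned boundary, so an unaligned start
    produces a short first I/O, much like the small leading reads above.
    """
    ios = []
    end = offset + length
    while offset < end:
        boundary = (offset // max_io + 1) * max_io  # next aligned boundary
        size = min(boundary, end) - offset
        ios.append((offset, size))
        offset += size
    return ios


KB = 1024
ios = split_logical_io(offset=4 * KB, length=1024 * KB, max_io=56 * KB)
print(len(ios), ios[:2])  # 19 [(4096, 53248), (57344, 57344)]
```

Under these assumptions, one 1-Mbyte logical read starting at a 4-Kbyte offset becomes 19 physical I/Os, each of which carries its own latency.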

5.4.4 Locking in the File System

File systems use locks to serialize access within a file (we call these explicit locks) or within critical internal file system structures (implicit locks).

Explicit locks are often used to implement POSIX-level read/write ordering within a file. POSIX requires that writes be committed to a file in the order in which they are issued and that reads return data consistent with that ordering. As a simple and cheap solution, many file systems implement a per-file reader-writer lock to provide this level of synchronization. Unfortunately, this solution has the unwanted side effect of serializing writes against all other accesses within a file, even those to non-overlapping regions. The reader-writer lock typically becomes a significant performance overhead when writes are synchronous (issued with O_DSYNC or O_SYNC), since the writer lock is held for the entire duration of the physical I/O (typically on the order of 10 milliseconds or more), blocking all other reads and writes to the same file.
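The serializing effect of a per-file reader-writer lock can be demonstrated with Python threads. Python's standard library has no reader-writer lock, so this sketch builds a minimal one on threading.Condition; the class, the file_lock object, and the 10 ms sleep standing in for a synchronous physical write are all hypothetical. Eight writers to different regions of the same file still run one at a time.

```python
import threading
import time


class ReaderWriterLock:
    """Minimal reader-writer lock: many readers or one writer (no writer priority)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


file_lock = ReaderWriterLock()  # hypothetical per-file lock
counter_lock = threading.Lock()
active = 0
max_active = 0


def synchronous_write(region: int) -> None:
    """Write to 'region' of the file; the lock ignores the region and covers the whole file."""
    global active, max_active
    file_lock.acquire_write()
    try:
        with counter_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.01)  # stand-in for a ~10 ms synchronous physical write
        with counter_lock:
            active -= 1
    finally:
        file_lock.release_write()


threads = [threading.Thread(target=synchronous_write, args=(i,)) for i in range(8)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(max_active, round(elapsed, 2))
```

At most one writer is ever inside the lock, so the eight 10 ms writes take at least 80 ms in total even though none of them overlap.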

The per-file POSIX lock is the most significant file system performance issue for databases, because they typically use a few large files accessed by hundreds of threads. If the lock is in effect, synchronous writes to a file are serialized, effectively limiting I/O throughput to that of a single disk. For example, assume a file system backed by 10 disks, each capable of 100 I/Os per second, and a database issuing synchronous writes that each hold a file's writer lock for 10 ms. The maximum write rate to that file is then around 100 I/Os per second, even though the 10 disks together could sustain 1000 I/Os per second.
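The arithmetic in this example can be written out as a small model. It assumes, for illustration, that each disk services one I/O at a time and that, without the lock, writes spread evenly across all disks; the function name is hypothetical.

```python
def max_write_iops(disks: int, io_latency_ms: float, per_file_lock: bool) -> float:
    """Upper bound on synchronous write I/Os per second to a single file."""
    per_disk_iops = 1000.0 / io_latency_ms  # one I/O at a time per disk
    if per_file_lock:
        # The writer lock is held for the whole physical I/O, so only one write
        # to the file is outstanding, no matter how many disks back it.
        return per_disk_iops
    return per_disk_iops * disks


print(max_write_iops(10, 10.0, per_file_lock=True))   # 100.0
print(max_write_iops(10, 10.0, per_file_lock=False))  # 1000.0
```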

Most file systems that use the standard file system page cache (see Section 14.7 in Solaris Internals) have this limitation. UFS, when used with Direct I/O (see Section 5.6.2), relaxes the per-file reader-writer lock and can serve as a high-performance, uncached file system, suitable for applications such as databases that do their own caching.

5.4.5 Metadata Updates

File system metadata updates are a significant source of latency because many implementations update on-disk structures synchronously to preserve their integrity. There are logical metadata updates (file creates, deletes, and so on) and physical metadata updates (updating a block map, for example).

Many file systems perform several synchronous I/Os per metadata update, which limits metadata performance. As a result, operations such as creating, renaming, and deleting files often exhibit higher latency than reads or writes. File extension (a write that grows a file) is also affected, because it can require a physical metadata update such as a block map change.
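A rough way to see why metadata operations are slow is to count the synchronous I/Os each one must wait for. The per-operation counts below are illustrative assumptions, not measurements of any particular file system, and the 10 ms disk latency is the same round figure used in Section 5.4.4; the dictionary and function names are hypothetical.

```python
# Illustrative counts of synchronous I/Os per operation (assumptions, not measurements).
SYNC_IOS = {
    "cached write": 0,             # data lands in the page cache; no disk wait
    "O_DSYNC overwrite": 1,        # data block only
    "O_DSYNC extending write": 2,  # data block plus a block map/file size update
    "create": 2,                   # e.g. directory entry and inode
}

DISK_MS = 10.0  # assumed latency of one synchronous disk I/O


def op_latency_ms(op: str, disk_ms: float = DISK_MS) -> float:
    """Latency if each synchronous I/O is issued and waited for one after another."""
    return SYNC_IOS[op] * disk_ms


for op in SYNC_IOS:
    print(f"{op:<25}{op_latency_ms(op):>6.1f} ms")
```

Even with these modest counts, a create or an extending synchronous write waits for twice as many disk operations as a synchronous overwrite.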
