InformIT

POSIX Asynchronous I/O

Date: Sep 22, 2006


Used judiciously, asynchronous I/O (AIO) can provide a significant speed benefit, says David Chisnall. Perhaps enough to help your program overcome the fact that modern processors can really zoom, while hard drives still drag.

Since I first learned to program, a lot has changed in the way computers work. While this fact isn’t always apparent to the user, it means that optimizations I was taught now result in highly suboptimal code. One thing, however, has remained constant: Hard drives are slow.

In recent years, this situation has gotten worse rather than better. CPU speeds have grown exponentially, while hard drive speeds have barely managed a linear increase. DMA transfers, now common, have widened the gap even more, because CPUs no longer have to do nearly as much work when handling disk I/O.

With the traditional UNIX I/O mechanisms, you issue a read or write system call, and then wait until the call has completed. A typical hard drive takes about 9 ms to position the head and then a little longer to perform the operation, and that's assuming a linear read. A modern CPU runs at 1 GHz or more. In the time it takes to position the disk head, a 1 GHz CPU runs through nine million clock cycles. You can get a lot done in that many cycles.

The obvious solution is to dispatch the read or write requests as soon as you know what data you’re going to need, and then do something else while the OS processes the request. The asynchronous I/O (AIO) APIs, provided as part of the POSIX real-time extensions, allow you to do exactly that.

Operating System Support

Commercial UNIX systems have supported POSIX asynchronous I/O for some time. Some of the best documentation currently available comes from IRIX and HP-UX. More recently, free UNIX variants have started to support it; Linux and FreeBSD both support most of the specification, for example.

Support is slightly different between operating systems. Code using AIO on some platforms, including Linux and Solaris, must be linked with -lrt to provide support for the POSIX real-time extensions. On FreeBSD, AIO support isn’t built into the default kernel and must be loaded as a module with the following command:

# kldload aio

On versions of Mac OS X prior to 10.3, AIO wasn’t supported at all, and until OS X 10.4 AIO operations were supported only on files, not other file descriptor types such as sockets. Using AIO can give significant performance benefits, but it can also increase the testing overhead when porting to new platforms.

The Basic API

The two core functions of the AIO API are aio_read(2) and aio_write(2). These are analogous to the standard read(2) and write(2) functions. Of the two, aio_read(2) is more interesting. In a modern operating system, most writes will copy the data into a buffer and then write it to the physical disk later. This means that write(2) calls are partially asynchronous anyway.

An AIO operation happens in two parts. The first sends a request to perform the operation. The second waits for the operation to complete. The aio_suspend(2) function is used to wait for an I/O operation to complete. Once this returns, aio_return(2) is used to retrieve the return value.

The AIO Control Block

The basic data structure used by all AIO operations is the aiocb structure, which contains the metadata about the operation. The fields in the following table are the most important.

Field        Contents
aio_nbytes   The number of bytes to be read or written by this operation.
aio_fildes   The file descriptor of the file to access.
aio_buf      A pointer to the data to be written, or the location to store data to be read.
aio_offset   The offset from the start of the file.

A few other attributes (some of which will be discussed later) are used occasionally, but these four are the most important. Using these fields, we can write a simple program, aio1.c, to read data from a file:

#include <aio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int main(int argc, char * argv[])
{
	if(argc != 2)
	{
		printf("Usage: %s {filename}\n", argv[0]);
		return -1;
	}
	struct aiocb cb;
	const struct aiocb * cbs[1];
	//Open the file specified on the command line
	FILE * file = fopen(argv[1], "r+");
	if(file == NULL)
	{
		perror("fopen");
		return -1;
	}
	
	//Set up the control block; fields we don't use must be zeroed
	memset(&cb, 0, sizeof(cb));
	cb.aio_buf = malloc(11);
	cb.aio_fildes = fileno(file);
	cb.aio_nbytes = 10;
	cb.aio_offset = 0;

	//Perform the read
	aio_read(&cb);
	//Wait for it to complete
	cbs[0] = &cb;
	aio_suspend(cbs, 1, NULL);
	//aio_return yields a ssize_t, so convert it before printing with %d
	printf("AIO operation returned %d\n", (int)aio_return(&cb));
	
	return 0;
}

This example isn’t particularly useful; it does nothing that couldn’t be done with a synchronous read. It does, however, illustrate the principle. We can compile and run the program as follows (don’t forget to add -lrt to the gcc command line if you’re on Linux):

$ gcc aio1.c
$ echo 12345678901234567890 > bar
$ ./a.out bar
AIO operation returned 10

The three AIO calls used here dispatch the request, wait for it to complete, and then find its return value. The aio_return(2) function will return exactly the same value that read(2) would have returned if you had performed a synchronous read: that is, the number of bytes read, or -1 to indicate an error. Note that aio_suspend(2) takes an array of pointers to aiocbs, and will return when any of them has completed or when a signal has been received. If you want to check whether an AIO operation has really completed, you can use aio_error(2), which returns EINPROGRESS if the operation hasn’t yet finished.
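As a rough sketch of the aio_error(2) check just described, the helper below starts a read and then polls until the status is no longer EINPROGRESS. The name read_with_polling and the poll counter are illustrative assumptions, not part of the AIO API; in a real program the loop body would do useful work rather than just counting.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes from the start of path into buf.
 * *polls receives the number of times the request was still in progress
 * when we checked. Returns what read(2) would have returned. */
ssize_t read_with_polling(const char *path, void *buf, size_t n, long *polls)
{
	assert(buf != NULL && polls != NULL);
	*polls = 0;

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	struct aiocb cb;
	memset(&cb, 0, sizeof cb);      /* unused fields must be zero */
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = n;
	cb.aio_offset = 0;

	if (aio_read(&cb) != 0) {
		close(fd);
		return -1;
	}

	/* aio_error reports EINPROGRESS until the request has finished */
	int status;
	while ((status = aio_error(&cb)) == EINPROGRESS)
		(*polls)++;                 /* stand-in for other useful work */

	/* Call aio_return exactly once, and only after completion */
	ssize_t result = aio_return(&cb);
	close(fd);
	if (status != 0)
		errno = status;             /* report the request's own error */
	return result;
}
```

Called on the bar file created earlier with a 10-byte request, this should return 10, just as aio1.c does; the value left in *polls depends on how quickly the system completes the read.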

FreeBSD offers an alternate method of waiting for an AIO operation to complete: aio_waitcomplete(2) combines the functionality of aio_suspend(2) and aio_return(2). It returns both the aiocb corresponding to the operation that completed and the return value. While this is often a nice feature, its use is discouraged because it results in non-portable code.

Asynchronous Completion

In the previous example, the I/O operation isn’t completely asynchronous. While it does happen in the background, the application explicitly has to wait for it to finish. In many cases, this is undesirable. In these cases, you can use signal notifications to finish the AIO operations.

The standard way of sending asynchronous notifications to a process in UNIX is via the signals mechanism. Traditional UNIX signals were very primitive; they told you that an event had occurred, but nothing else. POSIX extends this capability to allow the delivery of an integer or a pointer to the signal handler.

To arrange for a signal to be delivered, we use the aio_sigevent member of the aiocb structure. This member tells the system the kind of notifications that should be used when an AIO operation completes. The following table lists the members.

Field          Contents
sigev_notify   The mechanism used to send the notification. For single-threaded programs, this should be SIGEV_SIGNAL.
sigev_signo    The number of the signal to send.
sigev_value    A union of an integer and a pointer, which is delivered to the signal handler.

The first thing we need to do is decide on a signal to use for asynchronous I/O notifications. The POSIX Realtime Signals Extension defines a range of signals between SIGRTMIN and SIGRTMAX that support some additional semantics; if signals in this range are delivered while a signal handler is running, they’ll be enqueued.

When installing the signal handler, using sigaction, we use the SA_SIGINFO flag to tell the system that we want to use the new-style signal handler. This technique delivers the sigev_value field, specified in our AIO request, to our signal handler, which allows us to know exactly which operation has completed.

In the following example, aio2.c, we’ll initiate two AIO read operations instead of one. This time, the AIO control blocks will be stored in a global array, and the array index of the control block will be sent to the signal handler.

#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>

struct aiocb * cb[2];

//The signal number to use.
#define SIG_AIO (SIGRTMIN+5)

//Signal handler called when an AIO operation finishes
void aio_handler(int signal, siginfo_t *info, void*uap)
{
	int cbNumber = info->si_value.sival_int;
	printf("AIO operation %d completed returning %d\n", 
		cbNumber,
		(int)aio_return(cb[cbNumber]));
}

int main(void)
{
	struct sigaction action;
	//Create a buffer to store the read data
	char * foo = calloc(1,20);
	//Set up the signal handler
	action.sa_sigaction = aio_handler;
	action.sa_flags = SA_SIGINFO;
	sigemptyset(&action.sa_mask);
	sigaction(SIG_AIO, &action, NULL);
	FILE * file = fopen("bar", "r+");
	
	//Allocate space for the aio control blocks
	cb[0] = calloc(1,sizeof(struct aiocb));
	cb[1] = calloc(1,sizeof(struct aiocb));
	//Somewhere to store the result
	cb[0]->aio_buf = foo;
	cb[1]->aio_buf = foo + 10;
	//The file to read from
	cb[0]->aio_fildes = fileno(file);
	cb[1]->aio_fildes = fileno(file);
	//The number of bytes to read, and the offset
	cb[0]->aio_nbytes = 10;
	cb[1]->aio_nbytes = 10;
	cb[0]->aio_offset = 0;
	cb[1]->aio_offset = 10;
	//The signal to send, and the value of the signal
	cb[0]->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
	cb[0]->aio_sigevent.sigev_signo = SIG_AIO;
	cb[0]->aio_sigevent.sigev_value.sival_int = 0;
	cb[1]->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
	cb[1]->aio_sigevent.sigev_signo = SIG_AIO;
	cb[1]->aio_sigevent.sigev_value.sival_int = 1;

	aio_read(cb[0]);
	aio_read(cb[1]);
	//Wait forever; the signal handler reports each completion
	while(1){sleep(1);}
}

When we run this code, we should get output like the following:

$ gcc aio2.c -lrt
$ ./a.out
AIO operation 0 completed returning 10
AIO operation 1 completed returning 10

The program will end in an infinite loop, so be sure to terminate it with Ctrl+C.

This mechanism can be used to have a pool of AIO control blocks which can be used when needed and then returned to the pool when the associated operation completes.

List I/O

System calls are expensive. Every time you cross the boundary between kernel and userspace, you have to save the state of the userspace stack, switch CPU modes, load the kernel state, and then do the same thing in reverse on the way back. You can save some of this overhead when performing AIO operations by using the lio_listio(2) call, which allows you to batch up to AIO_LISTIO_MAX operations in a single system call. As an added bonus, this approach makes it easier for the underlying operating system to re-order the requests for faster disk accesses.

Unlike the aio_* functions, lio_listio isn’t used exclusively for asynchronous I/O; it can also be used to dispatch a number of read and write requests to the kernel at the same time, like a more generalized form of readv(2)/writev(2). Since a number of AIO operations can be started at the same time, it’s important to specify which operation is required for each control block. This is done by setting the aio_lio_opcode field to LIO_READ, LIO_WRITE, or LIO_NOP, for reading, writing, or ignoring, respectively.

The following example, aio3.c, shows how to use the synchronous form of lio_listio(2):

#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>

struct aiocb * cb[2];

int main(void)
{
	//Create a buffer to store the read data
	char * foo = calloc(1,20);
	FILE * file = fopen("bar", "r+");
	
	//Allocate space for the aio control blocks
	cb[0] = calloc(1,sizeof(struct aiocb));
	cb[1] = calloc(1,sizeof(struct aiocb));
	//Somewhere to store the result
	cb[0]->aio_buf = foo;
	cb[1]->aio_buf = foo + 10;
	//The file to read from
	cb[0]->aio_fildes = fileno(file);
	cb[1]->aio_fildes = fileno(file);
	//The number of bytes to read, and the offset
	cb[0]->aio_nbytes = 10;
	cb[1]->aio_nbytes = 10;
	cb[0]->aio_offset = 0;
	cb[1]->aio_offset = 10;
	//Specify that these are read operations
	cb[0]->aio_lio_opcode = LIO_READ;
	cb[1]->aio_lio_opcode = LIO_READ;
	lio_listio(LIO_WAIT, cb, 2, NULL);
}

The first argument of the function instructs it whether to operate synchronously or asynchronously. If we set it to LIO_NOWAIT, then the call will return immediately, and the control blocks will act as if submitted to aio_read(2) or aio_write(2). If required, a single signal can be sent on completion of all of the operations. This is specified by passing a struct sigevent as the last argument to lio_listio(2). This is created in exactly the same way as for an aio_sigevent field as described earlier.
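The asynchronous form can be sketched as follows. This is only an illustration: the helper name batch_read_nowait is an assumption, no completion signal is requested (the last argument is NULL), and the helper simply waits for each request in turn with aio_suspend(2) where a real program would carry on with other work.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue two reads of `chunk` bytes with a single
 * lio_listio call, then wait for both to finish. Returns the total
 * number of bytes read, or -1 if anything failed. */
ssize_t batch_read_nowait(const char *path, char *buf, size_t chunk)
{
	assert(buf != NULL);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	struct aiocb blocks[2];
	struct aiocb *list[2];
	memset(blocks, 0, sizeof blocks);
	for (int i = 0; i < 2; i++) {
		blocks[i].aio_fildes = fd;
		blocks[i].aio_buf = buf + i * chunk;
		blocks[i].aio_nbytes = chunk;
		blocks[i].aio_offset = (off_t)(i * chunk);
		blocks[i].aio_lio_opcode = LIO_READ;
		list[i] = &blocks[i];
	}

	/* LIO_NOWAIT returns as soon as the requests are queued */
	int failed = (lio_listio(LIO_NOWAIT, list, 2, NULL) != 0);

	/* The control blocks live on our stack, so wait for every request
	 * before returning, even if the submission reported an error.
	 * Assumption: a request that was never queued does not report
	 * EINPROGRESS. */
	ssize_t total = 0;
	for (int i = 0; i < 2; i++) {
		const struct aiocb *wait_list[1] = { &blocks[i] };
		int status;
		while ((status = aio_error(&blocks[i])) == EINPROGRESS)
			aio_suspend(wait_list, 1, NULL);
		ssize_t n = aio_return(&blocks[i]);
		if (status != 0 || n < 0)
			failed = 1;
		else
			total += n;
	}

	close(fd);
	return failed ? -1 : total;
}
```

Because both requests are submitted in one call, the kernel crossing is paid once rather than twice, which is the saving the section describes.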

Limitations of AIO

Asynchronous I/O operations take up some space in the kernel while they complete. As such, they tend to be rationed. On OS X, for example, you’re limited to 90 AIO operations in total at any one time, and only 16 per process. Unfortunately, the specification doesn’t define a dependable way of finding these limits: sysconf(3) can be asked for _SC_AIO_MAX and _SC_AIO_LISTIO_MAX, but an implementation may report either as indeterminate, and there is no standard query for a per-process limit. Since AIO_LISTIO_MAX defines the maximum number of AIO operations that can be dispatched with a single lio_listio(2) call, a good rule is to take this as the maximum number of AIO operations per process; with this precept, you should end up with a limit less than or equal to the real limit.
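One hedged way to apply the AIO_LISTIO_MAX rule at run time is to ask sysconf(3) for _SC_AIO_LISTIO_MAX and fall back to the POSIX minimum of 2 when the system gives no usable answer. The helper names aio_request_cap and clamp_batch_size are assumptions made for this sketch.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: a conservative cap on the number of AIO requests
 * to keep in flight, following the AIO_LISTIO_MAX rule of thumb. */
long aio_request_cap(void)
{
	long cap = -1;
#ifdef _SC_AIO_LISTIO_MAX
	/* May return -1 when the limit is indeterminate or unsupported */
	cap = sysconf(_SC_AIO_LISTIO_MAX);
#endif
	if (cap < 2)
		cap = 2;    /* POSIX minimum value (_POSIX_AIO_LISTIO_MAX) */
	return cap;
}

/* Clamp the number of requests we would like to batch to that cap */
long clamp_batch_size(long wanted)
{
	assert(wanted > 0);
	long cap = aio_request_cap();
	return wanted < cap ? wanted : cap;
}
```

A caller that wants to queue, say, 64 reads would issue them in groups of clamp_batch_size(64) and wait for each group before submitting the next.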

In some cases, it’s possible to increase the limits; for example, by changing sysctl values. Be careful doing this, however; often, those values are set low for a reason, and increasing them too much can make your kernel perform unexpectedly.

AIO support is still a little rough around the edges in a number of operating systems, and not supported in others. Any program or library that uses it should also provide a synchronous code path for platforms where a full AIO implementation is not available.
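A minimal sketch of such a fallback is shown below, assuming that an unavailable or exhausted AIO implementation makes aio_read(2) fail with ENOSYS or EAGAIN. The helper name read_block_at is hypothetical, and it waits for the result immediately to keep the example short; the point is only the shape of the two code paths.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read n bytes at `offset` from path, preferring
 * AIO and falling back to a synchronous pread when AIO can't be used.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_block_at(const char *path, void *buf, size_t n, off_t offset)
{
	assert(buf != NULL);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	struct aiocb cb;
	memset(&cb, 0, sizeof cb);
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = n;
	cb.aio_offset = offset;

	ssize_t result;
	if (aio_read(&cb) == 0) {
		/* Asynchronous path: wait for completion, then collect */
		const struct aiocb *list[1] = { &cb };
		while (aio_error(&cb) == EINPROGRESS)
			aio_suspend(list, 1, NULL);
		result = aio_return(&cb);
	} else if (errno == ENOSYS || errno == EAGAIN) {
		/* AIO is missing or out of slots: use the synchronous path */
		result = pread(fd, buf, n, offset);
	} else {
		result = -1;
	}

	close(fd);
	return result;
}
```

Both paths return the same kind of value, so callers don't need to know which one was taken.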

AIO is not suitable for all situations, but when used judiciously it can provide a significant speed benefit.
