Standardizing UNIX

By David Chisnall

Date: Feb 2, 2007

Article is provided courtesy of Prentice Hall Professional.


The UNIX family has always been quite diverse. David Chisnall takes a look at the history of attempts to create standards for UNIX-like systems, and the relevance of those standards today.

Andrew Tanenbaum once said that the nice thing about standards is that there are so many to choose from. In the UNIX dark ages, there were no standards. Everyone who released a version of UNIX would support the base functionality from the AT&T or BSD code that they started with and then add a load of their own extensions on top.

Since one of the key marketing features for UNIX vendors was that it was easy to port code from one UNIX to another, the lack of standards was a problem. A group of European UNIX vendors decided to address the problem by developing the X/Open specification. Any code written to that specification could be run on any conforming implementation.

Unfortunately, this standardization effort happened at the same time as another effort conducted by the IEEE, which produced the IEEE 1003 specification. Because "IEEE 1003" isn’t very easy to remember, and doesn’t exactly roll off the tongue, the IEEE asked for suggestions for a catchier name. Richard M. Stallman, better known for founding the GNU Project, proposed the name Portable Operating System Interface. Since POSI would have sounded silly, an X was added on the end to make the acronym POSIX. This change made the name sound more like UNIX, and made it a lot more marketable.

Windows NT and the POSIX Checkbox

The U.S. government has always been a fan of standards, and POSIX was no exception. Many government agencies specify POSIX as a requirement. This was a significant problem for one vendor, which had quite a large portion of the operating systems market. Microsoft DOS wasn’t POSIX-compliant, and since it also wasn’t UNIX-based, making DOS POSIX-compliant would have taken a lot of effort.

Microsoft’s new operating system, Windows NT, was designed with the ability to run different servers, providing different interfaces to the system. One of these interfaces was used to run OS/2 applications, one handled "native" Win32 applications, and one was provided for POSIX compatibility.

Unfortunately, the Windows NT POSIX layer was close to useless. The POSIX standard itself included a lot of ambiguity; many functions are allowed to fail and set errno to ENOSYS ("function not implemented"). A system may therefore pass all of the POSIX compliance tests while implementing only a fraction of the functions in any useful way, and Microsoft’s implementation did exactly that. Windows NT developers were expected to use the Win32 API, and not POSIX. The POSIX subsystem was there to allow them to put a tick in the correct feature box, and not for anyone to actually use. Not only was the POSIX subsystem woefully incomplete; it was also impossible to mix POSIX and Win32 code in the same application. Therefore, POSIX applications (if you could get them to run at all) were limited to the command line, since the only way of accessing the GUI functionality was the Win32 API.
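Portable code has to be prepared for this kind of stubbed-out interface. The following is a minimal sketch, in Python rather than C, of probing for the ENOSYS error at run time. The helper name call_if_implemented and the stand-in function stub_not_implemented are hypothetical and exist only for illustration.

```python
import errno
import os


def call_if_implemented(func, *args):
    """Call func(*args); return (True, result), or (False, None) on ENOSYS."""
    try:
        return True, func(*args)
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # The interface exists but the system reports it as unimplemented.
            return False, None
        raise  # any other failure is a real error, not a missing feature


def stub_not_implemented():
    # Hypothetical stand-in for a system interface that is only a stub.
    raise OSError(errno.ENOSYS, os.strerror(errno.ENOSYS))


print(call_if_implemented(os.getpid))             # implemented everywhere
print(call_if_implemented(stub_not_implemented))  # reported as missing
```

A caller can use the first element of the returned pair to decide whether to fall back to another mechanism, which is roughly what applications targeting a minimal POSIX layer were forced to do.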

Fortunately, the story doesn’t end there. In 1995, Steve Chamberlain, working at Cygnus Solutions, noticed that much of the code was already in place to allow the GNU Compiler Collection (GCC) to produce Windows executables. It already supported the COFF binary format used by 32-bit versions of Windows, as well as the x86 instruction set.

Once GCC was able to produce Windows binaries, it was logical to try to make it run on Windows. Unfortunately, this task was much more difficult. Building GCC requires a POSIX-like environment, which wasn’t available on Windows. A few Cygnus engineers began working to produce a POSIX compatibility layer that translated POSIX calls into native Windows calls—the opposite of Winelib.

In 1999, Cygnus Solutions merged with Red Hat, which continues to develop this translation layer, marketed as Cygwin and available under the GPL or a closed-source license. Over the years, a lot of Free Software has been compiled and packaged using Cygwin, including a complete X server, allowing many graphical UNIX applications to be run on Windows with no modification.

At the same time as Cygwin was being developed, Softway Systems created OpenNT, a replacement for the POSIX subsystem in Windows NT. The product was later renamed Interix and finally, after being purchased by Microsoft, became Services for UNIX. As of 2004, server versions of Windows have a POSIX subsystem that supports threads, signals, sockets, and shared memory, less than twenty years after the standard was finalized, and only a little over a decade after Microsoft began advertising POSIX compliance.

A Tale of Two Standards

Back in the UNIX world, there were two competing standards, POSIX and X/Open. Between 1993 and 1996, X/Open was UNIX. The UNIX trademark was managed by X/Open, which defined the requirements and testing suites used to determine whether a system was allowed to be called UNIX. Then X/Open and the Open Software Foundation (OSF) merged to form The Open Group, which now controls the UNIX trademark.

The Open Software Foundation had attempted to create its own UNIX standard. OSF/1 was intended to be a standard implementation that could be extended by vendors, as competition to System V and 4.3BSD. OSF/1 lasted from 1990 until 1994, by which point only Digital was still selling an OSF/1 derivative.

The OSF that merged with X/Open to form The Open Group had already undergone one merger. In response to the creation of the OSF and the planned development of OSF/1, AT&T formed UNIX International. In 1994, these two organizations merged to counter the growing threat from Microsoft. The joint body was responsible for overseeing development of the Common Desktop Environment, which now incorporated the OSF’s Motif widget set and provided a standard GUI for UNIX systems.

Many of the members of The Open Group were involved in the development of yet another standard, the Common API Specification, which was quite popular. POSIX, being an IEEE standard, was not freely available. This fact proved a handicap for some developers; for example, Linus Torvalds didn’t have access to the POSIX specification when he began implementing Linux, and was forced to guess based on the behavior of existing systems and the descriptions in their man pages.

In 1998, the Austin Common Standards Revision Group began producing a new standard in an attempt to harmonize the existing ones. This group is run by The Open Group, which provides the group’s chairman and performs administrative functions, but it includes many more members. The outcome of this process was the Single UNIX Specification (SUS), commonly known as UNIX98. The idea behind the Austin Group was that a single specification would be created and then adopted by ISO, the IEEE, and The Open Group. The same core material forms the basis of POSIX, SUSv3, and ISO/IEC 9945, although each has its own extensions.

These days, X/Open can be regarded as obsolete, leaving just POSIX and the Single UNIX Specification standing. Fortunately for implementers, these standards retain a large overlap.

What’s in a Standard?

The Single UNIX Specification and POSIX both specify a set of command-line utilities as well as C functions. These include things like the Bourne Shell, which must be installed as sh on any compliant system. On a GNU system, this is typically the GNU Bourne-Again Shell (bash), which runs in compatibility mode when invoked as sh. Other systems have their own implementations.

The basic utilities that these specifications require are quite varied. While you would expect any UNIX to support a basic shell and commands such as ls and cat, it’s less obvious that an M4 macro processor should also be a requirement.

Since one of the main goals of the efforts to standardize UNIX has been source compatibility, it should come as no surprise that the standard defines C interfaces to a set of standard system calls and library functions. Perhaps less obvious is the fact that the Single UNIX Specification requires a C99 compiler (invoked as c99) and defines a set of command-line options that it must support.

Of course, the make utility is also part of POSIX and the SUS. Unfortunately, unlike C, makefile syntax isn’t well standardized. Each implementation of make supports basic rules of the following form:

target: dependencies
    action

For simple projects, this setup is enough. More complicated projects require things like conditional statements, however. These features aren’t defined by the standard, and different make implementations support them in different ways. A complex makefile that works with BSD make may not work on GNU or Solaris systems, for example, and utilities such as GNU Autoconf are often used to generate platform-specific makefiles.

Lots of cracks are apparent in the standard. For example, while the du command is standardized and has a set of standard options, none of these options specifies the maximum depth for du to print. On a GNU platform, you would use --max-depth; on Minix, you’d use -l; on FreeBSD, you’d use -d; and on most other platforms you would have to resort to something more complicated involving find.
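One way to sidestep the incompatible flags is to compute the totals directly instead of shelling out to du. The sketch below is a rough approximation under stated assumptions: it sums apparent file sizes in bytes rather than allocated disk blocks, so its numbers will not match du exactly, and the function name disk_usage is hypothetical.

```python
import os


def disk_usage(root, max_depth):
    """Return {directory: total_bytes} for root and directories up to max_depth below it."""
    totals = {}

    def visit(path, depth):
        total = 0
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Deeper directories still count toward their parents.
                    total += visit(entry.path, depth + 1)
                else:
                    total += entry.stat(follow_symlinks=False).st_size
        if depth <= max_depth:
            # Only directories within the depth limit are reported.
            totals[path] = total
        return total

    visit(root, 0)
    return totals
```

Calling disk_usage(".", 1) would report the current directory and its immediate subdirectories, with everything deeper folded into those totals, which mirrors what the depth-limiting flags do.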

It’s worth noting that, although the POSIX specification was based on UNIX, the first platform to pass the certification wasn’t a UNIX derivative. Digital’s VMS has that honor, and was rebranded as OpenVMS to indicate this compliance. Unlike the POSIX subsystem in Windows, the POSIX support in VMS actually was usable, although most VMS users look down on the POSIX environment as somewhat inferior to the native interfaces.

But There’s More!

The UNIX specifications continue to evolve. In 2003, a new version of the Single UNIX Specification was released, commonly known as UNIX03. The UNIX-year nomenclature comes from the fact that The Open Group, which publishes the SUS, is the current custodian of the UNIX trademark, and so The Open Group’s standards define which systems are allowed to be called UNIX. In 2003, The Open Group sued Apple for describing OS X as UNIX without applying for certification. The case was settled in 2004. The exact terms of the settlement weren’t published, although Apple still refers to OS X as being "based on UNIX." Whether the UNIX trademark is still enforceable remains unclear, although these days the value of the UNIX trademark is somewhat difficult to determine anyway, since the two most widely distributed *NIX systems (OS X and Linux) are not officially UNIX.

Extensions to the POSIX specifications include the real-time APIs, which attempt to correct some of the shortcomings of the original UNIX APIs. Support for asynchronous I/O is one example, but a more useful one is a reworking of the UNIX signals model.

Traditional UNIX signals have some problems as a general-purpose event model. They convey only one bit of information—that a signal with a particular number has been raised. While in a signal handler, if another signal with the same number is raised, the signal handler itself could be preempted. Alternatively, you could mask future occurrences of the signal while the handler was running, but then they would be lost. There was no good way of sequentially processing each signal event.

The real-time signals extensions improve matters considerably. They provide an enhanced signal handler, which takes an argument containing some extra data associated with the signal. This value is either an integer or a pointer, and can be used to identify the source of the signal.

The other improvement offered by real-time signals is that they can be enqueued. When a real-time signal arrives, the signal handler runs to completion. Afterward, the next signal is taken off the head of the queue and processed.
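The difference between the two models can be observed directly. The following is a minimal sketch, assuming a Linux system (where signal.SIGRTMIN and signal.sigtimedwait are available and real-time signals raised this way are queued). It blocks a signal, raises it several times, and then counts how many instances are actually delivered. The function name count_delivered is hypothetical, and the sketch does not show the extra data value that a real-time signal can carry.

```python
import signal
import threading


def count_delivered(signum, times):
    """Block signum, raise it `times` times, and count how many instances arrive."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        for _ in range(times):
            # Thread-directed, so the signal stays pending on this thread.
            signal.pthread_kill(threading.get_ident(), signum)
        delivered = 0
        # A zero timeout polls: the call returns None once nothing is pending.
        while signal.sigtimedwait({signum}, 0) is not None:
            delivered += 1
        return delivered
    finally:
        # Restore whatever mask was in place before the experiment.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


print("SIGUSR1 :", count_delivered(signal.SIGUSR1, 3))
print("SIGRTMIN:", count_delivered(signal.SIGRTMIN, 3))
```

On Linux, the traditional SIGUSR1 should be counted once, because repeated occurrences collapse into a single pending signal, while all three instances of SIGRTMIN should be counted. Other systems may behave differently or lack these interfaces entirely.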

While most operating systems support the basic POSIX standards, support for the extensions is somewhat more hit-and-miss. Most UNIX-like systems support the old System V IPC mechanisms, but support for their POSIX replacements is less common. OS X, for example, supports the newer POSIX semaphores and shared memory, but not message queues. Of course, new is a relative term. These APIs were originally standardized over a decade ago, in 1993.

UNIX standards tend to codify existing implementations, rather than proposing new ones. The standard way of communicating over a network on a modern UNIX system is to use Berkeley Sockets (now sometimes called POSIX Sockets), an API that originated on 4.2BSD. For many years, System V IPC was the standard way of communicating between local processes, although there are now POSIX APIs that supersede it. The POSIX IPC mechanisms are semantically very similar to their System V counterparts, however, and making a platform that supports one support the other isn’t a difficult undertaking.
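As an illustration of the newer interfaces, here is a minimal sketch of named shared memory using Python's multiprocessing.shared_memory module, which on UNIX-like systems is understood to be implemented on top of POSIX shared memory. The function name share_message is hypothetical, and both handles live in one process purely to keep the example short; in practice the second handle would be opened by a different process that was given the segment name.

```python
from multiprocessing import shared_memory


def share_message(payload):
    """Write payload into a new named segment and read it back through a second handle."""
    # payload must be non-empty: a zero-sized segment is rejected.
    writer = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        writer.buf[:len(payload)] = payload
        # Attach by name, as an unrelated process would.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            return bytes(reader.buf[:len(payload)])
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # remove the name so the segment can be reclaimed


print(share_message(b"hello"))
```

The create, attach-by-name, and unlink steps correspond closely to the get, attach, and remove steps of System V shared memory, which is one reason supporting both on the same platform is not a large undertaking.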

Although certification is no longer essential for a UNIX-like system to be taken seriously, there is still a significant benefit to the existence of standards. It’s a lot easier to write code that conforms to the specification, and then apply platform-specific patches for non-compliant systems, than to navigate the maze of nonstandard APIs. Eventually the patches can usually be removed, since most systems still aim toward compliance even when they sometimes fail to meet it.