The Rise and Fall of the Fixed-Function Pipeline
Once 2D acceleration was fast enough, attention turned to 3D. SGI, in particular, produced a line of 3D workstations that appeared frequently in 1990s films, most famously in Jurassic Park.
On most UNIX systems, you interact with 3D hardware via OpenGL, and you obtain an OpenGL context from the X server through the GLX extension. This means that 3D drivers are closely tied to the X11 implementation. OpenGL was derived from SGI's IRIS GL and was designed for close integration with X11. Like X11, it provides network transparency: OpenGL commands can be streamed over a network to a remote X server.
OpenGL originally provided a fixed-function pipeline. You supplied polygons as lists of vertices, which were transformed and lit, then rasterized into pixels, textured, and finally written to the framebuffer for display. Applications could configure each stage but not replace it. Graphics accelerators implemented some parts of this pipeline in hardware, leaving the rest to the CPU.
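"Fixed" meant that each stage applied a hardwired formula: applications could set its inputs but not change the formula itself. As a minimal sketch, the Python below computes only the diffuse term of the classic per-vertex lighting equation; the ambient, specular, emission, and attenuation terms are omitted, and the function name is purely illustrative rather than any real API.

```python
def fixed_function_diffuse(normal, light_dir, light_color, material_color):
    """Diffuse term of fixed-function per-vertex lighting (simplified sketch).

    normal and light_dir are assumed to be unit-length 3-vectors; colours are
    RGB tuples with components in [0, 1]. The application can change these
    inputs, but the formula itself is baked into the pipeline.
    """
    # Lambert's cosine law: surfaces facing away from the light receive nothing.
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    # Scale the light colour by the material colour, component by component.
    return tuple(n_dot_l * lc * mc for lc, mc in zip(light_color, material_color))


# A vertex facing straight at a white light shows the full material colour.
print(fixed_function_diffuse((0, 0, 1), (0, 0, 1), (1, 1, 1), (0.5, 0.25, 1.0)))
# prints (0.5, 0.25, 1.0)
```

Because the equation is fixed, early hardware could implement it directly in silicon; the cost was that any effect the formula could not express had to be faked or done on the CPU.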
The first stage to be accelerated was texturing. Texturing is very memory-bandwidth-intensive, because every textured polygon in a scene needs a bitmap scaled and drawn onto it. Later hardware accelerated the entire pipeline.
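To see why texturing consumes so much bandwidth, consider bilinear filtering, a common way of scaling a texture: each output pixel blends the four nearest texels, so it needs four memory reads. The sketch below assumes a grayscale texture stored as a list of rows; the function names and the resolution, texel size, and frame rate used in the arithmetic are illustrative assumptions, not figures from any particular card.

```python
def bilinear_sample(texture, u, v):
    """Sample a grayscale texture (list of rows) at fractional texel coords (u, v).

    Assumes 0 <= u <= width - 1 and 0 <= v <= height - 1.
    """
    height, width = len(texture), len(texture[0])
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Four texel reads per output pixel: this is where the bandwidth goes.
    t00, t10 = texture[y0][x0], texture[y0][x1]
    t01, t11 = texture[y1][x0], texture[y1][x1]
    # Blend horizontally, then vertically.
    top = t00 * (1 - fx) + t10 * fx
    bottom = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bottom * fy


def texel_bytes_per_frame(width, height, bytes_per_texel=4, reads_per_pixel=4):
    """Texture bytes read to cover a width x height frame once (no overdraw)."""
    return width * height * reads_per_pixel * bytes_per_texel


print(texel_bytes_per_frame(640, 480))  # prints 4915200
```

Under these assumptions, a 640×480 frame needs 640 × 480 × 4 × 4 = 4,915,200 bytes of texture reads, or roughly 147 MB/s at 30 frames per second, before counting overdraw or the writes to the framebuffer.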
The first implementation of OpenGL for XFree86 was a pure-software implementation called Mesa. This made it possible to run OpenGL applications, but because it did not use any hardware acceleration, it was quite slow. Later, it gained the ability to use the 3dfx Voodoo series of graphics cards to accelerate full-screen rendering.
Routing 3D rendering through the X server was a performance problem, because the client application had to copy its data to the X server, which then had to send it on to the graphics card. This redundant copy could be very slow for large data such as textures, so the Direct Rendering Infrastructure (DRI) was proposed.
When an application uses DRI, it generates commands for the graphics hardware directly. These commands are validated by the kernel and then sent to the hardware, bypassing the X server entirely. All the X server needs to do is set up the clipping regions on the screen.
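The kernel has to validate commands because the client is an unprivileged process: without checks, a crafted command could make the GPU read or overwrite memory that belongs to other processes. Real drivers each check their own hardware-specific command formats; the sketch below instead uses a made-up command format (the `Command` type, its opcodes, and `validate` are all hypothetical) to show the general idea of whitelisting operations and bounds-checking every memory reference against buffers the client owns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    opcode: str   # hypothetical opcode name, e.g. "DRAW" or "COPY"
    offset: int   # start of the GPU memory range the command touches
    length: int   # number of bytes touched


# Hypothetical whitelist: anything else (e.g. raw register pokes) is rejected.
ALLOWED_OPCODES = {"DRAW", "COPY"}


def validate(commands, client_buffers):
    """Accept a command buffer only if every command is safe to run.

    client_buffers is a list of (start, size) memory ranges owned by this client.
    """
    for cmd in commands:
        if cmd.opcode not in ALLOWED_OPCODES or cmd.length <= 0:
            return False
        end = cmd.offset + cmd.length
        # The whole range must lie inside a single buffer the client owns.
        if not any(start <= cmd.offset and end <= start + size
                   for start, size in client_buffers):
            return False
    return True
```

Only after `validate` accepts a buffer would the commands be handed to the hardware; the X server never sees them.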
By this stage, many people were starting to wonder whether the GPU's new capabilities could be used for ordinary desktops. The traditional design of a system like X11 assumes that RAM is very expensive. For example, every time you rearrange the windows, the X server tells the applications to redraw the newly exposed regions. If RAM, in particular video RAM, is cheaper, then you can keep a copy of each window (even the obscured parts) in RAM and simply copy the data from the cached version.
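The difference between the two approaches can be sketched in a few lines of Python. This is a simplified model, not X server code: buffers are lists of rows, and the window dictionary with its `origin` and optional `backing` keys, along with the `request_redraw` callback standing in for an Expose event, are hypothetical names chosen for illustration.

```python
def blit(src, dst, src_x, src_y, dst_x, dst_y, w, h):
    """Copy a w x h rectangle of pixels from src to dst (both lists of rows)."""
    for row in range(h):
        dst[dst_y + row][dst_x:dst_x + w] = src[src_y + row][src_x:src_x + w]


def repaint_exposed(screen, window, x, y, w, h, request_redraw):
    """Repaint the exposed rectangle (x, y, w, h), given in window coordinates."""
    ox, oy = window["origin"]
    backing = window.get("backing")
    if backing is not None:
        # Cached copy available: a plain memory copy, no round trip to the client.
        blit(backing, screen, x, y, ox + x, oy + y, w, h)
    else:
        # Traditional approach: ask the application to redraw the region itself.
        request_redraw(window, x, y, w, h)
```

The trade-off is memory for latency: the cached path costs a full off-screen copy of every window, but exposing a region no longer depends on how quickly each application responds.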
Keeping copies of all windows in RAM has other advantages too. When you composite them together to produce the final image, you can apply arbitrary transforms or add transparency.
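As a minimal sketch of the transparency case, the Python below composites cached window images back to front using the Porter-Duff "over" operator with a single opacity per window. It assumes opaque RGB pixels and a uniform opacity per window; per-pixel alpha and transforms are left out, and the function names are illustrative.

```python
def over(src, dst, alpha):
    """Porter-Duff 'over' for one RGB pixel: src at opacity alpha onto opaque dst."""
    return tuple(round(s * alpha + d * (1 - alpha)) for s, d in zip(src, dst))


def composite(background, windows):
    """Composite cached window images back to front onto a copy of background.

    windows is a list of (x, y, pixels, opacity), bottom-most window first;
    pixels is a list of rows of RGB tuples.
    """
    out = [row[:] for row in background]
    for wx, wy, pixels, opacity in windows:
        for j, row in enumerate(pixels):
            for i, px in enumerate(row):
                y, x = wy + j, wx + i
                # Clip to the screen; off-screen parts of a window are skipped.
                if 0 <= y < len(out) and 0 <= x < len(out[0]):
                    out[y][x] = over(px, out[y][x], opacity)
    return out
```

Because every window's contents are already available, the compositor can redo this blend each frame with different opacities or positions without asking any application to redraw.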
Somewhat surprisingly, one of the most processor-intensive tasks a modern windowing system performs is rendering text. Each glyph is stored in a font file as an outline made of Bézier curves. That outline must be turned into a raster image and drawn, usually with antialiasing, onto the window. A modern X server accelerates this via the XRender extension. With XRender, the application renders antialiased glyphs into small images with an alpha channel, which are stored on the server. The server can then composite these glyphs onto windows using the graphics hardware.
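The two halves of this process can be sketched in plain Python, without using the actual XRender API. `quad_bezier` evaluates a point on a quadratic Bézier segment (TrueType outlines are built from quadratic segments); a real rasterizer would use such points to compute how much of each pixel the outline covers. `draw_glyph` then shows the compositing step: an 8-bit coverage mask, like the alpha-channel glyph images described above, is used to blend a solid text colour into the window. The mask in the usage is made up, and both function names are illustrative.

```python
def quad_bezier(p0, p1, p2, t):
    """Point at parameter t (0..1) on a quadratic Bezier with control points p0, p1, p2."""
    u = 1 - t
    return tuple(u * u * a + 2 * u * t * b + t * t * c for a, b, c in zip(p0, p1, p2))


def draw_glyph(window, mask, x, y, color):
    """Composite an 8-bit coverage mask at (x, y) in a solid RGB colour.

    window is a list of rows of RGB tuples; mask values run from 0 (no coverage)
    to 255 (fully covered). Partially covered edge pixels produce antialiasing.
    """
    for j, row in enumerate(mask):
        for i, coverage in enumerate(row):
            a = coverage / 255
            dst = window[y + j][x + i]
            window[y + j][x + i] = tuple(round(c * a + d * (1 - a))
                                         for c, d in zip(color, dst))


print(quad_bezier((0, 0), (1, 2), (2, 0), 0.5))  # prints (1.0, 1.0)
```

Rasterizing the outline is the expensive, CPU-bound part; once the mask exists, the blend in `draw_glyph` is a simple per-pixel operation that graphics hardware can perform quickly.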
As you might imagine, this involves a fair amount of code duplication. The OpenGL driver and the driver used to accelerate XRender are largely independent of each other: although they provide similar functionality, they share little code.