The CUDA Handbook: Texturing
This excerpt is from the Rough Cuts version of the book and may not represent the final version of this material.
10.1 Overview
Texture support in CUDA, a software technology for general-purpose parallel computing, could not have been justified if the hardware hadn't already been there as part of the GPU's graphics-accelerating heritage. But since it was, the texturing hardware accelerates enough useful operations that NVIDIA saw fit to expose it. Although many CUDA applications can be built without ever using texture, some rely on it to be competitive with CPU-based code.
Texture mapping was invented to enable richer, more realistic-looking objects by enabling images to be “painted” onto geometry. Historically, the hardware interpolated texture coordinates along with the X, Y, and Z coordinates needed to render a triangle; for each output pixel, the texture value was fetched (optionally with bilinear interpolation), blended with interpolated shading factors, and written as the output pixel. With the introduction of programmable graphics and texture-like data that might not contain color (for example, bump maps), graphics hardware became more sophisticated: shader programs issued TEX instructions that specified the coordinates to fetch, and the results were incorporated into the computations used to generate the output pixel. The hardware improves performance using texture caches, memory layouts optimized for dimensional locality, and a dedicated hardware pipeline to transform texture coordinates into hardware addresses.
Because the functionality grew organically and was informed by a combination of application requirements and hardware costs, the texturing features are not very orthogonal. For example, the “wrap” and “mirror” texture addressing modes do not work unless the texture coordinates are normalized.
This chapter explains every detail of the texture hardware, as supported by CUDA. We will cover everything from normalized versus unnormalized coordinates to addressing modes to the limits of linear interpolation; 1D, 2D, 3D and layered textures; and how to use these features from both the CUDA runtime and the driver API.
Two Use Cases
In CUDA, there are two significantly different uses for texture.
The first is to simply use texture as a read path: to work around coalescing constraints, to use the texture cache to reduce external bandwidth requirements, or both.
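As a minimal sketch of this read-path use, the kernel below reads linear device memory through a texture reference with `tex1Dfetch()`, so every load goes through the texture cache regardless of the access pattern. All names here are illustrative, not from the original text:

```cuda
// Sketch: texture as a pure read path (illustrative names).
// A texture reference bound to linear device memory lets the kernel
// read through the texture cache, sidestepping coalescing constraints.
texture<float, 1, cudaReadModeElementType> texRef;

__global__ void ScaleKernel( float *out, float scale, int n )
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if ( i < n )
        out[i] = scale * tex1Dfetch( texRef, i );  // read via texture cache
}

// Host side: bind the device allocation, launch, unbind.
void LaunchScale( float *dOut, const float *dIn, float scale, int n )
{
    cudaBindTexture( NULL, texRef, dIn, n*sizeof(float) );
    ScaleKernel<<<(n+255)/256, 256>>>( dOut, scale, n );
    cudaUnbindTexture( texRef );
}
```

Note that `tex1Dfetch()` takes an integer index, not a floating-point coordinate, so none of the coordinate-transformation hardware described below comes into play; only the cache does.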
The other use case takes advantage of the fixed-function hardware that the GPU has in place for graphics applications. The texture hardware consists of a configurable pipeline of computation stages that can do all of the following:
- Scale normalized texture coordinates,
- Perform boundary condition computations on the texture coordinates,
- Convert texture coordinates to addresses with 2D or 3D locality,
- Fetch 2, 4 or 8 texture elements (for 1D, 2D or 3D textures, respectively) and linearly interpolate between them, and
- Convert integer texture values to “unitized” floating-point values in the range [0.0, 1.0] (or [-1.0, 1.0] for signed integers).
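Each stage in the list above corresponds to an attribute of the texture reference. The following sketch (with illustrative names, not code from the original text) enables all of them for a 2D fetch from a CUDA array:

```cuda
// Sketch: enabling the pipeline stages listed above (illustrative names).
// With cudaReadModeNormalizedFloat, the 8-bit texels arrive in the kernel
// as floats in [0.0, 1.0], already filtered by the hardware.
texture<unsigned char, 2, cudaReadModeNormalizedFloat> tex;

__global__ void FetchKernel( float *out, int width, int height )
{
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    if ( x < width && y < height ) {
        // Normalized coordinates: (u, v) in [0, 1) regardless of texture size
        float u = (x + 0.5f) / width;
        float v = (y + 0.5f) / height;
        out[y*width + x] = tex2D( tex, u, v );
    }
}

// Host side: configure the texture reference before launching.
void Configure( cudaArray *array )
{
    tex.normalized = 1;                         // scale normalized coordinates
    tex.addressMode[0] = cudaAddressModeWrap;   // boundary condition (requires
    tex.addressMode[1] = cudaAddressModeWrap;   //   normalized coordinates)
    tex.filterMode = cudaFilterModeLinear;      // fetch 4 texels, interpolate
    cudaBindTextureToArray( tex, array );
}
```

Setting `filterMode` to `cudaFilterModeLinear` is what triggers the multi-element fetch: a 2D texture reads the four texels surrounding the coordinate and blends them in hardware.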
Textures are read through texture references that are bound to underlying memory (either CUDA arrays or device memory). The memory is just an unshaped bucket of bits; it is the texture reference that tells the hardware how to deliver the data into registers when a TEX instruction is executed.
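One way to see that the memory is just bits is to bind the same allocation to two texture references of different types; the same TEX operand then arrives in registers in two different formats. This is an illustrative sketch, not code from the original text:

```cuda
// Sketch: one allocation viewed through two texture references
// (illustrative names). The memory holds raw bits; each reference
// tells the hardware how to deliver them when TEX executes.
texture<unsigned int, 1, cudaReadModeElementType> texAsUint;
texture<float,        1, cudaReadModeElementType> texAsFloat;

__global__ void ReadBoth( unsigned int *u, float *f, int i )
{
    u[i] = tex1Dfetch( texAsUint,  i );  // bits delivered as a 32-bit integer
    f[i] = tex1Dfetch( texAsFloat, i );  // same bits, delivered as a float
}

// Host side: bind the same device pointer to both references.
void BindBoth( void *devPtr, size_t bytes )
{
    cudaBindTexture( NULL, texAsUint,  devPtr, bytes );
    cudaBindTexture( NULL, texAsFloat, devPtr, bytes );
}
```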