10.10 3D Texturing
Reading from 3D textures is similar to reading from 2D textures, but there are more limitations:
- 3D textures have smaller limits (2048×2048×2048 instead of 65536×32768).
- There are no copy avoidance strategies: CUDA does not support 3D texturing from device memory or surface load/store on 3D CUDA arrays.
Other than that, the differences are straightforward: kernels can read from 3D textures using a tex3D() intrinsic that takes 3 floating point parameters, and the underlying 3D CUDA arrays must be populated by 3D memcpy’s. Trilinear filtering is supported; 8 texture elements are read and interpolated according to the texture coordinates, with the same 9-bit precision limit as 1D and 2D texturing.
The 3D texture size limits may be queried by calling cuDeviceGetAttribute() with CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH, CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT, and CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH, or by calling cudaGetDeviceProperties() and examining cudaDeviceProp.maxTexture3D.
Due to the much larger number of parameters needed, 3D CUDA arrays must be created and manipulated using a different set of APIs than 1D or 2D CUDA arrays.
To create a 3D CUDA array, the cudaMalloc3DArray() function takes a cudaExtent structure instead of width and height parameters:
cudaError_t cudaMalloc3DArray(struct cudaArray** array, const struct cudaChannelFormatDesc* desc, struct cudaExtent extent, unsigned int flags __dv(0));
cudaExtent is defined as follows:
struct cudaExtent { size_t width; size_t height; size_t depth; };
Describing 3D memcpy operations is sufficiently complicated that both the CUDA runtime and the driver API use structures to do so. The runtime API uses the cudaMemcpy3DParams structure, which is declared as follows:
struct cudaMemcpy3DParms { struct cudaArray *srcArray; struct cudaPos srcPos; struct cudaPitchedPtr srcPtr; struct cudaArray *dstArray; struct cudaPos dstPos; struct cudaPitchedPtr dstPtr; struct cudaExtent extent; enum cudaMemcpyKind kind; };
Most of these structure members are themselves structures: extent gives the width, height and depth of the copy. srcPos and dstPos are cudaPos structures are used to specify the start points for the source and destination of the copy.
struct cudaPos { size_t x; size_t y; size_t z; };
The cudaPitchedPtr is a structure that was added with 3D memcpy to contain a pointer/pitch tuple:
struct cudaPitchedPtr { void *ptr; /**< Pointer to allocated memory */ size_t pitch; /**< Pitch of allocated memory in bytes */ size_t xsize; /**< Logical width of allocation in elements */ size_t ysize; /**< Logical height of allocation in elements */ };
A cudaPitchedPtr structure may be created with the function make_cudaPitchedPtr, which takes the base pointer, pitch, and logical width and height of the allocation. make_cudaPitchedPtr just copies its parameters into the output struct, however:
struct cudaPitchedPtr make_cudaPitchedPtr(void *d, size_t p, size_t xsz, size_t ysz) { struct cudaPitchedPtr s; s.ptr = d; s.pitch = p; s.xsize = xsz; s.ysize = ysz; return s; }
The simpleTexture3D sample in the SDK illustrates how to do 3D texturing with CUDA.