Improvements in Linux Filesystems
Most Linux and Unix filesystems are physically organized using data structures that largely mirror the concept of files and directories. The primary data structure in a Unix filesystem is the index node, or inode. One inode is associated with each file and directory in a filesystem, and it provides important information about that file or directory, such as when it was created and last accessed, and (most importantly) where the blocks containing the data for that file or directory actually are located on the disk. Inodes are numbered sequentially in a filesystem when they are allocated.
When you create a classic Unix filesystem, all of the logical blocks in the filesystem are formatted, and a specific number of inodes are created in anticipation of the number of files and directories that might eventually be created in the filesystem. The number of inodes that are precreated is usually either automatically calculated based on the size of the filesystem, or can be specified as a command-line argument when creating the filesystem. The standard Linux is the ext2 filesystem, which is generally organized along the lines of most Unix filesystems, though it incorporated many enhancements to improve performance and reliability.
Preallocating inodes when a filesystem is created is convenient, and minimizes the amount of work that a filesystem has to do when you actually create a file or directory. All the system has to do is to grab an unallocated inode off the chain of available inodes and fill in metadata such as the file type, owner, group, protection mode, and so on. On the other hand, preallocating inodes imposes some limitations on the future use of the filesystem. For example, it places an empirical limit on the number of files and directories that you can create in the filesystem.
One of the most interesting areas of innovation in Linux filesystems is the way in which those filesystems allocate storage to files as the size of the file changes. Classic Unix filesystems allocate additional storage by adding blocks to the list of unallocated blocks in the filesystem, known as the free list. This term hearkens back to ancient Unix file systems, in which the free list was literally maintained as a linked list of unallocated blocks. In most current filesystems that allocate disk space as logical blocks, the free list is actually stored and maintained as a bitmap in which each bit corresponds to a logical block in the filesystem.
Incrementally adding logical blocks to a file as its size has a few unfortunate side effects. First, selecting the "right" blocks to add to a specific file imposes some additional heuristic overhead during the allocation process. As you allocate new storage to a file, you want to maximize the locality of the newly allocated blocks. This typically either means that you want to allocate blocks that are contiguous to other blocks already associated with the file, or that you want to allocate blocks elsewhere that can be read as quickly as possible when the file is being read sequentially. The goal in both of these cases is to improve performance by minimizing the amount of delay due to moving the disk head to find the data in a file.
The second problem associated with allocating storage as individual disk blocks has to do with file size. Today's computer systems frequently read and write single data files that are larger than the disk drives found in systems a few years ago. Huge files require huge amounts of storage, which translates to incredibly large lists of blocks if storage is tracked on a per-block basic.
As an attempt to solve these and similar problems, many modern filesystems typically allocate storage in terms of extents, which are regions of contiguous logical blocks that can simply be defined by the initial block number and the number of sequential blocks in the extent. To expedite random access in large files, extent descriptions usually also contain a third field, which identifies the offset into the file that the first byte in the extent occupies in the file. You can therefore quickly read the data located at a specific offset in a file by scanning the extent descriptors to locate the extent that contains the offset you're looking or, rather than having to calculate which block to read by using the logical block size and walking the list of allocated blocks appropriately. The Linux ext2 filesystem allocates extents whenever physically sequential blocks are available, trying to anticipate the growth of files as they are created, as do the journaling filesystems discussed in subsequent articles in this series.