ReiserFS: A Pioneering Mobile Target
The ReiserFS journaling filesystem is unique in many respects. Not only was it the first journaling filesystem to be publicly available for Linux, but it was also the first journaling filesystem to be accepted and integrated into the Linux kernel source code. Partly because of this, but also due to its excellent overall performance, ReiserFS is the filesystem of choice for the SuSE Linux distribution, is an installation option for the Mandrake Linux distribution, and (because it is in most modern kernels) is available on most other Linux distributions.
ReiserFS began life as a personal project of Hans Reiser, a colorful and opinionated Ph.D who still directs its development. The development of ReiserFS has been helped along by significant investments from Linux software and hardware vendors. The staunchest supporter of ReiserFS (outside its core development team) has always been SuSE, the German Linux distribution, which was the first Linux vendor to bundle and ship a Linux distribution that contained and actively promoted ReiserFS. It's easy to see why a Linux vendor such as SuSE (http://www.suse.com) would invest in the ReiserFS project: having a high-performance journaling filesystem in its Linux distribution makes it easier to sell Linux as an enterprise computing solution.
Aside from its basic journaling functionality, ReiserFS has some interesting features that help differentiate it from other journaling filesystems for Linux. ReiserFS uses a mechanism called tail-packing that enables it to combine small files and file fragments (portions of a file that are less than a full block in size) and store them directly in the B*Tree nodes. This decreases "wasted space" in the filesystem caused by filesystem blocks that contain substantially less data than the ReiserFS block size. ReiserFS also uses B* balanced trees to organize directories, files, and data. This provides fast directory lookups, intrinsic support for large directories, and blazingly fast delete operations.
ReiserFS is designed to support block sizes from 512 bytes to 64KB, but currently only supports 4KB blocks. Its efficiency in packing fragments into other blocks helps overcome this limitation. In fact, migrating an existing ext2 filesystem to ReiserFS produces some impressive results: it's great to copy an existing filesystem onto a new ReiserFS filesystem and find that your files take up less space than they used to.
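To make the space savings concrete, here is a minimal arithmetic sketch in Python. It is not ReiserFS code: the function names are hypothetical, and the "packed" figure assumes tails can be packed perfectly with no per-item overhead, so it is an idealized lower bound rather than what a real filesystem achieves.

```python
BLOCK_SIZE = 4096  # bytes; the 4KB block size discussed above


def unpacked_blocks(sizes, block_size=BLOCK_SIZE):
    """Blocks used when every file occupies a whole number of blocks."""
    return sum((size + block_size - 1) // block_size for size in sizes)


def slack_bytes(sizes, block_size=BLOCK_SIZE):
    """Bytes allocated but unused when each file tail gets its own block."""
    return unpacked_blocks(sizes, block_size) * block_size - sum(sizes)


def packed_blocks(sizes, block_size=BLOCK_SIZE):
    """Idealized tail packing: full blocks are kept, tails share space.

    Assumption: tails pack perfectly and metadata overhead is ignored.
    """
    full_blocks = sum(size // block_size for size in sizes)
    tail_bytes = sum(size % block_size for size in sizes)
    return full_blocks + (tail_bytes + block_size - 1) // block_size


if __name__ == "__main__":
    sizes = [100, 5000, 300, 4096]  # file sizes in bytes, made up for illustration
    print("blocks without tail packing:", unpacked_blocks(sizes))
    print("wasted bytes without tail packing:", slack_bytes(sizes))
    print("blocks with idealized tail packing:", packed_blocks(sizes))
```

The more small files a filesystem holds, the larger the gap between the two block counts becomes, which is why a tree full of small source or configuration files tends to shrink noticeably after such a migration.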
Like other modern filesystems, such as XFS and JFS, ReiserFS supports sparse files and does dynamic inode allocation, which minimizes the amount of filesystem storage preallocated to filesystem data structures. ReiserFS uses block-based allocation instead of using extents like JFS and XFS, and therefore tracks allocated blocks in a filesystem using bitmaps, like traditional Unix filesystems.
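The idea of tracking allocated blocks with a bitmap can be sketched in a few lines of Python. This is a toy illustration only: the class name, the first-fit search, and the in-memory layout are assumptions made for the example and do not reflect the ReiserFS on-disk format or its allocation policy.

```python
class BlockBitmap:
    """Toy block allocator: one bit per block, where 1 means "in use"."""

    def __init__(self, total_blocks: int) -> None:
        self.total_blocks = total_blocks
        # Round up so that every block has a bit, eight blocks per byte.
        self.bits = bytearray((total_blocks + 7) // 8)

    def is_allocated(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self) -> int:
        """Mark the first free block as used and return its number."""
        for block in range(self.total_blocks):
            if not self.is_allocated(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block: int) -> None:
        """Clear the bit for a block so it can be reused."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


if __name__ == "__main__":
    bitmap = BlockBitmap(16)
    first = bitmap.allocate()
    second = bitmap.allocate()
    bitmap.free(first)
    print(first, second, bitmap.allocate())
```

An extent-based filesystem would instead record runs such as "blocks 100 through 355" as a single entry, which is more compact for large contiguous files but requires a different bookkeeping structure.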
ReiserFS journals only filesystem metadata updates, rather than both data changes and metadata updates. This helps guarantee the consistency of the filesystem itself after a system restart, although the data that the files contain may be out of date or partially updated, depending on whether the data writes to the file completed before the system went down. The journaling support in ReiserFS uses some clever strategies to maximize metadata consistency, even in the event of a sudden system failure. For example, when updating filesystem metadata, ReiserFS does not overwrite the existing metadata, but instead writes the new version to a new location as close as possible to the existing metadata. If a system goes down while a metadata update is taking place, this guarantees the consistency of the filesystem's existing metadata because the old copy is not freed until the update transaction is completed.
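The "write to a new location, then free the old copy" strategy can be modeled with a small Python sketch. Everything here is hypothetical: the `ToyMetadataStore` class, its dictionary of blocks, and the `crash_before_commit` flag are stand-ins invented for illustration and say nothing about how the ReiserFS journal is actually laid out.

```python
class ToyMetadataStore:
    """Toy model of updating metadata without overwriting the live copy."""

    def __init__(self) -> None:
        self.blocks = {0: {"size": 0}}  # block number -> metadata contents
        self.current = 0                # block that holds the live metadata

    def _nearest_free(self, near: int) -> int:
        """Pick the closest free block after the existing copy."""
        candidate = near + 1
        while candidate in self.blocks:
            candidate += 1
        return candidate

    def update(self, new_metadata: dict, crash_before_commit: bool = False) -> None:
        old = self.current
        new = self._nearest_free(old)
        self.blocks[new] = dict(new_metadata)  # step 1: write the new copy
        if crash_before_commit:
            return                             # simulated crash: old copy untouched
        self.current = new                     # step 2: commit by switching over
        del self.blocks[old]                   # step 3: only now free the old copy

    def read(self) -> dict:
        return self.blocks[self.current]


if __name__ == "__main__":
    store = ToyMetadataStore()
    store.update({"size": 10}, crash_before_commit=True)
    print(store.read())  # the pre-crash metadata is still intact
    store.update({"size": 20})
    print(store.read())
```

In this model an interrupted update leaves an unreferenced block behind; a real filesystem would reclaim such space during journal replay or recovery, a step the sketch deliberately leaves out.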
To summarize, ReiserFS is a high-power journaling filesystem built into every version of Linux running a 2.4.1 or greater kernel. It is already extensively used on Linux systems, and therefore has some substantial history in Linux terms. ReiserFS isn't perfect. For example, its use of balanced trees causes performance problems when creating new files on filesystems that are more than 90 percent full, due to the overhead of shifting a huge number of nodes when balancing the trees. Similarly, copying large numbers of huge files (MP3, anyone?) to a ReiserFS partition can take a while as the system juggles storage allocation to keep the internal B*Tree as shallow as possible. On the upside, it's in every modern Linux system, is used extensively, and makes it easy for you to use a journaling filesystem on any recent Linux distribution.