Linux Filesystem Futures
- Filesystem Support for Disconnected Operation
- Filesystems for Embedded Devices
- Indexed Filesystems
- Higher Performance through Disk-Level Access
- Summary
The future of filesystems on Linux is certainly not limited to the journaling and distributed filesystems discussed in earlier articles in this series. The open nature of Linux and the openness and elegance of the Virtual Filesystem Switch (VFS) mechanism used to plug filesystems into the Linux kernel mean that a tremendous number of exciting filesystem developments are underway at all times. This article highlights some of the classes of filesystem development, and the specific new filesystem projects, that I have found most useful, interesting, and enlightening.
Filesystem Support for Disconnected Operation
As introduced in the previous article in this series on distributed filesystems, today's omnipresent networks have made accessing filesystems over a network a standard part of any computing environment. This dependency on file and data storage that can be reached only over a network raises some interesting issues for laptop and mobile users, who still need to access critical data even when they are not directly connected to the network. This is known as disconnected operation, because the system must be able to function even when resources that it typically expects to use (such as user data) are not available in the standard fashion. Even a system such as Windows provides integrated GUI and desktop features for marking files that you want to work with when you're not connected to the network, and for synchronizing those files when you reconnect.
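The idea of marking files for offline use can be illustrated with a minimal sketch. Everything here is hypothetical: the class name `OfflineCapableClient`, its methods, and the use of a plain dictionary to stand in for the network filesystem are assumptions made for illustration, not the API of Windows, Coda, InterMezzo, or any other real system.

```python
class OfflineCapableClient:
    """Toy client: reads from the server when connected, else from a local cache."""

    def __init__(self, server_files):
        self.server_files = server_files  # dict standing in for the remote filesystem
        self.connected = True
        self.pinned = set()               # paths the user marked for offline use
        self.cache = {}                   # local copies of pinned files

    def pin(self, path):
        """Mark a file for offline use and copy it locally if we can reach the server."""
        self.pinned.add(path)
        if self.connected:
            self.cache[path] = self.server_files[path]

    def read(self, path):
        if self.connected:
            data = self.server_files[path]
            if path in self.pinned:
                self.cache[path] = data   # keep the offline copy fresh
            return data
        if path in self.cache:
            return self.cache[path]       # disconnected: serve the cached copy
        raise FileNotFoundError(f"{path} is not cached for offline use")


client = OfflineCapableClient({"/home/report.txt": "draft 1", "/home/notes.txt": "todo"})
client.pin("/home/report.txt")
client.connected = False                  # simulate leaving the network
print(client.read("/home/report.txt"))    # served from the local cache
```

In this sketch only pinned files survive a disconnection; a read of any other path fails while offline, which is the situation that disconnected-operation support is meant to make rare.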
Luckily, two distributed filesystems that are currently available for Linux, Coda and InterMezzo, were designed in large part to support offline operation automatically. (Work is also being done to provide this capability in NFS filesystems.) Interestingly enough, the source code for both Coda and InterMezzo is already available in the mainline Linux development kernel. InterMezzo has been available in the kernel only since version 2.4.5 or so, whereas Coda appeared even earlier. This is somewhat ironic because OpenAFS, which is much more widely used, is not currently in the Linux kernel, although this is probably because OpenAFS has been an open source project only since 2000 (and is released under the IBM Public License rather than the GPL), whereas Coda and InterMezzo have always been open source.
Coda is a distributed filesystem with its origin in AFS v2. It has been under development at Carnegie Mellon University since 1987. Coda shows its AFS roots in its support for persistent client-side caching, which helps minimize client restart times, and for server replication, while still scaling effectively. The lack of support for disconnected operation has always been one of the biggest shortcomings of AFS (and now OpenAFS), and so this is one of the primary focus areas for Coda. (Coda also has many other powerful features, primary among which is that it is less intrusive in terms of administrative and operational configuration than AFS and OpenAFS.) As an academic project, the source code and binaries for Coda are freely available under a liberal license: most of the code is under the GPL, while portions (the libraries) are under the Lesser GPL (LGPL). For more information about Coda, see http://www.coda.cs.cmu.edu/.
InterMezzo is a relatively new distributed filesystem with a focus on high availability, flexible replication of directories, disconnected operation, and a persistent cache. Needless to say, InterMezzo is an open source project that is available under the GPL. InterMezzo was inspired by CMU's Coda, but is not based on the Coda source code. Interestingly enough, some of the higher-level InterMezzo features are implemented in Perl, which is a rarity for code that essentially works at the operating-system level. This makes InterMezzo much easier to understand (and enhance), although performance can be an issue. The father of InterMezzo, Peter Braam, was the head of the Coda project at CMU for several years before moving on with InterMezzo and other advanced computing projects.
On both clients and servers, InterMezzo uses an underlying journaled filesystem that records information about local filesystem changes in a log kept within an ext2 filesystem. Rather than providing yet another journaling filesystem (YAJF), InterMezzo relies on a set of wrapper functions that enable it to take advantage of existing journaling filesystems such as ext3 and JFS.
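The general idea of logging local changes and replaying them later can be sketched as follows. This is a simplified illustration under stated assumptions, not InterMezzo's actual design or code: the `ChangeLoggingClient` class, its method names, and the dictionary standing in for the server are all hypothetical, and the sketch ignores conflicts (for example, the same file being changed on the server while the client was away).

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLoggingClient:
    server: dict                                # stands in for the server's copy of the files
    connected: bool = True
    local: dict = field(default_factory=dict)   # the client's local copy
    log: list = field(default_factory=list)     # changes made while disconnected

    def write(self, path, data):
        self.local[path] = data
        if self.connected:
            self.server[path] = data
        else:
            self.log.append(("write", path, data))   # remember it for later

    def remove(self, path):
        self.local.pop(path, None)
        if self.connected:
            self.server.pop(path, None)
        else:
            self.log.append(("remove", path, None))

    def reconnect(self):
        """Replay logged changes on the server in the order they were made."""
        self.connected = True
        replayed = 0
        for op, path, data in self.log:
            if op == "write":
                self.server[path] = data
            else:
                self.server.pop(path, None)
            replayed += 1
        self.log.clear()
        return replayed


server = {}
client = ChangeLoggingClient(server, connected=False)
client.write("/docs/a.txt", "first draft")
client.write("/docs/b.txt", "notes")
client.remove("/docs/a.txt")
count = client.reconnect()    # the server receives all three changes, in order
print(count, server)
```

Replaying the operations in their original order is what lets the server end up in the same state as the client; a real system would also have to detect and resolve conflicting updates, which this sketch deliberately leaves out.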
Though Coda is powerful and stable, I find the InterMezzo filesystem to be one of the more exciting up-and-coming filesystems for Linux, largely because it can take advantage of years of conceptual development in AFS and Coda without inheriting a large, aging code base. For more information about InterMezzo, see http://www.inter-mezzo.org. (Make sure that you type the hyphen, or you'll end up on a Dutch site that I don't fully understand.) For additional information about InterMezzo, see the links in article five of this series, "Getting More Information About Linux Filesystems."