Distributed Filesystems for Linux
The previous two articles in this series highlighted a variety of improvements that have been made to filesystems that are physically attached to computer systems, which are generally referred to as local filesystems. Today's ubiquitous networks have popularized another type of filesystem: a filesystem that is exported from specific machines on the network (known generically as file servers), and that can be mounted and accessed by any system on the network with the proper authentication credentials.
The Network Is the Filesystem, or Something Like That
Networked filesystems are filesystems whose data is actually stored on file servers and other systems, and can therefore be accessed only over a network. These are also often known as distributed filesystems (the term I prefer), because they enable you to access storage and other resources distributed across other hosts on your network.
All distributed filesystems are intrinsically based on the client/server model in order to differentiate between the systems that permanently store data and the systems that simply use that data. Servers are computer systems or processes that provide specialized services to other machines or processes, which are therefore known as clients. Distributed filesystems provide users with the freedom to access their information from any client system on which they can log in and have access to the resources provided by a server over the network.
Distributed filesystems offer many advantages over local filesystems, and can provide substantial benefits throughout your computing infrastructure. Some of the ways in which distributed filesystems can improve the usability and manageability of your computing environment include the following:
Distributed filesystems reduce the chance that the failure of a single workstation will prevent you from accessing your data. Most networked filesystems enable you to log in on any machine on which they are available, and access your data in exactly the same way.
Distributed filesystems reduce the chance that a single hardware failure will prevent you from accessing your data by providing copies of a single network filesystem on multiple file servers. (Unless it's a network failure!)
Distributed filesystems provide central locations for data that must or should be shared by, or available to, all users.
Distributed filesystems simplify access to existing data from faster systems. When using a distributed filesystem to store user data, upgrading desktop systems is much simpler because upgrading individual workstations requires less emphasis on preserving local data. If all or the majority of your user data is on a centralized file server, upgrading a desktop is as simple as swapping in the new hardware. Similarly, if you have applications that you need to run on faster machines, you can simply connect to them and run the application, accessing centralized data over the network.
Distributed filesystems centralize administrative operations such as backups.
Distributed filesystems promote interoperability and flexibility. They make it easy for people to use the software and hardware that is best-suited to their requirements because their data is not local to any specific machine except the centralized file server.
The following are some common usage scenarios for distributed filesystems:
Storing the home directories for all your users so that they can log in on any workstation within your enterprise, and still access their files over the network.
Storing all non-operating-system binaries, and mounting the appropriate directory of executables on each client workstation. This enables you to make these applications available to all of your users from a single central location. Thus, it becomes substantially easier to update applications and related libraries because you only need to update them in one place.
Storing the directories used by centralized services such as electronic mail, newsgroups, and bulletin boards. This simplifies accessing these resources from any client system, eliminates the need for duplicating them, and reduces the local disk space on your client systems that these services would otherwise require.
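To make the scenarios above a bit more concrete, here is a minimal sketch of how a client system might report which of its mounted directories actually live on a file server. It assumes the mount table uses the whitespace-separated layout of Linux's /proc/mounts (source, mount point, filesystem type, and so on). The set of filesystem type names, the function name network_mounts, and the sample host and export paths are all my own illustrative assumptions, not something taken from a particular site's configuration.

```python
# Filesystem type names treated as "distributed" in this sketch. This set is
# an assumption chosen for illustration; adjust it for your own environment.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "afs"}


def network_mounts(mount_table):
    """Return (source, mount_point, fs_type) for each network-backed mount.

    mount_table is text in the /proc/mounts layout: whitespace-separated
    fields, the first three being source, mount point, and filesystem type.
    """
    found = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mount_point, fs_type = fields[:3]
        if fs_type in NETWORK_FS_TYPES:
            found.append((source, mount_point, fs_type))
    return found


# Hypothetical mount table: one local root filesystem, plus home directories
# and application binaries served by a made-up host named "fileserver".
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
fileserver:/export/home /home nfs4 rw,relatime 0 0
fileserver:/export/apps /usr/local/apps nfs ro,relatime 0 0
"""

if __name__ == "__main__":
    for source, mount_point, fs_type in network_mounts(SAMPLE):
        print(f"{mount_point} <- {source} ({fs_type})")
```

On a real Linux client you could pass the contents of /proc/mounts to the same function instead of the sample text. The read-only option on the hypothetical application directory reflects the idea that shared binaries are updated in one place, on the server, rather than on each client.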
The appropriate combination of journaling and distributed filesystems available for Linux today provides users and administrators with an amazing array of opportunities. It's exciting to find system software solutions that help maximize computer uptime, performance, and the availability of user data. At the same time, using distributed and journaling filesystems can expedite common operational tasks (such as backups), and minimize the impact of individual hardware failures. Although it's true that more advanced local and networked filesystems often require additional administrative overhead and time during installation, it's hard to think of a similar investment (beyond using Linux in the first place) with such clear and immediate payback.
The next sections provide an overview of the two distributed filesystems that are in most common use today, summarizing their features and benefits, and exploring the primary differences between the two.