Clusters Aren't Just for High Performance Anymore
The loud hum of the air conditioning breathes through the massive data center as thousands of computers lie still in the darkness. The single green eye from the power switch illuminates racks upon racks of servers as far as the eye can see. Evocations of the giant machine are immediately brought to attention as the almost haunting image of a single entity bears down upon the onlooker.
Okay, so even though it's not really such an ominous entity, the high performance cluster is arguably the most popular type known to the masses. Vector computing, a type of high-performance machine, tends to be cost prohibitive for commodity use due to the specialized hardware that it requires. Parallel computers, on the other hand, have become immensely popular due to the fact that almost anyone can build one with spare parts. The name Beowulf is almost as synonymous with parallel clustering as Jello is with flavored gelatin. Parallel computers have grown tremendously in popularity as researchers and hobbyists alike are able to mirror supercomputer performance. Of course, clustering doesn't begin and end with the parallel or Vector computer.
So now that you have a basic understanding of the different types of clusters that we discuss in this book, I'm sure you're asking, "Yes, that's all well and good, but what does it all mean?" For that, you should consult your favorite religious text. For the person who is comfortable with his or her beliefs and wants to get in to the business of setting up Linux clusters, Chapter 2, "Preparing your Linux Cluster," would be perfect. For the curious individual who wants to understand the theories behind clustering technologies, the rest of this chapter is for you.
Making two or more Linux machines interact with each other is quite easy. The simplest way is to assign each machine an IP address and a subnet mask, attach a crossover cable between them, and voilainstant network. Things start to get more complicated when there's a few more machines involved, although it's still not rocket science. Add another computer, and you have to add a switch or hub. Connect your computer to another network, and you have to add a router of some kind. Getting clustered computers to talk to each other at the most basic level is as simple as setting up a network. However, getting them to interact in different ways is another story.
High Availability and Fault-Tolerant Clusters
Computers have an annoying tendency to break down when you least expect. It's a rare find to come across a system administrator that hasn't received a phone call in the middle of the night with the dreaded news that a critical system is down, and would you please attend to this situation at your own convenience (right now!).
The concept of highly available and fault-tolerant clusters tend to go hand in hand. If a system is going to achieve high uptimes, the more redundant subsystems it needs to remain operatingsuch as the addition of servers in a clustered configuration. The bottom line for high availability clusters is that the application is of such high importance that you take extra steps to make sure it's available.
Single Point of Failure
The single point of failure (SPOF) is a common theme in HA clusters. Having a single component is just asking for trouble, and the SPOF is a paradigm to be avoided at all costs. In a perfect world, each server would have a redundant subsystem for each component in case the primary died or stopped responding. Critical subsystems that usually fail include hard drives, power supplies, and network cards. It's a sad fact of life, but user error tends to account for most downtime. Operators have been known to unmount volumes of critical live data, and contractors reorganize cables that are poorly labeled.
Such redundancy planning might include adding a RAID controller in case of hard drive failure, but what if the controller died? A whole different set of controllers and RAID devices could be implemented to reduce the risk of the SPOF. An architecture would have to be considered that would allow for hot-swappable CPUs. Two network cards could be implemented in case connectivity problems become an issue. A network card could be tied to a different switch or router for backup. What then for the network itself? What if a switch went bad? A second switch then could be set in place in case the first one died, and then a redundant router, redundant network provider might all be considered. It's a fine line between handling an SPOF and total redundancy of all systems, where budget is usually the deciding factor.
Achieving 100 percent uptime is near impossible, although with the right technologies, you can come quite close to that. A more realistic goal to set when planning a HA solution is providing for an uptime of 99 percent or higher. In doing so, you can plan scheduled downtime for maintenance, backups, and patches requiring reboots. Having an uptime requirement of greater than 99 percent requires different models, such as redundant systems.
Although there are servers designed with redundancy in mind, their prices tend to be much larger than SPOF servers. There are companies that develop servers with no SPOF, including two motherboards, two CPUs, two power supplies, and so on. These are even more expensive, but for the organization that can't afford downtime, the cost to data integrity ratio evens out.
Fortunately, there is another way to achieve such redundancy without the high cost of redesigning single server architecture. Incorporating two or more computers achieves a second layer of redundancy for each component. There are just about as many ways to achieve this redundancy as there are people implementing servers. A backup server that can be put into production is among the most common methods that are currently implemented. Although this works in most environments, offline redundant servers take time to prepare and bring online. Having an online failover server, although more difficult to implement, can be brought up almost immediately to replace the initial application.
In Figure 1.1, two computers are connected to the same network. A heartbeat application that runs between them assures each computer that the other is up and running in good health. If the secondary computer cannot determine the health of the primary computer, that computer takes over the services of the primary. This is typically done with IP aliasing and a floating address that's assigned with DNS. This floating address will fail over to the secondary computer as soon as the heartbeat detects an event such as the primary server or application failing. This method works well if all the served data is static. If the data is dynamic, a method to keep the data synched needs to be implemented.
Shared storage is the method where two or more computers have access to the same data from one or more file systems. Typically, this could be done through a means of shared SCSI device, storage area network, or a network file system. This would enable two or more computers to access the data from the same device, although having the data on only one device or file system could be considered an SPOF.
Managing Shared Storage and Dynamic Data
It's no surprise to any of us that today's mission-critical servers need access to data that's only milliseconds old. This isn't a problem when you have one server and one storage medium. Databases and web servers can easily gain access to their own file systems and make changes as necessary. When data doesn't change, it's not a problem to have two distinct servers with their own file systems. But how often do you have a database of nothing but static data? Enter shared storage and another server to avoid the SPOF. Yet that scenario opens up a new can of worms. How do two servers access the same data without the fear of destroying the data?
Figure 1.1 Load balancing explained.
Let's examine this scenario more closely. Imagine if you will, employees at XYZ Corporation who install a database server in a highly available environment. They do this by connecting two servers to an external file storage device. Server A is the primary, while server B stands by idle in case of failover.
Server A is happily chugging away when Joe Operator just so happens to stumble clumsily through the heartbeat cables that keep the servers synched. Server B detects that server A is down and tries to gain control of the data while the primary continues to write. In "high availability speech," this is known as a split-brain situation. Voilaan instant recipe for destroyed data.
Fencing is the technology used to avoid split-brain scenarios and aids in the segregation of resources. The HA software will attempt to fence off the downed server from accessing the data until it is restored again, typically by using SCSI reservations. However, the technology has not been foolproof. With advances in the Linux kernel (specifically 2.2.16 and above), servers can now effectively enact fencing and, therefore, share file systems without too many hassles.
Although shared storage is a feasible solution to many dynamic data requirements, there are different approaches to handling the same data. If you can afford the network delay, a solution that relies on NFS, or perhaps rsync, could be a much cleaner solution. Shared storage, although it has come a long way, adds another layer of complexity and another component that could potentially go wrong.
Load balancing refers to the method in which data is distributed across more than one server. Almost any parallel or distributed application can benefit from load balancing. Web servers are typically the most profitable, and therefore, the most used application of load balancing. Typically in the Linux environment, as in most heterogeneous environments, this is handled by one master node. Data is managed by the master node and is served onto two or more machines depending on traffic (see Figure 1.2). The data does not have to be distributed equally, of course. If you have one server on gigabit ethernet, that server can obviously absorb more traffic than a simple, fast ethernet node can.
Figure 1.2 Load balancing 101.
One advantage of load balancing is that the servers don't have to be local. Quite often, web requests from one part of the country are routed to a more convenient location rather than a centralized repository. Requests made from a user are generally encapsulated into a user session, meaning that all data will be redirected to the same server and not redirected to others depending on the load. Load balancers also typically handle failover by redirecting the traffic from the downed node and spreading the data across the remaining nodes.
The major disadvantage of load balancing is that the data has to remain consistent and available across all the servers, though one could use a method such as rsync to keep the integrity of data.
Although load balancing is typically done in larger ISPs by hardware devices, the versatility of Linux also shines here.
Programs such as Balance and the global load balancing Eddie Mission are discussed in Chapter 6, "Load Balancing."
Take over three million users, tell them that they too can join in the search for extraterrestrial life, and what do you get? You get the world's largest distributed application, the SETI@home project (http://setiathome.ssl.berkeley.edu/). Over 630,000 years of computational time has been accumulated by the project. This is a great reflection on the power of distributed computing and the Internet.
Distributed computing, in a nutshell, takes a program and assigns computational cycles to one or more computers and then reassembles the result after a certain period of time. Although close in scope to parallel computers, a true distributed environment differs in how processes and memory are distributed. Typically, a distributed cluster is comprised of heterogeneous computers, which can be dedicated servers but are typically end-user workstations. These workstations can be called on to provide computational functionality using spare CPU cycles. A distributed application will normally suspend itself or run in the background while a user is actively working at his computer and pick up again after a certain timeout value. Distributed computing resources can easily be applied over a large geographical area, constrained only by the network itself.
Just about any computationally expensive application can benefit from distributed computing. SETI@home, Render Farms (a cluster specifically set up to harness processing power to do large scale animations), take over three million users, tell them that they too can join in the search for extraterrestrial life, and what do you get? And different types of simulations can all benefit.
Parallel computing refers to the submission of jobs or processes over more than one processor. Parallel clusters are typically groups of machines that are dedicated to sharing resources. The cluster can be built with as little as two computers; however, with the price of commodity hardware these days, it's not hard to find clusters with as little as 16 nodes or perhaps several thousand. Google has been reported to have over 8,000 nodes within its Linux cluster.
Parallel clusters have also been referred to as Beowulf clusters. Although not technically accurate for all types of HPCs, a Beowulf type cluster refers to "computer clusters built using primarily commodity components and running an Open Source operating system."
Parallel computers work by splitting up jobs and doling them out to different nodes within the cluster. Having several computers working on a single task tends to be more efficient than one computer churning away on the same task, but having a 16-node cluster won't necessarily speed up your application 16 times. Parallel clusters aren't typically "set up and forget about them" machines; they need a great amount of performance tuning to make them work well. Once you've tuned your system, it's time to tune it again. And then after you've finished that, it might benefit from some more performance tuning.
Not every program benefits from a parallel configuration. Several factors must be considered to judge accurately. For example, is the code written to work under several processors? Most applications aren't designed to take advantage of multiple processors, so doling out pieces of your program would just be futile. Is the code optimized? The code might work faster on one node instead of splitting up each part and transferring it to each CPU. Parallel computing is really designed to handle math-intensive projects, such as plotting the expansion of the universe, render CPU-intensive animations, or decide exactly how many licks it actually takes to get to the center.
The parallel cluster can be set up with a master node that passes jobs to slave nodes. A master node will generally be the only machine that most users see; it shields the rest of the network behind it. The master node can be used to schedule jobs and monitor processes, while the slave nodes remain untouched (except for maintenance or repair). In a high production environment with several thousand systems, it might be cheaper to totally replace a downed node than to diagnose the error and replace faulty parts within the node.
Although several methods have been designed for building parallel clusters, Parallel Virtual Machine (PVM) was among the first programs to allow code to run across several nodes. PVM allows a heterogeneous group of machines to run C, C++, and Fortran across the cluster. Message Passing Interface (MPI), a relative newcomer to the scene, is shaping up to be the standard in message passing. Each is available for almost every system imaginable, including Linux.
How Now, Brown Cow
Included within the Parallel Processing model exist cluster configurations called NOWs and COWsor even POPs. Generally, the concept of a NOW and a COW can be synonymous with any parallel computer; however, there tends to be some disagreement among the hardcore ranks. A NOW is a network of operating systems, the COW is a cluster of operating systems, and a POP is a Pile of PCs (Engineering a Beowulf Style Computer Cluster, http://www.phy.duke.edu/brahma/beowulf_book.pdf). If one takes the approach that a parallel cluster is comprised of machines dedicated to the task of parallel computing, then neither a NOW nor a COW fit the bill. These typically are representative of distributed computing because they can be comprised of heterogeneous operating systems and workstations.
From the earliest days of computing, people would stare at their monitors with utter impatience as their systems attempted to solve problems at mind-bogglingly slow speeds (okay, even with modern technology, we still haven't solved this problem). In 1967, Gene Amdahl, who was working for IBM at the time, theorized that there was a limit to the effectiveness of parallel processing for any particular task. (One could assume that this theory was written while waiting for his computer to boot.) More specifically, "every algorithm has a sequential part that ultimately limits the speedup that can be achieved by a multiprocessor implementation" (Reevaluating Amdahl's Law by John L. Gustafson; http://www.scl.ameslab.gov/Publications/AmdahlsLaw/Amdahls.html). In other words, there lies certain parts to each computation, such as the time it takes to write results to disk, I/O limitations, and so on.
Amdahl's Law is important in that it displays how unreasonable it is to expect certain gains above a typical threshold. Even though Amdahl's Law remains the standard in demonstrating the effectiveness of parallel applications, the law remains in dispute and even begins to fall apart as larger clusters are becoming commonplace. According to John L. Gustafson in Reevaluating Amdahl's Law:
"Our work to date shows that it is not an insurmountable task to extract very high efficiency from a massively-parallel ensemble. We feel that it is important for the computing research community to overcome the 'mental block' against massive parallelism imposed by a misuse of Amdahl's speedup formula; speedup should be measured by scaling the problem to the number of processors, not fixing problem size." (See On Microprocessors, Memory Hierarchies, and Amdahl's Law; http://www.hpcmo.hpc.mil/Htdocs/UGC/UGC99/papers/alg1/).
Although certain people dispute the validity of Amdahl's Law, it remains an easy way to think about those limitations.