- Integration Factors to Consider
- The Linux Solution
- File Services
- Print Services
- Edge Services
- DNS/DHCP Servers and Routing
- Web Servers
- Workgroup Databases
- Light Application Servers
- Computation Clusters
- Data Center Infrastructure
- Enterprise Applications
- Messaging and Collaboration
- Internal Development
- Power Workstations
Linux clustering capabilities using Beowulf have previously been mentioned and the advantages have been outlined. This section covers in more detail what clustering can be used for, how it works, and how it can be configured. Clustering for high availability as well as high-performance computing is providing companies, large and small, peace of mind and solidly reliable IT services, as well as supercomputing capabilities for a fraction of the traditional cost.
Linux clustering can be useful to meet both high availability and high-performance needs. Examples of high availability include web and e-commerce geo-site redundancies, high-demand application failover contingencies, distributed applications, and load balancing. High-performance cluster candidates include finite element analysis, bioscience modeling, animation rendering, seismic analysis, and weather prediction.
IDC estimated that 25% of all high-performance computing shipments in 2003 were clusters, and the area is growing at a considerable rate. Organizations that are using clustering to create high-performance solutions include Google (10,000-node cluster), Overstock.com (Oracle database for product tracking), Burlington Coat Factory (IBM PolyServe clustering), Epiphany, and many more.
Here's how high availability clusters work, starting with a simple two-node cluster and building from there. Assume that the services you want to ensure are "always available" are email, NFS, and a web server. For a two-node cluster, you will have at least two servers, each with a storage drive and identical copies of each of the services installed (mail, NFS, web). These servers are connected together with two to three connections. The first connection is a dedicated serial cable that supports the Heartbeat service, which constantly monitors the status of each service (mail, NFS, web). If Heartbeat detects that one or more of the services has failed, it immediately starts the same service(s) on the second server.
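The monitoring pass described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Heartbeat software itself: the `Node` class, the `heartbeat_check` function, and the `is_healthy` callback are hypothetical names invented for this sketch, and a real deployment would rely on Heartbeat's own configuration rather than custom code.

```python
from dataclasses import dataclass, field

# The three services from the example; names are illustrative only.
SERVICES = ("mail", "nfs", "web")


@dataclass
class Node:
    """A cluster member and the set of services currently running on it."""
    name: str
    running: set = field(default_factory=set)


def heartbeat_check(primary, standby, is_healthy):
    """Run one monitoring pass over the primary node.

    Any service that is_healthy() reports as failed is stopped on the
    primary and started on the standby. Returns the services that moved.
    """
    moved = []
    for service in sorted(primary.running):  # sorted() copies, so we can mutate
        if not is_healthy(primary, service):
            primary.running.discard(service)
            standby.running.add(service)
            moved.append(service)
    return moved


# Example pass: pretend NFS has failed on the first server.
node_a = Node("node-a", set(SERVICES))
node_b = Node("node-b")
moved = heartbeat_check(node_a, node_b, lambda node, service: service != "nfs")
print(moved, sorted(node_a.running), sorted(node_b.running))
```

In this sketch only the failed service moves, which matches the "one or more of the services" wording above; the healthy services stay where they are.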
The second connection between the two clustered servers is a dedicated data connection for keeping drives on both systems mirrored and in sync. 100Mbps or gigabit connections are recommended here, especially if your services are data intensive and the data changes often. If a service suddenly switches to the second machine, the data that it may require will be there and ready. This high-speed data connection could also carry the Heartbeat traffic. Assuming these services (mail, NFS, web) are for web clients, there will be a third connection that links the servers to the network or the Internet. To play it safe, you should have an uninterruptible power supply (UPS) for each server, attached via UPS control cables.
Heartbeat and DRBD, the application that keeps data storage in sync, are both available with SUSE Linux and can be installed and managed through YaST. They are also available with other Linux distributions or from popular Linux sites. You can configure failover to occur immediately upon fault detection, initiate a failover manually, or specify that failover operations occur in a specific order according to rules based on resource priority or system availability (see Figure 3.12).
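To make the three failover options more concrete, here is a minimal sketch of how such a policy could be modeled. It does not reflect Heartbeat's actual configuration syntax; the `Mode` enum, the `plan_failover` function, and the numeric priorities are assumptions made up for illustration.

```python
from enum import Enum


class Mode(Enum):
    IMMEDIATE = "immediate"  # fail over as soon as a fault is detected
    MANUAL = "manual"        # wait for an operator to trigger the failover


def plan_failover(failed, priorities, mode, operator_approved=False):
    """Return the failed services in the order they should be restarted.

    priorities maps a service name to a number; lower numbers move first.
    Services without a priority are handled last, in alphabetical order.
    In MANUAL mode nothing is planned until the operator approves.
    """
    if mode is Mode.MANUAL and not operator_approved:
        return []
    return sorted(failed, key=lambda s: (priorities.get(s, float("inf")), s))


# Hypothetical rule set: storage first, then mail, then the web server.
priorities = {"nfs": 1, "mail": 2, "web": 3}
print(plan_failover({"web", "mail", "nfs"}, priorities, Mode.IMMEDIATE))
print(plan_failover({"web"}, priorities, Mode.MANUAL))
```

Ordering by priority matters when services depend on each other, for example when the web server reads its content from the NFS export.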
Figure 3.12 Simple two-node cluster for high-availability failover.
Now that you understand cluster basics, you can mix and match, adding servers and services to create any configuration that meets your needs. High-availability clusters can range from simple to very complex. You don't have to mirror the exact services on each machine. You can specify that one service on machine A fails over to machine B, and another service on machine A fails over to machine C. If you don't want the expense of duplicate hardware for every set of services, you could configure one machine to be the failover recipient for several other machines. The chances are slim that two or three machines would fail at once, overloading the target server. You could also have active services running on several servers (no idle failover machines) with failover to other servers in the cluster running other services.
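The mix-and-match idea can be pictured as a simple lookup table from a (machine, service) pair to the machine that takes over. The sketch below is a hypothetical model, not a real cluster configuration: the machine names, the `FAILOVER_MAP` table, and the `targets_after_failure` function are all invented for illustration.

```python
# Hypothetical layout: machine A's services fail over to two different
# machines, and machine C is also the shared recipient for machine D.
FAILOVER_MAP = {
    ("A", "mail"): "B",
    ("A", "web"): "C",
    ("D", "nfs"): "C",
}


def targets_after_failure(failed_nodes, failover_map):
    """Work out which services each surviving machine must absorb.

    Returns (load, unplaced): load maps a target machine to the services
    it takes over; unplaced lists services whose target has also failed.
    """
    load = {}
    unplaced = []
    for (node, service), target in sorted(failover_map.items()):
        if node not in failed_nodes:
            continue
        if target in failed_nodes:
            unplaced.append(service)
        else:
            load.setdefault(target, []).append(service)
    return load, unplaced


print(targets_after_failure({"A"}, FAILOVER_MAP))
print(targets_after_failure({"A", "D"}, FAILOVER_MAP))
```

The second call shows the trade-off of a shared failover recipient: if machines A and D fail together, machine C has to carry two extra services at once.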
Clustering provides a lot of flexibility for storage configuration. Instead of failing from drives on one machine to drives on another, you could create a storage array with combinations of RAID, mirroring or striping. These storage subsystems could also be configured to fail over to other systems, if that level of redundancy is needed. The possibilities are endless, and with the Internet and technologies such as iSCSI, a cluster can be geographically distributed.
If you've ever wondered what an extra "9" of availability gives you, here's a summary chart of how much downtime you would have in one year at a given level of availability. With high availability clusters on Linux, it's easy to reach five 9s.
Number of 9s    Availability    Downtime per year
1               90.0000%        37 days
2               99.0000%        3.7 days
3               99.9000%        8.8 hours
4               99.9900%        53 minutes
5               99.9990%        5.3 minutes
6               99.9999%        32 seconds
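The figures in the chart follow from simple arithmetic: the unavailable fraction is 10 raised to the negative number of 9s, multiplied by the length of a year. The short sketch below reproduces that calculation, assuming a 365-day year; the function name `downtime_seconds` is just an illustrative choice.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def downtime_seconds(nines):
    """Expected downtime per year, in seconds, for a given number of 9s."""
    unavailability = 10 ** -nines  # e.g. three 9s -> 0.001
    return SECONDS_PER_YEAR * unavailability


for nines in range(1, 7):
    seconds = downtime_seconds(nines)
    print(f"{nines} nines: {seconds / 3600:10.3f} hours ({seconds:12.1f} seconds)")
```

Each additional 9 cuts the yearly downtime by a factor of ten, which is why the chart drops from days to hours to minutes to seconds.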
When it comes to high-performance computing on Linux, each high-performance cluster consists of one master and as many slave nodes as needed. Some of today's largest clusters have over 10,000 nodes. All nodes should be the same architecture (Intel, Apple, and so on) and for optimal performance, the hardware configurations should be identical (a slow node can slow down the entire cluster). Because the master node performs management functions, more RAM and faster network, processor, and disk speeds are highly recommended for better performance.
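A quick calculation shows why a single slow node can hold back the whole cluster. The sketch below assumes the simplest possible scheduling, where the work is split evenly and the job is finished only when the last node finishes; the function name and the speed figures are hypothetical.

```python
def cluster_wall_time(work_units, node_speeds):
    """Time to finish a job that is split evenly across all nodes.

    node_speeds holds each node's throughput in work units per second.
    The job completes only when the slowest node has finished its share.
    """
    share = work_units / len(node_speeds)
    return max(share / speed for speed in node_speeds)


# Four identical nodes versus the same cluster with one half-speed node.
print(cluster_wall_time(1000, [10, 10, 10, 10]))
print(cluster_wall_time(1000, [10, 10, 10, 5]))
```

Under this even-split assumption, one node running at half speed doubles the run time of the entire job, even though three of the four nodes finish early and sit idle.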
Linux is installed on every node and the application to be run on the cluster is installed on the master. Applications generally must be parallelized (written to take advantage of multiple processors) before computation can be spread across multiple computers for high-performance results. The exception is a serial application that is run repeatedly on different data sets.
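The exception mentioned above, a serial program run repeatedly on different data sets, is the easiest case to spread across nodes because the runs do not need to talk to each other. The sketch below imitates that pattern on a single machine: a thread pool stands in for the slave nodes, and `analyze` is a hypothetical placeholder for the real serial application.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(data_set):
    """Placeholder for an unmodified serial application: sum of squares."""
    return sum(x * x for x in data_set)


def run_on_cluster(data_sets, node_count):
    """Hand each data set to the next free worker and collect the results.

    The thread pool is only a stand-in for slave nodes; results come back
    in the same order as the input data sets.
    """
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        return list(pool.map(analyze, data_sets))


print(run_on_cluster([[1, 2], [3], [4, 5, 6]], node_count=2))
```

Because each run is independent, the application itself needs no changes; only the job distribution is new.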
The clustering software, which could be Beowulf or any other open source or commercial version, is also installed on the master node and every node in the cluster. The clustering software for high performance includes message-passing libraries that facilitate high-speed communication between nodes. Effective high-performance clusters also require high-speed connections between nodes. This can be provided using several methods, such as Ethernet, Gigabit Ethernet, or one of the commercial high-bandwidth, low-latency interconnect technologies such as Myrinet, InfiniBand, or Quadrics (see Figure 3.13).
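The message-passing style can be illustrated with a toy example in which a master sends a chunk of data to each worker and then gathers the partial results. This sketch uses in-process queues and threads purely as a stand-in for a real message-passing library and network interconnect; the `worker` and `master_sum` functions are hypothetical names, not part of any clustering package.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Slave node: receive one chunk, compute on it, send the result back."""
    chunk = inbox.get()
    outbox.put((rank, sum(chunk)))


def master_sum(values, worker_count):
    """Master node: scatter chunks to the workers, then gather partial sums."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(worker_count)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], outbox))
        for rank in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    # Scatter: worker r receives every worker_count-th value, starting at r.
    for rank, inbox in enumerate(inboxes):
        inbox.put(values[rank::worker_count])
    # Gather: one message comes back from each worker, in any order.
    partials = [outbox.get() for _ in range(worker_count)]
    for thread in threads:
        thread.join()
    return sum(partial for _, partial in partials)


print(master_sum(list(range(1, 101)), worker_count=4))
```

On a real cluster the queues would be replaced by network messages, which is why interconnect latency and bandwidth have such a direct effect on performance.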
Figure 3.13 High-performance clusters consist of a master, slave nodes, and a high-speed switch.
With Linux, both high-availability and high-performance clusters can be created without added expense. The Heartbeat and Beowulf solutions are open source, and are included with the Novell SUSE Linux distribution. Novell gives you clustering right out of the box for either SUSE Linux or NetWare with basic versions of Open Enterprise Server. A Novell advantage is that Linux nodes can fail over to NetWare nodes and vice versa. A two-node cluster license is included that allows you to create mirrored or failover systems to ensure that data or applications are always available.
A major advantage to Linux clusters is that they are incrementally scalable. It doesn't take a complete redesign of an application or buying a new supercomputer to get more horsepower. You just augment the cluster by one, two, five, or 50 machines or more, depending on what you need to get the job done.