
Distributed Software: Divided We Stand!

Parallelism isn't just for scientists anymore. High-volume Web servers and many enterprise data servers employ a cluster server architecture. That means that the Web site you are connected to as you read this is probably a team of networked workstations: often a cluster of high-end PCs. Custom software allows each of the huge number of incoming packets to be serviced by the next available processor.
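
The "next-available processor" idea can be sketched with a toy dispatcher: idle workers wait in a queue, and each incoming request goes to whichever worker is at the head of the line. The class and names below are invented for illustration, not any particular load balancer's API:

```python
from collections import deque

class Dispatcher:
    """Toy 'next-available processor' dispatcher (names are illustrative).

    Idle workers wait in a queue; each incoming request is handed to the
    worker at the head of the queue, which rejoins the queue when done.
    """
    def __init__(self, workers):
        self.idle = deque(workers)

    def handle(self, request):
        worker = self.idle.popleft()   # next available worker takes the request
        self.idle.append(worker)       # worker rejoins the idle queue when done
        return worker

d = Dispatcher(["pc1", "pc2", "pc3"])
assignments = [d.handle(f"req{i}") for i in range(5)]
print(assignments)   # requests spread across pc1, pc2, pc3 in turn
```

A real cluster front end would hold a worker out of the queue until its request completes, but the queue-of-idle-workers shape is the same.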

Another reason to use a cluster is for its fault-tolerance properties: its high availability. If a dozen PCs can handle your Web server's peak hit rate, building a 15-PC cluster gives you a 25% margin for error, minus some process management overhead. If one of your dozen machines fails during a peak load condition, your user community will start to get slow response times, and that one failure may propagate to others, cascading into a very bad day. If one of the 15-node cluster PCs fails, a detection process can redirect the load and the responsibilities to one of the 3 spares and send email to a technician about the failure. Your users probably won't even notice.
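
The sizing arithmetic above fits in one line; `spare_margin` is a hypothetical helper for this article's example, not a standard function:

```python
def spare_margin(total_nodes, nodes_needed_at_peak):
    """Fractional headroom a cluster has over its peak demand."""
    spares = total_nodes - nodes_needed_at_peak
    return spares / nodes_needed_at_peak

# The article's example: 12 PCs cover the peak load, 15 are deployed.
print(spare_margin(15, 12))   # 0.25 -> the 25% margin for error
```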

Fault detection can be simple. One implementation uses an "I'm okay!" message from each live PC to its backup. This can be sent once per second, even over the serial port, to minimize the impact on performance, but half of your PCs are backups in this scheme. Fault detection can be made more efficient by dedicating one PC as a monitor and having each live system send it signed status messages. If status from a live PC isn't seen within some configurable amount of time, the monitor can start an investigation, reboot, or even replace the dead machine. Each level of management overhead carries a cost that must be weighed against the potential risk of failure. More information is available online at http://linux-ha.org.
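
The dedicated-monitor scheme can be sketched in a few lines. This is a minimal sketch, assuming the monitor simply records when each status message arrives and flags any node that has been silent longer than the configured timeout; the class and method names are invented for illustration:

```python
class HeartbeatMonitor:
    """Sketch of the dedicated-monitor scheme: each live PC sends periodic
    status messages; a node whose last message is older than `timeout`
    seconds is flagged as suspect. (Names are invented for illustration.)"""
    def __init__(self, nodes, timeout):
        self.timeout = timeout
        self.last_seen = {n: 0.0 for n in nodes}

    def heartbeat(self, node, now):
        self.last_seen[node] = now     # "I'm okay!" received

    def suspects(self, now):
        # Nodes silent for longer than the timeout need investigation.
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]

m = HeartbeatMonitor(["pc1", "pc2"], timeout=3.0)
m.heartbeat("pc1", now=10.0)
m.heartbeat("pc2", now=10.0)
m.heartbeat("pc1", now=12.0)
print(m.suspects(now=14.5))   # pc2 missed its window -> ['pc2']
```

A production monitor would also authenticate the status messages (hence "signed") and trigger the failover action, but the timeout bookkeeping is the heart of it.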

The cost benefit of cluster computing is legendary. It is so well known that serious scientists are starting to do very creative capital equipment acquisition. One major university has resorted to borrowing our PCs (with our permission, of course) while we're doing other things.

SETI is the Search for Extraterrestrial Intelligence—looking for little green men who are smart enough to let us know of their existence. Walter Sullivan's book We Are Not Alone: The Search for Intelligent Life on Other Worlds (McGraw-Hill, 1964) popularized the Drake equation, which identifies the factors important to the development of intelligent life and estimates their coefficients, predicting the probability that intelligent life might exist beyond Earth. A recent TV program interviewed a dozen prominent astronomers, each of whom gave personal estimates of these coefficients. Though the estimates varied widely, the final set of probabilities was large enough to warrant further interest. As Ellie Arroway (portrayed by Jodie Foster) says in the movie Contact (Warner Bros., 1997), "If it's just us, it seems like an awful waste of space."

The University of California at Berkeley (no relationship to Edmund C. Berkeley) started a major SETI project that was badly underfunded. Its response was perhaps one of the more clever instantiations of a cluster computer to date. Millions of home computers are used for only a small fraction of the day, leaving a whole lot of computer power untapped. Most of these computers have an operating system that allows the use of a "screen saver" program that gets control of the computer's resources after some amount of idle time. The UC team developed a screen saver that analyzed a small portion of the data, sending back results and requesting more data when the opportunity presented itself. See http://setiathome.ssl.berkeley.edu/ for references to the latest research and publications.
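
The screen saver's job boils down to a fetch-analyze-report loop that runs only while the machine is idle. The sketch below uses placeholder functions, not the real client's API:

```python
def volunteer_loop(fetch_unit, analyze, report, idle):
    """Sketch of a screen-saver-style volunteer-computing loop
    (function names are placeholders, not the real client's API):
    while the machine is idle, fetch a work unit, analyze it,
    and send the result back."""
    results = []
    while idle():
        unit = fetch_unit()
        if unit is None:          # server has no more work for us
            break
        results.append(report(analyze(unit)))
    return results

# Tiny stand-in "server": three work units; "analysis" sums each chunk.
units = iter([[1, 2], [3, 4], [5]])
out = volunteer_loop(
    fetch_unit=lambda: next(units, None),
    analyze=sum,
    report=lambda r: r,
    idle=lambda: True,
)
print(out)   # [3, 7, 5]
```

The real client's cleverness is in checkpointing and resuming when the user comes back, but the control flow is this simple.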

To quote an astute observer, "Something's happening here, and what it is ain't exactly clear." If you have a network of PCs, and each PC's operating system is aware of the network and its residents, you can run specialized software on each of them that turns them into a cluster server, a parallel processor (a supercomputer), or something of your own concoction, with or without allowance for faults or failures. This specialized software, called middleware, looks like it implements a software-defined computing architecture. That's an interesting concept, in my opinion, and deserves some further research. Have a look at http://www.clustercomputing.org for further details.
