The #1 guide to clustering for every IT professional!
From Microsoft to IBM, Compaq to Sun to DEC, virtually every large computer company now uses clustering as a key strategy for high-availability, high-performance computing. This book tells you why -- and how. It cuts through the marketing hype and techno-religious wars surrounding parallel processing, delivering the practical information you need to purchase, market, plan or design servers and other high-performance computing systems.
With unequalled simplicity, directness, and humor, expert Gregory Pfister delivers all the information you need to make critical strategic decisions. He introduces the primary technologies involved in clustering, and shows why they are becoming so important. He compares clustering with symmetric multiprocessing, demonstrating major differences that are often "papered over." The legendary first edition of this book predicted the cluster revolution now underway. Its refreshing style, candid opinions, and simple explanations made it an underground classic. This new edition adds more than 150 pages of new material, including detailed new coverage of high availability, and Non-Uniform Memory Access (NUMA).
As Microsoft's much-touted "Wolfpack" Cluster Server shows, clustering technology has arrived in the marketplace. Clustering is now a strategic direction for Microsoft, Compaq, IBM, Sun, DEC, Novell, and every other large computer company, and their products are rolling out now. This comprehensive, highly readable guide helps you make sense of clustering in all its forms, not just a single company's offering. Gregory Pfister, one of the world's most respected experts on clustering technology, delivers all the information you need to make critical strategic decisions. He introduces the primary hardware and software technologies involved in clusters, and shows why they have become popular and will become increasingly important. He presents the background that system planners, purchasers, designers and architects need to make effective use of clustering. He compares different types of clusters and the workloads they are best used for. He presents a detailed comparison of clusters with symmetric multiprocessing -- demonstrating major differences that are often "papered over." The book contains extensive new coverage of availability issues, as well as detailed coverage of Non-Uniform Memory Access (NUMA), the technology at the heart of new offerings from Sequent, HP, Pyramid, NCR and others. "Pfister is a prophet with an attitude..." Norris Parker Smith, HPCWire.
I. WHAT ARE CLUSTERS, AND WHY USE THEM?
1. Introduction. Working Harder. Working Smarter. Getting Help. The Road to Lowly Parallel Processing. A Neglected Paradigm. What is to Come.
2. Examples. Beer & Subpoenas. Serving the Web. The Farm. Fermilab. Other Compute Clusters. Full System Clusters. Cluster Software Products. Basic (Availability) Clusters. Not the End.
3. Why Clusters? The Standard Litany. Why Now? Why Not Now? Commercial Node Performance. The Need for High Availability.
4. Definition, Distinctions, and Initial Comparisons. Definition. Distinction from Parallel Systems. Distinctions from Distributed Systems. Concerning "Single System Image." Other Comparisons. Reactions.
II. HARDWARE.
5. A Cluster Bestiary. Exposed vs. Enclosed. "Glass-House" vs. "Campus-Wide" Cluster. Cluster Hardware Structures. Communication Requirements. Cluster Acceleration Techniques.
6. Symmetric Multiprocessors. What is an SMP? What is a Cache, and Why Is It Necessary? Memory Contention. Cache Coherence. Sequential and Other Consistencies. Input/Output. Summary.
7. NUMA and Friends. UMA, NUMA, NORMA, and CC-NUMA. How CC-NUMA Works. The "N" in CC-NUMA. Software Implications. Other CC-NUMA Implications. Is "NUMA" Inevitable? Great Big CC-NUMA. Simple COMA.
III. SOFTWARE.
8. Workloads. Why Discuss Workloads? Serial: Throughput. Parallel. Amdahl's Law. The Point of All This.
9. Basic Programming Models and Issues. What is a Programming Model? The Sample Problem. Uniprocessor. Shared Memory. Message-Passing. CC-NUMA. SIMD and All That. Importance.
10. Commercial Programming Models. Small N vs. Large N. Small N Programming Models. Large-N I/O Programming Models. Large-N Processor-Memory Models. Shared Disk or not Shared Disk?
11. Single System Image. Single System Image Boundaries. Single System Image Levels. The Application and Subsystem Levels. The Operating System Kernel Levels. Hardware Levels. SSI and System Management.
IV. SYSTEMS.
12. High Availability. What Does "High Availability" Mean? The Basic Idea: Failover. Resources. Failing Over Data. Failing Over Communications. Towards Instant Failover. Failover to Where? Lock Data Reconstruction. Heartbeats, Events, and Failover Processing. System Structure. Related Issues.
13. Symmetric Multiprocessors, "NUMA," and Clusters. Preliminaries. Performance. Cost. High Availability. Other Issues. Partitioning. Conclusion.
14. Why We Need the Concept of Cluster. Benchmarks. Development Directions. Confusion of Issues. The Lure of Large Numbers.
15. Conclusion. Cluster Operating Systems. Exploitation. Standards. Software Pricing. What About 2010? Coda: The End of Parallel Computer Architecture.
Annotated Bibliography.
Preface to the Second Edition
Well, I had to write a second edition. Too much of what I predicted in the first edition became history. You know, it feels good to be able to say that. They even code-named the development project at Microsoft “Wolfpack” after the cover. “Dogpack” or “dogfight” didn't have the right connotations, I guess. Sent me a logo T-shirt, too. I also got some nice e-mail from other developers, but nobody else was that classy. Before I completely dislocate my shoulder patting myself on the back, I should mention that I missed a couple of rather major things the first time around. I didn't foresee the importance of mass-market high availability. I also didn't foresee how much confusing “NUMA” rhetoric would be used. Neither was left out entirely, mind you, but they certainly didn't get anywhere near the attention they either deserve or require. That's been corrected, in the form of two major added chapters, major revisions to other chapters, and scattered revisions throughout the book.
Another major change is the inclusion of information about cluster hardware and software acceleration, a subject that literally did not exist in sufficient quantity to take notice of when the first edition was written. Of course the chapter of examples was trash about 40 seconds before the first edition hit the stands. This edition's version probably will be, too. There has to be a better way to do that part; books can't compete with magazines' rates of publication, much less the Internet. I've tried to be more generic in this edition, but you can't ignore the real systems and do that job right.
However, the basic original structure of the book has stood up adequately, for which I'm grateful; this edition would have been far more work were that not true. As a result, readers of the first edition will probably have an odd sense of reverse deja vu (jamais vu?), like “Hey, I thought I read that before, but it didn't say that.” Believe me, literally every page of this thing has been changed. Why? When the first edition was written, it really was true that most people in the computer industry had not heard of clusters, and those that had mostly considered them a lower form of life. Products that really were clusters weren't called that, because there were much cooler things to claim to be: Massively Parallel. Distributed. Hemidemisemicoupled. Whatever.
Now you would have to be deaf to not have heard of clusters. All God's chilluns got a cluster product, or two, or four, and are talking about them -- if that's the phrase -- with all the power of their collective lungs. The products are mostly (not always) fairly crude, but, hey, you have to start somewhere. At least they recognize the name. I've had to revise things fairly pervasively to take this new milieu into account, and have also removed some of the ranting about how this might actually be a useful thing to do. Not all. Some.
I'd like to think that the publication of the first edition had something to do with this change in the state of affairs. That would be far more satisfying than merely having correctly nailed a few short-term predictions.
Acknowledgments
I of course remain grateful for IBM's rather enlightened policy towards book authors, which still provides both support for writing and motivation to complete the job. The views expressed here are not necessarily those of the IBM Corporation, of course.
I am also again grateful for the support of my family, who once again put up with my lack of attention while immersed in this project, even though the first edition didn't produce all its promised benefits: my children Danielle and Jonathan, and, of course, my wife, Cyndee Stines Pfister -- who originally said that was probably her only chance to get her name in a book. Well, lightning strikes twice.
I also again owe a large debt to the many people who have discussed the subjects of this book with me, both within and outside IBM. I feel privileged to say that the cliché remains true: There are far too many to mention all of them individually. However, my manager, IBM Fellow and Vice President Rick Baum, must certainly be thanked for uttering the fateful words, “Don't you think it's time for you to do a second edition?” And then giving me the time to actually do it -- possibly more time than he anticipated, and certainly more than I originally estimated, but it did finally get done. Jim Rymarczyk, Dave Elko, and Pete Sargent at least partly repaired my woeful ignorance of Parallel Sysplex. I'm particularly grateful to Dave for the several sessions at which he endured my intemperate questioning. Tom Weaver and Lisa Spainhower provided extremely useful review feedback, as well as much useful discussion over the years. Renato Recio, Tom Chen, Jeff Weiner, and other members of the System Architecture and Performance I/O team are also to be thanked for the times when, hiding behind the clever façade of “we're just dorky I/O guys who don't know nothing about the serious stuff,” they filled in large gaps in my understanding of that Rodney Dangerfield of computing. Not all the gaps, by any means; but I'm better off than I was before. And Bill (“Rocky”) Rockefeller: May he always keep The System off all our backs.
Once again, however, the people who are most to be thanked are the IBM customers I have met over the years, as well as the members of the IBM field force who brought us together. Those customers gave me the opportunity to find out what was really of importance to them, which of course must be of paramount importance to those of us who serve them. It appears that many of them really do want to understand the kind of information that is in this book -- at least when it is properly explained, as I have tried to do. They also unwittingly provided me with numerous opportunities to try out ways of explaining various topics, and to debug the analogies, metaphors, and jokes in response to questions and quizzical looks. As a direct result, getting through many of the chapters that follow will require markedly less caffeine.
Greg Pfister
August, 1997
Austin, Texas
pfister@us.ibm.com
Preface
Anyone planning to purchase, sell, design, or administer a server or multiuser computer system should buy and read this book.
Key needs of those systems -- high performance, an ability to grow, high availability, appropriate cost, and so on -- imply the use of parallel processing: multiple computing elements used together as a single entity. Parallel processing, with a bit of distributed processing, is what this book is about; it will give you the background needed to understand where the real issues lie in that realm. However, this doesn't mean that this book discusses “highly” or “massively” parallel computers. Those are flamboyant enough to have already attracted a multitude of variably successful explanations and are really of direct interest only in a vanishingly small fraction of the computer market.
Instead, this book uniquely discusses both the hardware and the software of “lowly” parallel computers, the everyday, practical work gangs of computing: symmetric multiprocessors, so-called “NUMA” systems, and, in particular, clusters of computers.
You do not have to be a dyed-in-the-denim “techie” to enjoy and profit from this book. Its form and content reflect the author's experience in explaining these issues quite literally hundreds of times to people with at best a semi-technical computer background. This has included customers who have better things to do with their time than become computer technophiles; marketing reps, both the technically oriented and the Jag-driv