Designing Enterprise Solutions with Sun Cluster 3.0 is an introduction to architecting highly available systems with Sun servers, storage, and the Sun Cluster 3.0 software. Three recurring themes are used throughout the book: failures, synchronization, and arbitration. These themes occur throughout all levels of systems design. The first chapter deals with understanding these relationships and recognizing failure modes associated with synchronization and arbitration. The second and third chapters review the building blocks and describe the Sun Cluster 3.0 software environment in detail. The remaining chapters discuss management servers and provide hypothetical case studies in which enterprise solutions are designed using Sun technologies. Appendices provide a checklist for designing clustered solutions, additional information on Sun technologies used in many different types of clusters, guidelines for data center design best practices, and a brief description of some failure analysis tools used by Sun systems designers and architects.
1. Cluster and Complex System Design Issues.
Business Reasons for Clustered Systems. Failures in Complex Systems. Data Synchronization. Arbitration Schemes. Data Caches. Timeouts. Failures in Clustered Systems. Summary.
Data Repositories and Infrastructure Services. Business Logic and Application Service. User Access Services: Web Farms. Compute Clusters. Technologies for Building Distributed Applications.
System Architecture. Kernel Infrastructure. System Features. Cluster Failures. Synchronization. Arbitration.
Design Goals. Services. Console Services. Sun Ray Server. Sun StorEdge SAN Surfer. Sun Explorer Data Collector. Sun Remote Service. Software Stack. Hardware Components. Network Configuration. Systems Management. Backup, Restore, and Recovery. Summary.
Firm Description. Design Goals. Cluster Software. Recommended Hardware Configuration. Summary.
Company Description. Information Technology Organization. Design Goals. Business Case. Requirements. Design Priorities. Cluster Software. Recommended Hardware Configuration. Summary.
Business Case Considerations. Personnel Considerations. Top-Level Design Documentation. Environmental Design. Server Design. Shared Storage Design. Network Design. Software Environment Design. Security Considerations. Systems Management Requirements. Testing Requirements.
SPARCcluster PDB 1.x and SPARCcluster HA 1.x History. Sun Cluster 2.x. Sun Cluster 2.2 and 3.0 Feature Comparison.
Hardware Platform Stability. Server Consolidation in a Common Rack. System Component Identification. AC/DC Power. System Cooling. Network Infrastructure. Security. System Installation and Configuration Documentation. Change Control Practices. Maintenance and Patch Strategy. Component Spares. New Release Upgrade Process. Support Agreement and Associated Response Time. Backup-and-Restore Testing. Cluster Recovery Procedures. Summary.
Fault Tree Analysis. Reliability Block Diagram Analysis. Failure Modes and Effects Analysis. Event Tree Analysis.
Designing Enterprise Solutions with Sun Cluster 3.0 is published under the auspices of the Sun BluePrints program. This book is written for systems architects and engineers who design clustered systems. It describes the fundamental systems engineering concepts behind clustered computer systems and discusses solutions and trade-offs in some detail.
Systems engineering is concerned with the creation of the entire answer to some real-life problem, with the answer based on science and technology [Ramo 65]. Systems engineers deal with the people/process/technology balance and multivariate problems. They integrate huge numbers of components, unwanted modes, partial requirements, indefinite answers, probabilities of external conditions, the testing of complicated systems, and all of the natural sciences behind the technology. This book contains little detail on specific engineering solutions; instead, it focuses on the fundamental concepts that are used repeatedly in the design of clustered computer systems.
This book provides detailed examples of the effective use of clustered system technology, along with information about the features and capabilities of the Sun Cluster 3.0 system (hereafter referred to as Sun Cluster 3.0).
Three concepts are addressed throughout the book: failures, synchronization, and arbitration. These three concepts are examined repeatedly at all levels of the systems design.
First, complex systems tend to fail in complex ways. Implementing clustered systems can prevent some of these failures. Businesses implement clusters when the cost of implementing and maintaining a cluster is less than the cost of a service outage. While anticipating the many ways in which services hosted on clusters can fail, you must be diligent when designing clustered systems to meet business needs.
Second, clustered systems use redundancy to ensure that no single point of failure renders the data inaccessible. However, adding redundancy to a system inherently creates a synchronization problem: the multiple copies of the data must remain synchronized, or chaos ensues.
Third, redundancy and failures create arbitration problems. Given two copies of data that are potentially out of sync, which is the correct copy? Similarly, any data service operating on the data must do so with the expectation that no other data service is operating on the same data without its knowledge. These arbitration problems are solved with services supplied by the cluster infrastructure.
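The arbitration question above can be illustrated with a minimal, generic sketch. It is not the Sun Cluster 3.0 mechanism; it only assumes a simple majority-vote rule and a hypothetical per-copy update counter (`version`). The names `Replica`, `has_majority`, and `pick_authoritative` are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    """One reachable copy of the data (hypothetical model)."""
    name: str
    version: int  # assumed monotonically increasing update counter


def has_majority(votes_present: int, votes_configured: int) -> bool:
    """True only if the reachable votes are a strict majority of all configured votes."""
    return 2 * votes_present > votes_configured


def pick_authoritative(reachable: Sequence[Replica],
                       votes_configured: int) -> Optional[Replica]:
    """Return the most recently updated reachable copy, or None without a majority.

    Refusing to choose without a majority keeps two isolated partitions
    from each declaring a different copy to be the correct one.
    """
    if not has_majority(len(reachable), votes_configured):
        return None
    return max(reachable, key=lambda r: r.version)


if __name__ == "__main__":
    copies = [Replica("a", 4), Replica("b", 7)]
    print(pick_authoritative(copies, votes_configured=3))
    print(pick_authoritative(copies[:1], votes_configured=3))
```

Under this rule, a two-vote configuration that splits evenly leaves neither side with a majority, which is why designs of this kind typically add an odd tie-breaking vote.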
The mission of the Sun BluePrints program is to empower Sun's customers with the technical knowledge required to implement reliable, extensible, and secure information systems within the data center using Sun products. This program provides a framework to identify, develop, and distribute best practices information that applies across the Sun product lines. Experts in technical subjects in various areas contribute to the program and focus on the scope and advantages of the information.
The Sun BluePrints program includes books, guides, and online articles. Through these vehicles, Sun can provide guidance, installation and implementation experiences, real-life scenarios, and late-breaking technical information.
The monthly electronic magazine, Sun BluePrints OnLine, is located on the web at http://www.sun.com/blueprints. To be notified about updates to the Sun BluePrints program, please register on this site.
This book is primarily intended for readers with varying degrees of experience with or knowledge of clustered system technology. Detailed examples of using this technology effectively are provided in combination with the features and capabilities of the Sun Cluster 3.0 software.
You should be familiar with the basic system architecture and design principles, as well as the administration and maintenance functions of the Solaris operating environment. You should also have an understanding of standard network protocols and topologies.
This book has six chapters and four appendixes:
Chapter 1 introduces the problems that clustered systems try to solve. Emphasis is placed on failures, synchronization, and arbitration. Complex systems tend to fail in complex ways, so thinking about the impact of failures on systems should be foremost in the mind of the systems engineer. Synchronization is key to making two or more things look like one thing, which is very important for redundant systems. Arbitration is the decision-making process: what the system does when an event occurs or does not occur.
Chapter 2 reviews infrastructure business component building blocks (file, database, and name services; application services; and web services) and examines how clustering technology can make them highly available and scalable.
Chapter 3 describes the Sun Cluster 3.0 product architecture. This is the cornerstone of building continuously available services using Sun products. Sun Cluster 3.0 software includes many advanced features that enable the systems architect to design from the services perspective, rather than the software perspective.
Chapter 4 covers a Sun Cluster 3.0 management server example. This chapter first describes the basic infrastructure services and then a management server that provides these services. This management server is used in the clustered systems solutions described in subsequent chapters.
Chapters 5 and 6 contain two hypothetical case studies: a low-cost file service and online database services. Each case study describes the business case and defines the requirements of the customer. These solutions are used to derive the design priorities that provide direction to the systems architect when design trade-offs must be made. Next, these chapters describe the system design, discussing the systems design methodology and exploring in detail some of the design trade-offs that face the systems architect.
Appendix A contains a series of design checklists for the new Sun Cluster 3.0 product.
Appendix B provides an insight into the genesis of the new Sun Cluster 3.0 product and contrasts the features of Sun Cluster 2.2 with those of Sun Cluster 3.0.
Appendix C contains guidelines for data center design that supports highly available services.
Appendix D is a brief survey of tools that systems architects and engineers find useful when designing or analyzing highly available systems.
The SunDocs program provides more than 250 manuals from Sun Microsystems, Inc. If you live in the United States, Canada, Europe, or Japan, you can purchase documentation sets or individual manuals through this program.
The docs.sun.com web site enables you to access Sun technical documentation online. You can browse the docs.sun.com archive or search for a specific book title or subject.
The following table lists books that provide additional useful information.
|Title||Author and Publisher||ISBN Number/Part Number/URL|
|Sun Cluster Environment: Sun Cluster 2.2||Enrique Vargas, Joseph Bianco, and David Deeths; Sun Microsystems Press/Prentice Hall, Inc. (2001)|| |
|Backup and Restore Practices for Sun Enterprise Servers||Stan Stringfellow, Miroslav Klivansky, and Michael Barto; Sun Microsystems Press/Prentice Hall, Inc. (2000)|| |
|System Interface Guide||Sun Microsystems||806-4750-10|
|Multithreaded Programming Guide||Sun Microsystems||806-5257-10|
|Sun Cluster 3.0 7/01 Collection||Sun Microsystems|| |
|Building a JumpStart Infrastructure||Alex Noordergraaf|| |
|Cluster Platform 220/1000 Architecture: A Product from the SunTone Platforms Portfolio||Enrique Vargas||Sun BluePrints OnLine article|