Introduction to Grid Computing
In a world that demands information anytime and anywhere, Grid Computing environments have proven so significant that they are often described as the world's single most powerful computing solution. The many benefits of Grid Computing come with a complicated global environment, one that leverages a multitude of open standards and technologies in a wide variety of implementation schemes. At the same time, the complexity and dynamic nature of today's industrial problems make them much harder to satisfy with traditional, single-platform computational approaches.
Grid Computing equates to the world's largest computer ...
The Grid Computing discipline involves the networking services and connections of a potentially unlimited number of ubiquitous computing devices within a "grid." This approach to computing can be most simply thought of as a massive "utility" grid, like the one that delivers power to our homes and businesses every day. Utility-based power has become second nature to many of us worldwide: we know that by walking into a room and turning on the lights, power will be directed to the devices of our choice at that moment. In the same utility fashion, Grid Computing seeks to add a potentially unlimited number of computing devices to a grid environment, each adding to the computing capability and problem-resolution capacity of the operational grid.
The full problem-resolution capabilities of Grid Computing remain unknown as we enter this new era of massively powerful grid-based problem solving.
This "Introduction" section of the book will begin to present many of the Grid Computing topics, which are discussed throughout this book. These discussions in Chapter 1 are intended only to provide a rather high-level examination of Grid Computing. Later sections of the book provide a full treatment of the topics addressed by many worldwide communities utilizing and continuing to develop Grid Computing.
Worldwide business demand for intense problem-solving capabilities has driven, in all global industry segments, the need for many ubiquitous computing resources to collaborate dynamically. These difficult computational needs have introduced complexity into virtually all computing technologies while driving up the cost and operational burden of technology environments. Nevertheless, this advanced collaboration capability is required in almost all areas of industrial and business problem solving, ranging from scientific studies to commercial solutions to academic endeavors. Achieving the level of resource collaboration needed to solve these complex and dynamic problems, within the quality requirements of the end user, is a difficult challenge across all technical communities.
To illustrate this environment and its often complex set of technology challenges, let us consider some common use case scenarios, which begin to show the value of a Grid Computing solution. These simple use cases, offered as an introduction to the concepts of Grid Computing, are as follows:
A financial organization running a wealth management application collaborates with its different departments to obtain more computational power and software modeling applications. It pools a number of computing resources, which can then execute tasks faster and in real time, with immediate access to complex pools of data storage, all while managing complicated data transfer tasks. This ultimately results in increased customer satisfaction through faster turnaround times.
A group of scientists studying the atmospheric ozone layer will collect huge amounts of experimental data, each and every day. These scientists need efficient and complex data storage capabilities across wide and geographically dispersed storage facilities, and they need to access this data in an efficient manner based on the processing needs. This ultimately results in a more effective and efficient means of performing important scientific research.
Massively multiplayer online games for a wide community of international participants require a large number of game servers rather than a single dedicated game server, so that players around the world can interact as a group in real time. This calls for on-demand allocation and provisioning of computer resources, provisioning and self-management of complex networks, and complicated data storage resources. The on-demand need is very dynamic from moment to moment and always depends on the workload in the system at that time. This ultimately results in larger gaming communities, which require more complex infrastructures to sustain the traffic loads, deliver more profit to the bottom lines of gaming corporations, and provide higher customer satisfaction to the gaming participants.
A government organization studying a disaster such as a chemical spill may need to collaborate immediately with different departments in order to plan for and best manage the disaster. These organizations may need to run many computational models related to the spill in order to calculate its spread, the effect of the weather on it, or its impact on human health. This ultimately results in better protection of public safety, wildlife, and ecosystems, all of which, needless to say, are key concerns.
Today, Grid Computing offers many solutions that already address the above problems. These solutions are constructed using a variety of technologies and open standards. Grid Computing, in turn, provides highly scalable, highly secure, and extremely high-performance mechanisms for discovering and negotiating access to remote computing resources in a seamless manner. This makes it possible to share computing resources, on an unprecedented scale, among a potentially unlimited number of geographically distributed groups. It also serves as a significant transformation agent, moving individual and corporate computing practices toward a general-purpose utility approach very similar in concept to providing electricity or water. Like electrical and water utilities, Grid Computing utilities are available "on demand" and are intended to provide an always-available facility negotiated for individual or corporate use.
In this book, we begin our discussion of the core concepts of the Grid Computing system with an early definition of a grid. In 1998, it was defined as follows: "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities" (Foster & Kesselman, 1998).
The preceding definition is more centered on the computational aspects of Grid Computing while later iterations broaden this definition with more focus on coordinated resource sharing and problem solving in multi-institutional virtual organizations (Foster & Kesselman, 1998). In addition to these qualifications of coordinated resource sharing and the formation of dynamic virtual organizations, open standards become a key underpinning. It is important that there are open standards throughout the grid implementation, which also accommodate a variety of other open standards-based protocols and frameworks, in order to provide interoperable and extensible infrastructure environments.
Grid Computing environments must be constructed upon the following foundations:
Coordinated resources. We should avoid building grid systems with a centralized control; instead, we must provide the necessary infrastructure for coordination among the resources, based on respective policies and service-level agreements.
Open standard protocols and frameworks. The use of open standards provides interoperability and integration facilities. These standards must be applied for resource discovery, resource access, and resource coordination.
Another basic requirement of a Grid Computing system is the ability to meet the quality of service (QoS) requirements of the end-user community. QoS validation must be a basic feature of any grid system, and it must be performed against the available resource metrics. QoS features can include, for example, response time measures, aggregated performance, security fulfillment, resource scalability, availability, autonomic features such as event correlation and configuration management, and partial failover mechanisms.
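To make the idea of validating QoS against measured resource data more concrete, here is a minimal Python sketch. It is an illustration only and not part of any grid standard or toolkit: the class names, the two QoS dimensions chosen (response time and availability), and the threshold values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class QosRequirement:
    """What the end user asks for (hypothetical fields for illustration)."""
    max_response_ms: float
    min_availability: float  # fraction of time available, 0.0 to 1.0


@dataclass(frozen=True)
class ResourceMetrics:
    """What a grid resource currently reports (hypothetical fields)."""
    name: str
    response_ms: float
    availability: float


def meets_qos(req: QosRequirement, m: ResourceMetrics) -> bool:
    """Return True only when every requirement is satisfied."""
    return (m.response_ms <= req.max_response_ms
            and m.availability >= req.min_availability)


def qualifying_resources(req: QosRequirement,
                         metrics: Sequence[ResourceMetrics]) -> List[str]:
    """Names of the resources that satisfy the requirement, in input order."""
    return [m.name for m in metrics if meets_qos(req, m)]


if __name__ == "__main__":
    requirement = QosRequirement(max_response_ms=200.0, min_availability=0.99)
    reported = [
        ResourceMetrics("node-a", response_ms=150.0, availability=0.995),
        ResourceMetrics("node-b", response_ms=250.0, availability=0.999),
    ]
    print(qualifying_resources(requirement, reported))
```

A real grid would negotiate many more dimensions (security, scalability, failover behavior) and would refresh the metrics continuously; the sketch only shows the basic comparison of a requirement against reported values.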
There have been a number of activities addressing the above definitions of Grid Computing and the requirements for a grid system. The most notable effort is in the standardization of the interfaces and protocols for the Grid Computing infrastructure implementations. We will cover the details later in this book. Let us now explore some early and current Grid Computing systems and their differences in terms of benefits.
Early Grid Activities
Over the past several years, there has been a lot of interest in computational Grid Computing worldwide. We also note a number of derivatives of Grid Computing, including compute grids, data grids, science grids, access grids, knowledge grids, cluster grids, tera grids, and commodity grids. On careful examination, we can see that all of these grids share some form of resources; however, they may have differing architectures.
The value of a grid, whether it is a commodity utility grid or a computational grid, is often evaluated based on its business merits and the respective user satisfaction. User satisfaction is measured by the QoS the grid provides, such as availability, performance, simplicity of access, management aspects, business value, and flexibility in pricing. The business merits most often relate to the problem being solved by the grid, for instance job execution, management aspects, simulation workflows, and other key technology-based foundations.
Earlier Grid Computing efforts were aligned with the overlapping functional areas of data, computation, and their respective access mechanisms. Let us further explore the details of these areas to better understand their utilization and functional requirements.
The data aspects of any Grid Computing environment must be able to effectively manage all aspects of data, including data location, data transfer, data access, and critical aspects of security. The core functional data requirements for Grid Computing applications are:
The ability to integrate multiple distributed, heterogeneous, and independently managed data sources.
The ability to provide efficient data transfer mechanisms and to provide data where the computation will take place for better scalability and efficiency.
The ability to provide data caching and/or replication mechanisms to minimize network traffic.
The ability to provide necessary data discovery mechanisms, which allow the user to find data based on characteristics of the data.
The capability to implement data encryption and integrity checks to ensure that data is transported across the network in a secure fashion.
The ability to provide the backup/restore mechanisms and policies necessary to prevent data loss and minimize unplanned downtime across the grid.
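As one small illustration of the integrity-check requirement in the list above, the following Python sketch compares a SHA-256 digest computed by the sender with one computed by the receiver. The function names are hypothetical. Note the assumption: a plain digest detects accidental corruption in transit, and it protects against deliberate tampering only if the expected digest reaches the receiver over a trusted channel. Encryption is a separate concern and is not shown here.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a data block."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(received: bytes, expected_digest: str) -> bool:
    """Check that the received bytes match the digest the sender published.

    hmac.compare_digest performs a constant-time comparison of the two
    hex strings.
    """
    return hmac.compare_digest(digest(received), expected_digest)


if __name__ == "__main__":
    payload = b"experimental data block"
    published = digest(payload)               # computed at the source site
    print(verify_transfer(payload, published))         # intact copy
    print(verify_transfer(payload + b"!", published))  # corrupted copy
```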
The core functional computational requirements for grid applications are:
The ability to allow for independent management of computing resources
The ability to provide mechanisms that can intelligently and transparently select computing resources capable of running a user's job
The understanding of the current and predicted loads on grid resources, resource availability, dynamic resource configuration, and provisioning
Failure detection and failover mechanisms
Appropriate security mechanisms for secure resource management, access, and integrity
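The computational requirements above (transparent resource selection, awareness of current load, and failure handling) can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm of any particular grid scheduler: the Resource fields, the "least loaded healthy resource" policy, and the names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resource:
    """A computing resource as seen by a hypothetical grid scheduler."""
    name: str
    free_cpus: int
    load: float           # 0.0 (idle) to 1.0 (saturated)
    healthy: bool = True  # False once failure detection flags the resource


def select_resource(resources: Sequence[Resource],
                    cpus_needed: int) -> Optional[Resource]:
    """Pick the least-loaded healthy resource that can run the job.

    Returns None when no resource qualifies, so the caller can queue the
    job or trigger provisioning instead.
    """
    candidates = [r for r in resources
                  if r.healthy and r.free_cpus >= cpus_needed]
    return min(candidates, key=lambda r: r.load, default=None)


if __name__ == "__main__":
    pool = [
        Resource("cluster-a", free_cpus=8, load=0.7),
        Resource("cluster-b", free_cpus=16, load=0.2, healthy=False),
        Resource("cluster-c", free_cpus=8, load=0.4),
    ]
    chosen = select_resource(pool, cpus_needed=8)
    print(chosen.name if chosen else "no resource available")
```

Because unhealthy resources are simply excluded from the candidate list, rerunning the selection after a failure is detected acts as a basic failover step. Predicted load, policies, and security checks are omitted for brevity.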
Let us further explore some details on the computational and data grids as they exist today.
Computational and Data Grids
In today's world of high-speed computing, computers have become extremely powerful compared with those of, say, five years ago. Even home PCs available on the commercial market are powerful enough for complex computations that we could not have imagined a decade ago.
The quality and quantity requirements of some business-related advanced computing applications are also becoming more and more complex. The industry now recognizes this need as it conducts numerous complex scientific experiments, advanced modeling scenarios, genome matching, astronomical research, a wide variety of simulations, complex scientific and business modeling scenarios, and real-time personal portfolio management. These requirements can exceed the computational power installed and available within an organization. Sometimes no single organization alone can satisfy these computational requirements.
This need for advanced computing power is analogous to the need for electric power in the early 1900s, when each user who wanted electricity had to build and operate a generator. When the electric power grid became a reality, it changed the entire concept of providing and using electrical power, and paved the way for an evolution in how electricity is used. In a similar fashion, computational grids change the perception of the utility and availability of computing power. The computational Grid Computing environment has thus become a reality, providing demand-driven, reliable, powerful, and yet inexpensive computational power to its customers.
As we noted earlier in this discussion, a computational Grid Computing environment consists of one or more hardware- and software-enabled environments that provide dependable, consistent, pervasive and inexpensive access to high-end computational capabilities (Foster & Kesselman, 1998).
Later in this book, in the "Grid Anatomy" section, we will see that this definition has evolved to place more emphasis on seamless resource sharing in a collaborative virtual organizational world. The concept still holds, however, for a computational grid in which the sharable resource remains computing power. As of now, the majority of computational grids are centered on major scientific experiments and collaborative environments.
The requirement for key data forms a core underpinning of any Grid Computing environment. For example, in data-intensive grids, the focus is on the management of data held in a variety of storage facilities in geographically dispersed locations. These data sources can be databases, file systems, and storage devices. Grid systems must also be capable of providing data virtualization services that make data access, integration, and processing transparent. In addition to the above requirements, the security and privacy requirements for all data in a grid system are quite complex.
We can summarize the data requirements in the early grid solutions as follows:
The ability to discover data
The access to databases, utilizing meta-data and other attributes of the data
The provisioning of computing facilities for high-speed data movement
The capability to support flexible data access and data filtering capabilities
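The discovery and filtering requirements summarized above can be illustrated with a small metadata catalog. The following Python sketch is a hypothetical example only: the catalog layout, the attribute names ("kind", "site"), and the discover function are assumptions and do not correspond to any specific grid data service.

```python
from typing import Any, Dict, List

# A catalog is a list of metadata records, one per data set. In a real
# grid these records would live in a distributed catalog service.
Catalog = List[Dict[str, Any]]


def discover(catalog: Catalog, **criteria: Any) -> Catalog:
    """Return the records whose metadata match every given criterion.

    With no criteria, every record is returned.
    """
    return [entry for entry in catalog
            if all(entry.get(key) == value
                   for key, value in criteria.items())]


if __name__ == "__main__":
    catalog = [
        {"name": "o3-run-01", "kind": "ozone", "site": "north"},
        {"name": "o3-run-02", "kind": "ozone", "site": "south"},
        {"name": "temp-run-01", "kind": "temperature", "site": "north"},
    ]
    for record in discover(catalog, kind="ozone", site="north"):
        print(record["name"])
```

The point of the sketch is that users locate data by its characteristics rather than by its physical location; the records returned could then be handed to a transfer or caching layer.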
As the importance of high performance in a Grid Computing environment becomes clear, it is recommended to store (or cache) data near the computation and to provide a common interface for data access and management.
A careful examination of existing Grid Computing systems shows that many are being applied in important scientific research and collaboration projects; however, this does not diminish the importance of Grid Computing in business, academic, and industry fields. The commercialization of Grid Computing calls for architectural alignment with several existing commercial frameworks for improved interoperability and integration.
As we will describe in this book, many current trends in Grid Computing are toward service-based architectures for grid environments. This architecture is built for interoperability and is, again, based upon open standard protocols. We will provide a full treatment of this architecture, including many of its details, in subsequent sections of this book.