Java Reference Guide

Interview: GigaSpaces

Last updated Mar 14, 2003.

You may have heard the buzz about the ability to "googlize" your enterprise applications, and GigaSpaces is a company that can help you do it. A few weeks ago at JavaOne I had the opportunity to meet with Nati Shalom, the founder and CTO of GigaSpaces, and ask him how his company is able to move beyond the buzz and realize this vision.

Googlizing an application refers to the near-linear scalability that Google was able to accomplish by using a large farm of less powerful machines rather than a single behemoth machine. The concept includes spreading massive amounts of data across these machines and, in the case of Google's search engine, quickly and seamlessly finding that data wherever it happens to reside; in general terms the processing is not limited to searching, but covers whatever your business functionality entails. At the heart of googlizing is scalability. Performance and scalability are sometimes used interchangeably, but there is a distinct difference:

  • Performance is a measure of the capabilities of your application
  • Scalability is a measure of the capacity of your application

Stated otherwise, performance measures how quickly you can satisfy an individual request, while scalability measures how well you can sustain that performance as the number of users and the amount of data increase. Scalability is what googlizing addresses and what GigaSpaces has been able to demonstrate. Nati informed me that in a test lab they were able to install GigaSpaces on 500 machines that serviced over two terabytes of data.

The problem that we face today is different from what we faced less than a decade ago. In the late 1990s, as Internet adoption evolved, companies faced an increase in user load and designed strategies to meet that user load. But as that user load continued to grow and applications were required to solve more complicated business problems, the quantity of data that those applications managed became the new bottleneck. More machines, more hard drive space, and faster CPUs can process more user requests, but as the amount of data increases substantially, scaling becomes data I/O bound. Regardless of how beefy your database hardware is, your network can only send data from that hardware so fast, and that represents a real bottleneck.

To address the problem of scalability in very data-intensive applications, Nati reports that there are three essential components to his strategy:

  1. Partition the data
  2. Push the data closer to the application
  3. Parallelize transaction requests

If an application is data I/O bound then the first step is to partition that data and spread it across multiple machines (do you think that Google maintains its search results for the entire Internet on a single machine?) In this scenario, each machine maintains a segment of data and the software infrastructure (e.g. GigaSpaces) knows where that data is located.
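A minimal sketch of the partitioning step may help. It is not GigaSpaces code: the class name PartitionedStore and its methods are hypothetical, each in-memory map merely stands in for one machine, and simple hash-modulo placement is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: hash-partitions key/value pairs across N in-memory "machines". */
final class PartitionedStore {
    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionedStore(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>()); // each map stands in for one machine
        }
    }

    /** Returns the index of the partition that owns the key. */
    int partitionFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    String get(String key) {
        // The infrastructure "knows where the data is": same function, same partition
        return partitions.get(partitionFor(key)).get(key);
    }

    int sizeOf(int partition) {
        return partitions.get(partition).size();
    }
}
```

Because reads and writes use the same placement function, a caller never needs to know which partition holds a key. A real grid would also have to handle machines joining and leaving, which hash-modulo placement handles poorly.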

Fetching all of your data across the network would be inefficient, so the next feature that GigaSpaces added is a local in-JVM cache that runs close to your application. In addition to enhancing the efficiency of the application, strongly controlling the location of data provides additional redundancy, which equates to reliability. If a single machine or a group of machines crashes, you don't want to lose any data. Maintaining multiple copies of data ensures that as machines come and go, the data can be preserved and spread across the currently available machines.
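The local-cache idea can be sketched as a small read-through cache sitting in front of a slower remote lookup. This is an illustration under assumptions, not the GigaSpaces implementation: NearCache, remoteLookup, and invalidate are hypothetical names, the "remote" side is just a function, and the sketch is neither thread-safe nor size-bounded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-JVM read-through cache in front of a remote data source. */
final class NearCache<K, V> {
    private final Map<K, V> local = new HashMap<>();
    private final Function<K, V> remoteLookup; // stands in for a network call
    private int remoteCalls = 0;

    NearCache(Function<K, V> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    V get(K key) {
        V cached = local.get(key);
        if (cached != null) {
            return cached; // served from the local JVM, no network hop
        }
        remoteCalls++;
        V loaded = remoteLookup.apply(key);
        if (loaded != null) {
            local.put(key, loaded);
        }
        return loaded;
    }

    /** Drops the local copy so the next read goes back to the remote source. */
    void invalidate(K key) {
        local.remove(key);
    }

    int remoteCalls() {
        return remoteCalls;
    }
}
```

The trade-off is staleness: a cached value stays in place until it is invalidated, so a production cache needs some way to learn about remote updates.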

Finally, through clustered proxies, requests are routed to where the data is located. This creates a true grid environment where machines in the grid not only maintain data, but can also process business logic against that data. This is one of the smartest innovations that Google adopted in the creation of their huge clusters of data partitions.
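Routing work to the data, and running it on several partitions at once, can be sketched as follows. Everything here is an assumption for illustration: ProcessingGrid, executeOn, and sumAll are hypothetical names, each node is a map plus a single worker thread in the same JVM, and nothing reflects how GigaSpaces' clustered proxies are actually built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical grid: each node holds a slice of the data and runs logic next to it. */
final class ProcessingGrid implements AutoCloseable {
    private static final class Node {
        final Map<String, Integer> data = new ConcurrentHashMap<>();
        final ExecutorService worker = Executors.newSingleThreadExecutor();
    }

    private final List<Node> nodes = new ArrayList<>();

    ProcessingGrid(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
    }

    private Node ownerOf(String key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    void put(String key, int value) {
        ownerOf(key).data.put(key, value);
    }

    /** Routes the task to the node that owns the key and runs it on that node's worker. */
    <R> CompletableFuture<R> executeOn(String key, Function<Map<String, Integer>, R> task) {
        Node owner = ownerOf(key);
        Supplier<R> job = () -> task.apply(owner.data);
        return CompletableFuture.supplyAsync(job, owner.worker);
    }

    /** Scatter/gather: every node sums its own slice in parallel, then results are combined. */
    long sumAll() {
        List<CompletableFuture<Long>> partials = new ArrayList<>();
        for (Node node : nodes) {
            Supplier<Long> localSum =
                () -> node.data.values().stream().mapToLong(Integer::longValue).sum();
            partials.add(CompletableFuture.supplyAsync(localSum, node.worker));
        }
        long total = 0;
        for (CompletableFuture<Long> partial : partials) {
            total += partial.join(); // waits for that node's partial result
        }
        return total;
    }

    @Override
    public void close() {
        for (Node node : nodes) {
            node.worker.shutdown(); // worker threads are non-daemon, so release them
        }
    }
}
```

Only small results (here, one number per node) cross the node boundary, rather than the data itself, which is the point of moving the logic to the data.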

With that foundation established, the next challenge you face is how to build a system that provides this type of service grid without having to rearchitect your entire application. It is a highly non-trivial problem because if you truly need to scale in such a manner, ensure the integrity of your data, and maintain performance, you need to build all three of the aforementioned facets into your application.

Different strategies can be employed, and Nati embraced a technology that you might have heard of: JavaSpaces. In actuality, JavaSpaces is not new; the core technology is based upon the Linda programming language, a research project at Yale University that began over 20 years ago. But JavaSpaces is a core part of Sun's Jini project, and Bill Joy, a co-founder of Sun, refers to JavaSpaces as "a wonderfully simple platform for developing distributed applications that takes advantage of the power of the Java programming language." GigaSpaces provided the first implementation of JavaSpaces and is an active participant in the Jini Community.
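For readers new to the Linda model, the following toy tuple space shows the three core operations: write an entry, read a matching entry, and take (read and remove) a matching entry. It is a single-JVM sketch under assumptions, not the JavaSpaces API: TinyTupleSpace is a hypothetical class, tuples are plain object arrays, a null in a template acts as a wildcard, and the real interface's transactions, leases, and blocking timeouts are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, in-memory tuple space illustrating Linda-style write/read/take. */
final class TinyTupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    /** Adds a tuple to the space. */
    synchronized void write(Object... tuple) {
        tuples.add(tuple.clone());
    }

    /** Returns a copy of the first matching tuple, or null; the tuple stays in the space. */
    synchronized Object[] readIfExists(Object... template) {
        for (Object[] tuple : tuples) {
            if (matches(template, tuple)) {
                return tuple.clone();
            }
        }
        return null;
    }

    /** Removes and returns the first matching tuple, or null if none matches. */
    synchronized Object[] takeIfExists(Object... template) {
        Iterator<Object[]> it = tuples.iterator();
        while (it.hasNext()) {
            Object[] tuple = it.next();
            if (matches(template, tuple)) {
                it.remove();
                return tuple;
            }
        }
        return null;
    }

    /** A template matches when lengths agree and every non-null field is equal. */
    private static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }
}
```

A producer might call write("task", 1) while any number of workers call takeIfExists("task", null); because take removes the entry atomically, each task is handed to exactly one worker without the two sides knowing about each other.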

As Nati was describing this technology to me, the question that stayed in my mind was: how difficult is it for me to integrate this software infrastructure into my applications? My concern was that, even with a strong background in enterprise architecture, I would need to rearchitect all of my work to make use of these features.

His approach was to provide access into the GigaSpaces grid through most public APIs, including JDBC, JMS, Collections classes, Hibernate, and so forth. If you have been programming against interfaces as I have recommended for years, then your job is easy. Simply replace code like the following:

Map m = new HashMap();

With:

Map m = new GigaSpacesMap();

And there you go. All data put into that map is now part of the data grid. Regardless of how you get data into the grid, you can get it back out in the most appropriate form. For example, you might add data through a Map interface, but you can extract it through a SQL call. It is an interesting strategy and very noninvasive to your application. But Nati mentioned a key point regarding the profile of these applications: they have been architected from the beginning to be distributed; he is not attempting to scale an application that was never designed to be distributed. This is an important distinction because GigaSpaces googlizes applications that are meant to be googlized: trying to do otherwise can lead to unexpected and undesirable results.
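Why the swap is so painless is easier to see with a stand-in. The sketch below assumes nothing about how GigaSpacesMap is implemented: PartitionedMap is a hypothetical class that spreads entries over several in-memory maps, and WordCounter is a hypothetical caller written purely against java.util.Map.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical Map implementation that spreads its entries across several partitions. */
final class PartitionedMap<K, V> extends AbstractMap<K, V> {
    private final List<Map<K, V>> partitions = new ArrayList<>();

    PartitionedMap(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>()); // each map stands in for one grid member
        }
    }

    private Map<K, V> ownerOf(Object key) {
        return partitions.get(Math.floorMod(Objects.hashCode(key), partitions.size()));
    }

    @Override public V put(K key, V value) { return ownerOf(key).put(key, value); }
    @Override public V get(Object key) { return ownerOf(key).get(key); }
    @Override public V remove(Object key) { return ownerOf(key).remove(key); }
    @Override public boolean containsKey(Object key) { return ownerOf(key).containsKey(key); }

    @Override public void clear() {
        for (Map<K, V> partition : partitions) {
            partition.clear();
        }
    }

    /** Read-only snapshot of all entries, gathered from every partition. */
    @Override public Set<Map.Entry<K, V>> entrySet() {
        Set<Map.Entry<K, V>> all = new HashSet<>();
        for (Map<K, V> partition : partitions) {
            for (Map.Entry<K, V> entry : partition.entrySet()) {
                all.add(new AbstractMap.SimpleImmutableEntry<>(entry));
            }
        }
        return all;
    }
}

/** Caller code that depends only on the Map interface. */
final class WordCounter {
    static void count(Map<String, Integer> counts, String... words) {
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
    }
}
```

WordCounter.count behaves the same whether it is handed a new HashMap<>() or a new PartitionedMap<>(4); only the line that constructs the map changes, which is the property the one-line swap above relies on.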

GigaSpaces' customer portfolio includes several large vendors in the financial and telecom industries, and boasts a major stock exchange. The typical customer is anyone who needs to manage large volumes of data. That includes large companies today and, as the volume of users and data continues to increase in the coming years, more companies over time.

About Nati Shalom

Nati comes from a CORBA background working with IONA on two major projects: a business-to-business application for the Israeli Yellow Pages, and the construction of a distributed call center. As a consultant he found himself having to choose between two different architectural models:

  • Messaging, in which he had to build his data models around the messaging infrastructure
  • Data, in which he had to build an event model around the database

Because neither model solved the class of problems he was looking at, he saw the tenets behind GigaSpaces as the next wave of applications. He followed IONA's and BEA's standards-based model, but he fell in love with JavaSpaces because of its simplicity. It represented a new way of thinking that broke the limitations of the current models and gave him the opportunity to create GigaSpaces. His foresight has served him well: he solves complex problems for his customers today, and because industry trends point toward ever larger data volumes, more companies will encounter the same problems and need his solution in the future.

What does the future hold?

Before I let Nati go, I asked him, as a visionary, where he sees enterprise applications heading in the next three to five years. He identified two key trends:

  • Changes to the architecture of enterprise applications
  • Utility Model

He believes that the industry will be moving to the Google paradigm for building enterprise applications. The current model cannot scale when data volume is substantially increased so the industry will need to adopt a new paradigm and he believes that to be the "Google" way.

Secondly, he sees the programming model becoming simpler while more intelligence will be added to the middleware. This will lead to a utility model in which companies can lease services and integrate them into their solutions. He believes that in this model, the software sale process may follow a similar pattern to how SalesForce.com revolutionized the CRM model: rather than host a traditional proof-of-concept installation, you are simply provided access to the software for evaluation. If you like the software, then you can lease it.

And he sees GigaSpaces as being in a prime location to realize these visions. His hope for his company: "think about scalability, think of gigaspaces, the platform for googlizing the enterprise applications; Scalability == Gigaspaces."

Closing thoughts

I send my thanks out to Nati for taking the time to meet with me at JavaOne. He has opened my eyes to another way of thinking about enterprise applications, and I believe that he is correct: I/O technologies cannot keep up with the growth rate of data volume, so we need another approach. And on a personal note, I am now driven to learn more about Jini and JavaSpaces.