Interview: GigaSpaces
Last updated Mar 14, 2003.
You may have heard the buzz about the ability to "googlize" your enterprise applications, and GigaSpaces is a company that can help you do it. A few weeks ago at JavaOne I had the opportunity to meet with Nati Shalom, the founder and CTO of GigaSpaces, and ask him how his company is able to move beyond the buzz and realize this vision.
Googlizing an application refers to the near-linear scalability that Google achieved by using a large farm of less powerful machines rather than a single behemoth machine. The concept includes spreading massive amounts of data across these machines and, in the case of Google's search engine, quickly and seamlessly finding that data wherever it happens to reside; in general terms, the processing is not limited to searching, but covers whatever your business functionality entails. At the heart of googlizing is scalability. Performance and scalability are sometimes used interchangeably, but there is a distinct difference:
- Performance is a measure of the capabilities of your application
- Scalability is a measure of the capacity of your application
Stated another way, performance measures how quickly you can satisfy an individual request, while scalability measures how well you can sustain that performance as the number of users and the amount of data increase. Scalability is what googlizing addresses and what GigaSpaces has been able to demonstrate. Nati informed me that in a test lab they were able to install GigaSpaces on 500 machines that serviced over two terabytes of data.
The problem that we face today is different from the one we faced less than a decade ago. In the late 1990s, as Internet adoption grew, companies faced an increase in user load and designed strategies to meet it. But as that user load continued to grow and applications were required to solve more complicated business problems, the quantity of data that those applications managed became the new bottleneck. More machines, more hard drive space, and faster CPUs can process more user requests, but as the amount of data increases substantially, scaling becomes data I/O bound. Regardless of how beefy your database hardware is, your network can only send data from that hardware so fast, and that represents a real bottleneck.
In order to address the problem of scalability for very data-intensive applications, Nati reports that there are three essential components to his strategy:
- Partition the data
- Push the data closer to the application
- Parallelize transaction requests
If an application is data I/O bound, then the first step is to partition that data and spread it across multiple machines (do you think that Google maintains its search results for the entire Internet on a single machine?). In this scenario, each machine maintains a segment of the data, and the software infrastructure (e.g. GigaSpaces) knows where that data is located.
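To make the partitioning idea concrete, here is a minimal sketch that assumes the simplest possible scheme: hashing each key to pick the partition that owns it. The class name PartitionedStore and its methods are hypothetical, the "machines" are plain in-memory maps, and none of this is the GigaSpaces API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: each partition is an in-memory map
// standing in for a separate machine in the grid.
class PartitionedStore {
    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionedStore(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // The infrastructure "knows where the data is": the key's hash picks the owner.
    // floorMod keeps the index non-negative even for negative hash codes.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    String get(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    // Number of entries held by one partition.
    int sizeOf(int partition) {
        return partitions.get(partition).size();
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(4);
        store.put("order-1001", "IBM x 200");
        System.out.println("order-1001 is owned by partition " + store.partitionFor("order-1001"));
        System.out.println("value: " + store.get("order-1001"));
    }
}
```

A real grid would also have to rebalance data as machines join and leave, which simple modulo hashing does not handle well.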
Locating all of your data across the network would be inefficient, so the next feature that GigaSpaces added is a local in-JVM cache that runs close to your application. In addition to enhancing the efficiency of the application, strongly controlling the location of data provides additional redundancy, which equates to reliability. If a single machine or a group of machines crashes, you don't want to lose any data. Maintaining multiple copies of data ensures that as machines come and go, the data can be preserved and spread across the currently available machines.
Finally, through clustered proxies, requests are routed to where the data is located. This creates a true grid environment in which the machines in the grid not only maintain data, but can also process business logic against that data. This is one of the smartest innovations that Google adopted in the creation of their huge clusters of data partitions.
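The local in-JVM cache can be pictured with a small sketch. It assumes a read-through design in which the first lookup for a key goes to the remote grid and later lookups are served from local memory. NearCache, remoteLookup, and remoteCalls are hypothetical names chosen for illustration, and the remote grid is simulated by an ordinary function.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache that lives in the application's own JVM.
class NearCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Function<K, V> remoteLookup;              // stand-in for a network call to the grid
    private final AtomicInteger remoteCalls = new AtomicInteger();

    NearCache(Function<K, V> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Serve from local memory when possible; otherwise fetch remotely once and remember the result.
    V get(K key) {
        return local.computeIfAbsent(key, k -> {
            remoteCalls.incrementAndGet();
            return remoteLookup.apply(k);
        });
    }

    // Drop a local copy, for example after the grid reports that the entry changed.
    void invalidate(K key) {
        local.remove(key);
    }

    int remoteCalls() {
        return remoteCalls.get();
    }

    public static void main(String[] args) {
        NearCache<String, String> cache = new NearCache<>(key -> "value-for-" + key);
        cache.get("customer-42");
        cache.get("customer-42");
        System.out.println("remote calls: " + cache.remoteCalls());
    }
}
```

The sketch covers only the caching half of the paragraph above; keeping multiple copies of each entry on different machines for reliability is a separate concern that it does not attempt.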
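The idea of sending the work to the data and running it in parallel can be sketched as a scatter-gather: every partition computes a partial result over its own data, and only the small partial results are combined. The example below assumes each partition is a local list and uses a thread pool in place of remote machines; ScatterGather and sumAcrossPartitions are hypothetical names, not part of any product API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical scatter-gather: one task per partition, results merged at the end.
class ScatterGather {

    static long sumAcrossPartitions(List<List<Integer>> partitions) {
        // One worker per partition stands in for one machine per partition.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                // The logic runs "where the data lives": each task touches only its own partition.
                tasks.add(() -> {
                    long sum = 0;
                    for (int value : partition) {
                        sum += value;
                    }
                    return sum;
                });
            }
            long total = 0;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();   // only the partial sums travel back to the caller
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5),
                Arrays.asList(6));
        System.out.println("total = " + sumAcrossPartitions(partitions));
    }
}
```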
With that foundation established, the next challenge you face is how to build a system that provides this type of service grid without having to rearchitect your entire application. It is a highly non-trivial problem because if you truly need to scale in such a manner, ensure the integrity of your data, and maintain performance, you need to build all three of the aforementioned facets into your application.
Different strategies can be employed, and Nati embraced a technology that you might have heard of: JavaSpaces. JavaSpaces is not actually new: the core technology is based upon the Linda programming language and research project at Yale University over 20 years ago. But JavaSpaces is a core part of Sun's Jini project, and Bill Joy, a co-founder of Sun, refers to JavaSpaces as "a wonderfully simple platform for developing distributed applications that takes advantage of the power of the Java programming language." GigaSpaces provided the first implementation of JavaSpaces and is an active participant in the Jini Community.
As Nati was describing this technology to me, the question that stayed in my mind was: how difficult is it for me to integrate this software infrastructure into my applications? My concern was that, even with a strong background in enterprise architecture, I would need to rearchitect all of my work to make use of these features.
His approach was to provide access into the GigaSpaces grid through most public APIs, including JDBC, JMS, Collections classes, Hibernate, and so forth. If you have been programming against interfaces as I have recommended for years, then your job is easy. Simply replace code like the following:
Map m = new HashMap();
With:
Map m = new GigaSpacesMap();
And there you go. All data put into that map is now part of the data grid. Regardless of how you get data into the grid, you can get it out of the grid in the most appropriate form. For example, you might add data through a Map interface, but extract it through a SQL call. It is an interesting strategy and very noninvasive to your application. But Nati made a key point about the profile of these applications: they have been architected from the beginning to be distributed; he is not attempting to scale an application that was never designed to be distributed. This is an important distinction because GigaSpaces googlizes applications that are meant to be googlized: trying to do otherwise can lead to unexpected and undesirable results.
GigaSpaces' customer portfolio includes several large vendors in the financial and telecom industries, and boasts a major stock exchange. A typical customer is anyone who needs to manage large volumes of data. This includes large companies today and will include more companies as the volume of users and data continues to increase in the coming years.
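The one-line swap shown earlier only works when the rest of the code depends on the Map interface rather than on a concrete class. The following sketch illustrates that design choice. QuoteService is a hypothetical class, and ConcurrentHashMap is used purely as a stand-in for a grid-backed Map implementation, since the details of the vendor's map class are not covered in this interview.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service that knows only about the Map interface.
class QuoteService {
    private final Map<String, Double> quotes;

    // The caller decides which Map implementation backs the service.
    QuoteService(Map<String, Double> quotes) {
        this.quotes = quotes;
    }

    void record(String symbol, double price) {
        quotes.put(symbol, price);
    }

    Double latest(String symbol) {
        return quotes.get(symbol);
    }

    public static void main(String[] args) {
        // Plain local map.
        QuoteService local = new QuoteService(new HashMap<>());
        local.record("SUNW", 4.25);

        // Stand-in for a grid-backed map: the service code does not change at all.
        QuoteService shared = new QuoteService(new ConcurrentHashMap<>());
        shared.record("SUNW", 4.25);

        System.out.println(local.latest("SUNW") + " / " + shared.latest("SUNW"));
    }
}
```

Because the implementation is passed in through the constructor, changing the backing store is a change at the point of construction only.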
About Nati Shalom
Nati comes from a CORBA background working with IONA on two major projects: a business-to-business application for the Israeli Yellow Pages, and the construction of a distributed call center. As a consultant he found himself having to choose between two different architectural models:
- Messaging, in which he had to build his data models around the messaging infrastructure
- Data, in which he had to build an event model around the database
Because these models did not solve the domain of problems he was looking at, he saw the tenets of GigaSpaces as the next wave of applications. He therefore followed IONA's and BEA's standards-based model, but he fell in love with JavaSpaces because of its simplicity. It represented a new way of thinking that broke the limitations of the current models and created an opportunity for him to create GigaSpaces. His foresight has served him well: he solves complex problems for his customers today, and because of the industry trend toward increased data volume, more companies will encounter the same problems and need his solution in the future.
What does the future hold?
Before I let Nati go, I asked him, as a visionary, to tell me where he sees enterprise applications heading in the next three to five years. He identified two key trends:
- Changes to the architecture of enterprise applications
- Utility Model
First, he believes that the industry will move to the Google paradigm for building enterprise applications. The current model cannot scale when data volume increases substantially, so the industry will need to adopt a new paradigm, and he believes that to be the "Google" way.
Secondly, he sees the programming model becoming simpler while more intelligence will be added to the middleware. This will lead to a utility model in which companies can lease services and integrate them into their solutions. He believes that in this model, the software sale process may follow a similar pattern to how SalesForce.com revolutionized the CRM model: rather than host a traditional proof-of-concept installation, you are simply provided access to the software for evaluation. If you like the software, then you can lease it.
And he sees GigaSpaces as being in a prime position to realize these visions. His hope for his company: "think about scalability, think of gigaspaces, the platform for googlizing the enterprise applications; Scalability == Gigaspaces."
Closing thoughts
I send my thanks out to Nati for taking the time to meet with me at JavaOne. He has opened my eyes to another way of thinking about enterprise applications, and I believe that he is correct: I/O technologies cannot keep up with the growth rate of data volume, so we need another approach. And on a personal note, I am now driven to learn more about Jini and JavaSpaces.
