Big Data Scalability: Why Your Database Is Slow, and When You Should Start Scaling
- What Is Scalability?
- Detecting Performance Bottlenecks
- When to Begin Scaling
- Okay, You Need to Scale. Now What?
The world is experiencing a true Big Data explosion—a boom in databases and database technology. As data comes in from all over—social media, search engines, advertising, and so on—our ability to convert that data into meaningful information is more important than ever. After all, what good is having a cache of data if we can’t extract meaning from it? Although we’ve developed a handful of methods to deal with Big Data, each effective in its own right or in specific applications, today we need a comprehensive method for managing the seemingly limitless flow of daily data.
When we talk about managing data, in essence we’re talking about managing a database. We can develop the most efficient method of gathering data, but once that data passes the aggregation step and enters the database tier, a new question arises: How do we turn a database filled with useful information into an effective tool for modeling? The answer is scalability.
What Is Scalability?
A scalable application platform not only accommodates rapid growth in traffic and data volume (scaling up) but also adapts to decreases in demand (scaling down). Such an elastic, on-demand database would be valuable in a world where the ebb and flow of data is constant, yet unpredictable. A good example of such a situation is operating an application in a public cloud. In this scenario, managing costs is essential. When resources are not required, reducing usage is vital; otherwise, significant waste is incurred, threatening the economic viability and profitability of the application provider.
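The elasticity described above can be illustrated with a minimal sketch. It assumes a simplified model in which each database node serves a fixed amount of load. The function names, the 25% headroom, and the per-node capacity are hypothetical values chosen for illustration; they do not come from any particular product or cloud provider.

```python
import math


def target_nodes(load, capacity_per_node, headroom=0.25, min_nodes=1):
    """Return how many database nodes a given load calls for.

    All parameters are hypothetical illustrations:
      load: current demand, e.g. requests per second
      capacity_per_node: demand one node can serve comfortably
      headroom: spare fraction kept for spikes (0.25 = 25%)
      min_nodes: floor so the cluster never shrinks to zero
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    needed = math.ceil(load * (1 + headroom) / capacity_per_node)
    return max(min_nodes, needed)


def scaling_action(current_nodes, load, capacity_per_node):
    """Describe the change needed to match the cluster to demand."""
    target = target_nodes(load, capacity_per_node)
    if target > current_nodes:
        return f"scale up: add {target - current_nodes} node(s)"
    if target < current_nodes:
        return f"scale down: remove {current_nodes - target} node(s)"
    return "hold"
```

With these assumed numbers, `scaling_action(4, 8000, 1000)` recommends adding nodes, while `scaling_action(10, 400, 1000)` recommends removing them, which is where the cost savings in a public cloud come from. A production policy would likely also need a cooldown period so the cluster does not oscillate when load hovers near a threshold.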
In my experience with the scalable database approach at CodeFutures, several customers became virtual overnight successes in their fields. A memorable example was a social gaming application that took off like a rocket from its very inception. The game ramped from zero to more than five million daily active users (DAUs) in just a few short months. Nothing equals the thrill of working on a project like this—it’s invigorating, and it challenges every aspect of your technical and database knowledge.
However, such applications expose every weakness in an infrastructure. Scaling the database tier is an absolute requirement, and keeping it running 24x7 is an incredible feat in its own right. Because of the high volumes, this client saw every failure scenario imaginable, from the underlying public cloud infrastructure, to the caching layer, to the database tier. The database had to scale so fast that we expanded the cluster more than eight times in those first few months.
This is just one example; countless other scenarios call for a similar response. The need for database scalability is clear. The question is this: How do you know when to scale your database?