Introduction to the Art of Scalability
Thanks for picking up the second edition of The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise. This book has been recognized by academics and professionals as one of the best resources available to learn the art of scaling systems and organizations. This second edition includes new content, revisions, and updates. As consultants and advisors to hundreds of hyper-growth companies, we have been fortunate enough to be on the forefront of many industry changes, including new technologies and new approaches to implementing products. While we hope our clients see value in our knowledge and experience, we are not ignorant of the fact that a large part of the value we bring to bear on a subject comes from our interactions with so many other technology companies. In this edition, we share even more of these lessons learned from our consulting practice.
In this second edition, we have added several key topics that we believe are critical to address in a book on scalability. One of the most important new topics focuses on a new organizational structure that we refer to as the Agile Organization. Other notable topics include the changing rationale for moving from data centers to clouds (IaaS/PaaS), why NoSQL solutions aren’t in and of themselves a panacea for scaling, and the importance of business metrics to the health of the overall system.
In the first edition of The Art of Scalability, we used a fictional company called AllScale to demonstrate many of the concepts. This fictional company was an aggregation of many of our clients and the challenges they faced in the real world. While AllScale provided value in highlighting the key points in the first edition, we believe that real stories make more of an impact with readers. As such, we’ve replaced AllScale with real-world stories of successes and failures in the current edition.
The information contained in this book has been carefully designed to be appropriate for any employee, manager, or executive of an organization or company that provides technology solutions. For the nontechnical executive or product manager, this book can help you prevent scalability disasters by arming you with the tools needed to ask the right questions and focus on the right areas. For technologists and engineers, this book provides models and approaches that, once employed, will help you scale your products, processes, and organizations.
Our experience with scalability goes beyond academic study and research. Although we are both formally trained as engineers, we don’t believe academic programs teach scalability very well. Rather, we have learned about scalability by suffering through the challenges of scaling systems for a combined 30-plus years. We have been engineers, managers, executives, and advisors for startups as well as Fortune 500 companies. The list of companies that our firm or we as individuals have worked with includes such familiar names as General Electric, Motorola, Gateway, eBay, Intuit, Salesforce, Apple, Dell, Walmart, Visa, ServiceNow, DreamWorks Animation, LinkedIn, Carbonite, Shutterfly, and PayPal. The list also includes hundreds of less famous startups that need to be able to scale as they grow. Having learned the scalability lessons through thousands of hours spent diagnosing problems and thousands more hours spent designing preventions for those problems, we want to share our combined knowledge. This motivation was the driving force behind our decisions to start our consulting practice, AKF Partners, in 2007, and to write the first edition of this book, and it remains our preeminent goal in this second edition.
Scalability: So Much More Than Just Technology
Pilots are taught, and statistics show, that many aircraft incidents are the result of multiple failures that snowball into total system failure and catastrophe. In aviation, these multiple failures, which are called an error chain, often start with human rather than mechanical failure. In fact, Boeing identified that 55% of all aircraft incidents involving Boeing aircraft between 1995 and 2005 had human factors-related causes.1
Our experience with scalability-related issues follows a similar trend. The chief technology officer (CTO) or executive responsible for scale of a technology platform may see scalability as purely a technical endeavor. This perception is the first, and very human, failure in the error chain. Because the CTO is overly technology focused, she fails to define the processes necessary to identify scalability bottlenecks (failure number two). Because no one is identifying bottlenecks or chokepoints in the architecture, the user count or transaction volume exceeds a certain threshold and the entire product fails (failure number three). The team assembles to solve the problem, but because it has never invested in processes to troubleshoot incidents and their related problems, the team misdiagnoses the failure as a database that needs to be tuned (failure number four). The vicious cycle goes on for days, with people focusing on different pieces of the technology stack and blaming everything from firewalls, to applications, to the persistence tiers to which the apps speak. Team interactions devolve into shouting matches and finger-pointing sessions, while services remain slow and unresponsive. Customers walk away, team morale flat-lines, and shareholders are left holding the bag.
The key point here is that crises resulting from an inability to scale to end-user demands are almost never technology problems alone. In our experience as former executives and advisors to our clients, scalability issues start with organizations and people, and only then spread to process and technology. People, being human, make ill-informed or poor choices regarding technical implementations, which in turn sometimes manifest themselves as a failure of a technology platform to scale. People also ignore the development of processes that might help them learn from past mistakes and sometimes put overly burdensome processes in place, which in turn might force the organization to make poor decisions or make decisions too late to be effective. A lack of attention to the people and processes that create and support technical decision making can lead to a vicious cycle of bad technical decisions, as depicted in the left side of Figure I.1. This book is the first of its kind focused on creating a virtuous cycle of people and process scalability to support better, faster, and more scalable technology decisions, as depicted in the right side of Figure I.1.
Figure I.1 Vicious and Virtuous Technology Cycles