The Rise of “Big Data”
Excitement about analytics has been augmented by even more excitement about big data. The concept refers to data that is either too voluminous or too unstructured to be managed and analyzed through traditional means. The definition is clearly a relative one that will change over time. Currently, “too voluminous” typically means databases or data flows in petabytes (1,000 terabytes); Google, for example, processes about 24 petabytes of data per day. “Too unstructured” generally means that the data isn’t easily put into the traditional rows and columns of conventional databases.
Examples of big data include massive amounts of online information, such as clickstream data from the Web and social media content (tweets, blogs, wall postings). Big data also encompasses video data from retail and crime/intelligence environments, as well as data from rendering video entertainment. It includes voice data from call centers and intelligence interceptions. In the life sciences, it includes genomic and proteomic data from biological research and medicine.
Many IT vendors and solutions providers, and some of their customers, treat the term as just another buzzword for analytics, or for managing and analyzing data to better understand the business. But there is more to big data than vendor hype: being able to analyze it on a consistent basis offers considerable business benefits.
Companies that excel at big data will be able to use other new technologies, such as ubiquitous sensors and the "Internet of Things." Virtually every mechanical or electronic device can leave a trail that describes its performance, location, or state. These devices, and the people who use them, communicate through the Internet, which provides yet another vast data source. When all these bits are combined with those from other media (wireless and wired telephony, cable, satellite, and so forth), the future of data appears even bigger.
Companies that employ these tools will ultimately be able to understand their business environment at the most granular level and adapt to it rapidly. They'll be able to differentiate commodity products and services by monitoring and analyzing usage patterns. And in the life sciences, effective use of big data can help yield cures for some of the most threatening diseases.
Big data, and the analytics based on it, promise to change virtually every industry and business function over the next decade. Organizations that start early with big data can gain a significant competitive edge. Just as early analytical competitors in the "small data" era (including Capital One bank, Progressive insurance, and Marriott hotels) moved out ahead of their rivals and built a sizable lead, firms that seize the big-data opportunity now can do the same.
The availability of all this data means that virtually every business or organizational activity can be viewed as a big-data problem or initiative. Manufacturing, in which most machines already contain one or more microprocessors, is already a big-data situation. Consumer marketing, with its myriad customer touchpoints and clickstreams, is already a big-data problem. Governments have begun to recognize that they sit on enormous collections of data waiting to be analyzed. Google has even described the self-driving car as a big-data problem.
This book focuses primarily on small-data analytics, but it occasionally refers to big data, data scientists, and related issues. Certainly many of the ideas from traditional analytics are highly relevant to big-data analytics as well.