Data Just Right: Practical Big Data Analytics, Rough Cuts
This is the Rough Cut version of the printed book.
Large-scale data analysis ("Big Data") is suddenly of crucial importance to virtually every enterprise. Mobile and social technologies are generating massive datasets, and distributed cloud computing is providing better ways to analyze and process that data. Accelerating technological change is turning long-accepted ideas about Big Data upside down, forcing companies to evaluate daunting new technologies, including NoSQL databases. Until now, however, most books on "Big Data" have been little more than business polemics and product catalogs. Data Just Right is different, and utterly invaluable to every Big Data decision-maker, implementer, and strategist.
Google's Michael Manoochehri organizes this book around today's key Big Data use cases, showing how they can best be addressed by combining technologies in hybrid solutions. Drawing on his own extensive experience, Manoochehri presents the technical detail you need to implement each solution, along with best practices you can apply to any Big Data project. You'll learn how to:
- "Build for infinity," supporting rapid growth
- Break down data silos
- Decide what to insource and what to outsource
- Focus on applications, not infrastructure, since that's where you can drive the most value
Throughout, Manoochehri shows how to use and integrate cutting-edge technologies including Hadoop, Hive, Pig, Tableau, R, and Google BigQuery. No other Big Data guide offers as much practical, actionable insight; none even comes close.
Table of Contents
Part I: Directives in the Big Data Era
Chapter 1: The Four Guiding Principles for Data Success
Part II: Collecting and Sharing a Lot of Data
Chapter 2: How to Host and Share 5 Terabytes of Data
Chapter 3: Building a NoSQL-Based Web App to Collect Crowd-Sourced Data
Chapter 4: Strategies for Breaking Down Data Silos
Part III: Asking Questions About Your Data
Chapter 5: Using Hadoop and Hive to Ask Questions About Large Datasets
Chapter 6: Building a Data Dashboard with Google BigQuery
Chapter 7: Preparing Big Data Sets for Visualization (with Tableau)
Part IV: Data Pipelines and Real-Time Data
Chapter 8: Putting It Together: A Data Pipeline
Chapter 9: Building Data Transformation Workflows with Pig and Cascading
Chapter 10: Analyzing Snapshots of Streaming Data with Twitter Storm
Part V: Machine Learning for Large Datasets
Chapter 11: Building a Big Data Classification System with Mahout
Part VI: Statistical Analysis for Massive Datasets
Chapter 12: Using R with Large Datasets
Chapter 13: Building Data Analytics Applications in Python with Pandas
Part VII: Practical Solutions to Big Data Strategy
Chapter 14: When to Build, When to Buy, When to Outsource
Chapter 15: The Future
Available to Safari Books Online Subscribers
Rough Cuts are manuscripts that are developed but not yet published, available through Safari Books Online. Rough Cuts provide you access to the very latest information on a given topic and offer you the opportunity to interact with the author to influence the final publication.