Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale, Rough Cuts

Rough Cuts

  • Available to Safari Subscribers
  • About Rough Cuts: Rough Cuts are manuscripts that are in development but not yet published, available through Safari. They give you access to the latest information on a given topic and the opportunity to interact with the author and influence the final publication.

Not for Sale

  • Copyright 2017
  • Dimensions: 7" x 9-1/8"
  • Pages: 256
  • Edition: 1st
  • Rough Cuts
  • ISBN-10: 0-13-402977-1
  • ISBN-13: 978-0-13-402977-1

This is the Rough Cut version of the printed book.

As adoption of Hadoop accelerates in the enterprise and beyond, demand is soaring for people who can solve real-world problems by applying advanced data science techniques in Hadoop environments. Practical Data Science with Hadoop® and Spark is a complete, up-to-date guide to data science with Hadoop: high-level concepts, deep-dive techniques, practical applications, hands-on tutorials, and real-world use cases. Drawing on their extensive experience with Hadoop in enterprise Big Data environments, the authors bring together the practical knowledge you'll need to do real, useful data science with Hadoop. Coverage includes:

  • What data science is, what data scientists do, and how to build or join a data science team
  • Core data science applications in retail, healthcare, insurance, banking, education, and beyond
  • How Hadoop has evolved into an outstanding environment for doing data science
  • A day in the life of a data scientist: exploration, iteration, and more
  • Getting your data into Hadoop: data lakes, Sqoop, Flume, Falcon, and more
  • Preparing your data, from start to finish
  • Data modeling and machine learning
  • Visualization: how (and how not) to use it
  • Start-to-finish case studies: recommender systems, customer segmentation, sentiment analysis, and predictive risk modeling
  • The future: Storm online scoring, Giraph graph algorithms, Solr/Elasticsearch, and more

Sample Content

Table of Contents




About the Authors

Part I: Data Science with Hadoop—An Overview

Chapter 1: Introduction to Data Science

Chapter 2: Use Cases for Data Science

Chapter 3: Hadoop and Data Science

Part II: Preparing and Visualizing Data with Hadoop

Chapter 4: Getting the Data into Hadoop

Chapter 5: Data Munging with Hadoop

Chapter 6: Exploring and Visualizing Data

Part III: Applying Data Modeling with Hadoop

Chapter 7: Machine Learning with Hadoop

Chapter 8: Predictive Modeling

Chapter 9: Clustering

Chapter 10: Anomaly Detection with Hadoop

Chapter 11: Natural Language Processing

Chapter 12: Data Science—The Next Frontier

Appendix A: Book Webpage and Code Download

Appendix B: HDFS Quick Start

Appendix C: Additional Background on Data Science and Apache Hadoop


