Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale

eBook (Watermarked)

  • Your Price: $28.79
  • List Price: $35.99
  • Estimated Release: Dec 16, 2016
  • Includes EPUB, MOBI, and PDF
  • This eBook includes the following formats, accessible from your Account page after purchase:

    EPUB: The open industry format known for its reflowable content and usability on supported mobile devices.

    MOBI: The eBook format compatible with the Amazon Kindle and Kindle applications.

    PDF: The popular standard, used most often with the free Adobe® Reader® software.

    This eBook requires no passwords or activation to read. We customize your eBook by discreetly watermarking it with your name, making it uniquely yours.


  • Copyright 2017
  • Dimensions: 7" x 9-1/8"
  • Pages: 288
  • Edition: 1st
  • ISBN-10: 0-13-402974-7
  • ISBN-13: 978-0-13-402974-0

As adoption of Hadoop accelerates in the enterprise and beyond, demand is soaring for people who can solve real-world problems by applying advanced data science techniques in Hadoop environments. Practical Data Science with Hadoop® and Spark provides a complete, up-to-date guide to data science with Hadoop: high-level concepts, deep-dive techniques, practical applications, hands-on tutorials, and real-world use cases. Drawing on extensive experience with Hadoop in enterprise Big Data environments, the authors bring together the practical knowledge you need to do real, useful data science with Hadoop. Coverage includes:

  • What data science is, what data scientists do, and how to build or join a data science team
  • Core data science applications in retail, healthcare, insurance, banking, education, and beyond
  • How Hadoop has evolved into an outstanding environment for doing data science
  • A day in the life of a data scientist: exploration, iteration, and more
  • Getting your data into Hadoop: data lakes, Sqoop, Flume, Falcon, and more
  • Preparing your data, from start to finish
  • Data modeling and machine learning
  • Visualization: how (and how not) to use it
  • Start-to-finish case studies: recommender systems, customer segmentation, sentiment analysis, and predictive risk modeling
  • The future: Storm online scoring, Giraph graph algorithms, Solr/Elasticsearch, and more

Table of Contents

About the Authors

Part I: Data Science with Hadoop—An Overview

Chapter 1: Introduction to Data Science

Chapter 2: Use Cases for Data Science

Chapter 3: Hadoop and Data Science

Part II: Preparing and Visualizing Data with Hadoop

Chapter 4: Getting the Data into Hadoop

Chapter 5: Data Munging with Hadoop

Chapter 6: Exploring and Visualizing Data

Part III: Applying Data Modeling with Hadoop

Chapter 7: Machine Learning with Hadoop

Chapter 8: Predictive Modeling

Chapter 9: Clustering

Chapter 10: Anomaly Detection with Hadoop

Chapter 11: Natural Language Processing

Chapter 12: Data Science—The Next Frontier

Appendix A: Book Webpage and Code Download

Appendix B: HDFS Quick Start

Appendix C: Additional Background on Data Science and Apache Hadoop
