This is the Rough Cut version of the printed book.
Stop searching the web for out-of-date, fragmentary, and unreliable information about running Hadoop! Now, there's a single source for all the authoritative knowledge and trustworthy procedures you need: Expert Hadoop® Administration: Managing Spark, YARN, and HDFS.
Pioneering Hadoop/Big Data administrator Sam R. Alapati shares step-by-step procedures for confidently performing every important task involved in creating, configuring, securing, managing, and optimizing production Hadoop clusters. The only Hadoop administration guide written by a working Hadoop administrator, Expert Hadoop® Administration covers an unmatched range of topics and offers an unparalleled collection of realistic examples. Alapati shares proven answers to the complex configuration, management, and performance-tuning problems that Hadoop administrators constantly encounter, and expert guidance for customizing Hadoop 2's intensely complex environment. Throughout, he integrates action-oriented advice with carefully researched explanations of both problems and solutions. Coverage includes:
Part I: Introduction to Hadoop 2—Architecture and Hadoop Clusters
Chapter 1: Introduction to Hadoop 2 and Its Environment
Chapter 2: An Introduction to the Architecture of Hadoop 2
Chapter 3: Creating and Configuring a Simple Hadoop 2 Cluster
Chapter 4: Planning for and Creating a Fully Distributed Cluster
Part II: Hadoop Application Frameworks
Chapter 5: Running Applications in a Cluster—The MapReduce Framework (and Pig, Hive)
Chapter 6: Running Applications in a Cluster—The Spark Framework
Chapter 7: Running Applications in a Cluster—The Spark Framework (Second Part)
Part III: Managing and Protecting Hadoop Data and High Availability
Chapter 8: The Role of the NameNode and How HDFS Works
Chapter 9: HDFS Commands, File Permissions, and HDFS Storage Management
Chapter 10: Data Protection, Compression, and Hadoop Data Formats
Chapter 11: NameNode Operations and High Availability
Part IV: Moving Data, Allocating Resources, Scheduling Jobs, and Security
Chapter 12: Moving Data Into and Out of Hadoop
Chapter 13: YARN, and Resource Allocation in a Hadoop Cluster
Chapter 14: Working with Oozie and Hue to Manage Workflows
Chapter 15: Securing Hadoop
Part V: Monitoring, Optimization, and Troubleshooting
Chapter 16: Managing Jobs, Using Hue, and Performing Routine Tasks
Chapter 17: Monitoring, Metrics, and Hadoop Logging
Chapter 18: Benchmarking, Optimization, and Performance Tuning
Chapter 19: Configuring and Tuning Apache Spark on YARN
Chapter 20: Optimizing Spark Applications
Chapter 21: Troubleshooting Hadoop—A Sampler